ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime

Add capabilities to collect statistics (with an opt-in/out) #4541

Closed billsacks closed 6 months ago

billsacks commented 9 months ago

I have been talking with @briandobbins @jedwards4b and others this week about building out our capabilities to collect statistics on CESM usage. I am opening this issue to record some initial ideas and start a discussion on how we might want to implement this.

The basic idea is that we want a feature – that users could opt into or out of – that sends some information back to us to let us collect data on how many people are using CESM and – at least eventually – details of the kinds of configurations they are running. This could be done in create_newcase or case.submit; the latter could be better in terms of allowing us to collect information on what's actually being run. The initial implementation could collect minimal information (even a simple ping where we can record the IP address would be useful information, so we can get an overall sense of usage); we could then build this out over time.
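As a rough illustration of the minimal "ping" described above, here is a sketch in Python. Everything named here is an assumption made for illustration: the endpoint URL is a placeholder, and `build_payload`, `post_json`, and `report_usage` are hypothetical helpers, not existing CIME functions. The point is only that nothing is sent unless the user has opted in, and that a failed ping should never break case submission.

```python
import json
import urllib.request

# Placeholder endpoint, assumed for illustration only; no such server is implied.
STATS_URL = "https://example.invalid/cesm-usage"


def build_payload(event, model_version=None):
    """Build a deliberately minimal payload (hypothetical field names)."""
    payload = {"event": event}
    if model_version is not None:
        payload["model_version"] = model_version
    return payload


def post_json(url, payload, timeout=5):
    """POST the payload as JSON using only the standard library."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout):
        pass


def report_usage(opted_in, event, send=post_json, **fields):
    """Send a ping only if the user opted in; return True if it was sent."""
    if not opted_in:
        return False
    try:
        send(STATS_URL, build_payload(event, **fields))
    except Exception:
        # A statistics failure must not stop the actual run.
        return False
    return True
```

A call such as `report_usage(opted_in, "case.submit")` could then sit near the end of the submit step; the `send` argument exists so the network call can be swapped out in tests.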

For privacy reasons we would want an opt-out option (and this may be required legally, e.g., in the EU). The way I envision this working is as follows, but others may have different / better ideas:

Depending on what statistics we want to capture, it might also be helpful for (hidden) files to be created in SRCROOT and/or CASEROOT the first time this is run: By putting a file in SRCROOT, we can see if the given clone has already been counted: if this file is not yet present, then we haven't counted this clone yet. Similarly, by putting a file in CASEROOT, we can see if the given case has already been counted. (This would facilitate counting the number of times CESM/E3SM is cloned, and the number of unique cases created.)
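The marker-file idea can be sketched as follows. This is a minimal sketch under stated assumptions: the marker file name `.usage_stats_counted` and the functions `mark_if_first_time` and `classify_run` are hypothetical, not an existing CIME convention. Opening the file in exclusive-create mode makes "check whether it exists" and "create it" a single step.

```python
from pathlib import Path

# Hypothetical hidden marker file name; not an existing CIME convention.
MARKER_NAME = ".usage_stats_counted"


def mark_if_first_time(root):
    """Create the marker in `root`; return True only if it was not there before."""
    marker = Path(root) / MARKER_NAME
    try:
        # Mode "x" raises FileExistsError if the marker is already present.
        with open(marker, "x") as f:
            f.write("counted\n")
    except FileExistsError:
        return False
    return True


def classify_run(srcroot, caseroot):
    """Report whether this clone and this case are being seen for the first time."""
    new_clone = mark_if_first_time(srcroot)
    new_case = mark_if_first_time(caseroot)
    return {"new_clone": new_clone, "new_case": new_case}
```

With this, a second case created from the same clone would report `new_clone` as false and `new_case` as true, which is what counting unique clones and unique cases separately requires.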

I'll note that, for simply collecting data on the number of clones of CESM, we considered implementing something in manage_externals (e.g., hitting an NCAR server whenever manage_externals is run so that we can collect the kind of data that used to be collected when we hosted CESM via svn), but since the long-term goal is to collect additional statistics, we thought we might as well put this functionality in the right long-term place, which is CIME.

Thoughts?

gold2718 commented 9 months ago

For privacy reasons we would want an opt-out option (and this may be required legally, e.g., in the EU). The way I envision this working is as follows, but others may have different / better ideas:

You should get some legal advice, but from what I can tell, the GDPR requires an "opt-in" approach before data that can be tied to an individual can be collected. There are "legitimate interest" exceptions, but even then you must provide a mechanism for a challenge. Then, if the data is to be transferred to the US, it has to meet the requirements of the DPF (EU-US Data Privacy Framework). IANAL, but you should consult one.

briandobbins commented 9 months ago

We don't plan on collecting 'personally identifying' info, and the data gets anonymized, which seemed to be fine to UCAR legal at a first glance (so long as the CESM webpage has a notice about it). But this is a good reminder to go back to them with a more detailed write-up and ensure we're fine.

Cheers,


billsacks commented 9 months ago

Thanks @gold2718. Yeah, we will definitely want to get a sign-off on the data collection approach. I have renamed the issue to "opt-in/out". The way I have envisioned this (and tried to describe it) actually feels more like an opt-in approach already: not collecting any data unless the user gives permission when this message appears.
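To make the opt-in behavior concrete, here is one possible sketch. The config file location and format, the `collect_stats` key, the prompt wording, and the functions `load_consent` and `ensure_consent` are all assumptions for illustration; none of this reflects an agreed design. The default is "no": data would be collected only after an explicit yes, and the answer is stored so the question is asked once.

```python
import json
from pathlib import Path


def load_consent(config_path):
    """Return True or False if the user has already answered, else None."""
    path = Path(config_path)
    if not path.exists():
        return None
    try:
        value = json.loads(path.read_text()).get("collect_stats")
    except (ValueError, AttributeError):
        # Unreadable or unexpected content is treated as "not answered yet".
        return None
    return value if isinstance(value, bool) else None


def ensure_consent(config_path, ask=input):
    """Ask once, defaulting to no, and remember the answer in config_path."""
    consent = load_consent(config_path)
    if consent is None:
        answer = ask("Send anonymous usage statistics? [y/N] ")
        consent = answer.strip().lower() in ("y", "yes")
        # The parent directory of config_path is assumed to exist.
        Path(config_path).write_text(json.dumps({"collect_stats": consent}))
    return consent
```

Pressing Enter at the prompt, or typing anything other than "y" or "yes", records a refusal, so silence is never interpreted as permission.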

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 5 days with no activity.