ACTRIS-Data-Centre / actris-vocabulary

Creative Commons Zero v1.0 Universal

vocabulary for data use statistics metrics #5

Closed: markusfiebig closed this issue 6 months ago

markusfiebig commented 1 year ago

ACTRIS will use a set of metrics to quantify statistical properties of data use. These should be included and defined in the vocabulary.

markusfiebig commented 10 months ago

Top-level category: data production and use metrics

| Metric preferred label | Definition |
| --- | --- |
| variable year | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. |
| variable years download rate | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| variable years download rate, human interface | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans. |
| variable years download rate, machine interface | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for machines. |
| variable years download rate, by country | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| variable years download rate, human interface, by country | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans. |
| variable years download rate, machine interface, by country | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for machines. |
| variable years download rate, by IP type | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By IP type means that the rate is resolved by the institution type of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| variable years download rate, human interface, by IP type | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By IP type means that the rate is resolved by the institution type of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans. |
| variable years download rate, machine interface, by IP type | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By IP type means that the rate is resolved by the institution type of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for machines. |
| experiment dataset | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. |
| experiment dataset download rate | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset download rate, human interface | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans. |
| experiment dataset download rate, machine interface | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for machines. |
| experiment dataset download rate, by country | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset download rate, human interface, by country | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans. |
| experiment dataset download rate, machine interface, by country | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for machines. |
| experiment dataset download rate, by IP type | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By IP type means that the rate is resolved by the institution type of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset download rate, human interface, by IP type | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By IP type means that the rate is resolved by the institution type of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans. |
| experiment dataset download rate, machine interface, by IP type | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By IP type means that the rate is resolved by the institution type of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for machines. |
| landing page resolution rate | Landing pages of data products are accessed through their pertaining persistent identifiers. Rate is the number of such resolutions per time interval. |
| data product visualization rate | Number of graphical visualizations of a data product per time interval. |
| user visit number rate | Number of user visits per time, regardless of origin. |
| user visit number rate, human interface | Number of user visits per time, regardless of origin. Includes access through interfaces for humans. |
| user visit number rate, machine interface | Number of user visits per time, regardless of origin. Includes access through interfaces for machines. |
| user visit number rate, machine interface, ACTRIS portal | Number of user visits per time, originating from ACTRIS data portal through machine interface of data centre unit. |
| data search number rate | Number of data searches per time, regardless of origin. |
| data search number rate, human interface | Number of data searches per time, regardless of origin. Includes access through interfaces for humans. |
| data search number rate, machine interface | Number of data searches per time, regardless of origin. Includes access through interfaces for machines. |
| data search number rate, machine interface, ACTRIS portal | Number of data searches per time, originating from ACTRIS data portal through machine interface of data centre unit. |
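To make the counting rules for the variable-year metrics concrete, here is a minimal Python sketch of "variable years download rate" and its "by country" variant. It assumes a hypothetical download log in which each record carries the requester's country, the number of variables delivered and the covered time span. The `Download` record, the function names and the 365.25-day year used to turn a time span into a fraction of a variable year are illustrative assumptions, not ACTRIS definitions; only the 75% threshold comes from the "variable year" definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a "year" is taken as 365.25 days when a delivered time span is
# converted into a fraction of a variable year.
DAYS_PER_YEAR = 365.25
# From the "variable year" definition: >= 75% of nominal operation time (DMP).
MIN_OPERATIONAL_FRACTION = 0.75


@dataclass
class Download:
    """Hypothetical download log record; not an ACTRIS schema."""
    country: str       # country of the requesting IP address
    n_variables: int   # number of variables delivered (file or stream)
    start: datetime    # first timestamp of the delivered data
    end: datetime      # last timestamp of the delivered data


def counts_as_variable_year(operational_hours: float, nominal_hours: float) -> bool:
    """True if the instrument ran for at least 75% of its nominal operation time."""
    return operational_hours >= MIN_OPERATIONAL_FRACTION * nominal_hours


def variable_years(d: Download) -> float:
    """Fractional variable years in one download: variables x fraction of a year."""
    days = (d.end - d.start) / timedelta(days=1)  # timedelta / timedelta -> float
    return d.n_variables * days / DAYS_PER_YEAR


def download_rate(downloads: list[Download]) -> float:
    """Variable years downloaded in one reporting interval (e.g. a month)."""
    return sum(variable_years(d) for d in downloads)


def download_rate_by_country(downloads: list[Download]) -> dict[str, float]:
    """Same quantity, resolved by the country of the requesting IP address."""
    totals: dict[str, float] = {}
    for d in downloads:
        totals[d.country] = totals.get(d.country, 0.0) + variable_years(d)
    return totals
```

Fractions are kept, as the definitions ask ("counted including fractions of whole variable years"); a rate is obtained by passing in only the downloads that fall into one reporting interval.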
siiptuo commented 9 months ago

Concerning "IP type" statistics:

  1. Possible values of "institution type of the requesting IP address" must be defined.
  2. This is closely related to how the "institution type of the requesting IP address" is actually determined.
siiptuo commented 8 months ago

Currently CLU can provide only "variable years download rate" and "variable years download rate, by country" without distinguishing "human" and "machine" interface. Also, we're not tracking information required for other metrics. Our suggestion is to start with the lowest common denominator and not try to define every possible metric in advance.

lisemurberg commented 8 months ago

The KPIs defined in the ACTRIS annual workplan (https://intranet.actris.eu/index.php/f/91936, listed under each activity) require:

  1. Downloads of datasets (i.e. variable years download rate and experiment dataset download rate)
  2. Number of different users, where different users are identified by different IPs.

(There are more KPIs, but by design they depend on planned activities, submitted data and system uptime, not necessarily on the amount of downloaded data or the number of visitors.)

Comments: No. 1 does not differentiate between human and machine interfaces so far, but I think it's good to have this in the vocabulary so that we can expand the statistics later on.

No. 2: For me this could be determined by "user visit number rate", but it's unclear to me whether "users" in the KPI refers to users who download data or users who visit the portal.

So in practice (i.e. for the statistics APIs) I agree that we should start with only the lowest common denominator that also works with the KPIs. In the vocabulary, I'm fine with also defining elements that we wish to work towards (e.g. if we want to be able to differentiate between 'human' and 'machine' interface later on).

lisemurberg commented 7 months ago

Comments and discussion from the Expert Team:

  1. Possible values of "institution type of the requesting IP address" must be defined. This was raised by Tuomas above, and the rest of the ET also emphasises it.
  2. The "landing page resolution rate" is a bit unclear. What is meant by 'resolutions' (number of visits/accesses through the DOI link?), and by 'rate': is this an average over a day, a week, a month?
  3. For both 'user visit number rate' and 'data search number rate':
    • We think the subcategories are a bit redundant and unclear. From our understanding, they relate to the visits and searches done at the unit portals or the ACTRIS (DVAS) portal.
    • What would be a 'user visit' by machine interface? Is the machine interface interesting for searches and visits?
    • Is it necessary to include the subcategory 'machine interface, ACTRIS portal'?
    • We should define that 'user visits' means the number of unique IP addresses/users.
  4. Should we have country statistics related to the facilities? Currently, country relates only to the user's IP address. E.g. variable years download rate by Italy (user country) for English facilities ('facility'/'station'/'experiment' country).
pgumaclaramunt commented 7 months ago

A couple of comments from ARES:

  1. Regarding the IP type: up until May 2023, data could be downloaded only after logging in, so we had information on the IP type (i.e. the users' institutions / external users). Since June 2023, data can also be downloaded via the API without logging in, so we will have mixed information and will need to map our old information to the organization types that will be defined for IPs (related to Tuomas' comment above).
  2. We are often asked to provide statistics that require the information from Lise's point 4 (i.e. downloads of data produced by French facilities from any country).
markusfiebig commented 7 months ago

> Concerning "IP type" statistics:
>
> 1. Possible values of "institution type of the requesting IP address" must be defined.
>
> 2. This is closely related to how the "institution type of the requesting IP address" is actually determined.

The ACTRIS MB has decided that we don't need to track IP type. The corresponding concept versions will be taken out.

markusfiebig commented 7 months ago

> Currently CLU can provide only "variable years download rate" and "variable years download rate, by country" without distinguishing "human" and "machine" interface. Also, we're not tracking information required for other metrics. Our suggestion is to start with the lowest common denominator and not try to define every possible metric in advance.

The intention is to define the data use metrics we ultimately would like to have, and also to make clear which implementations are needed to provide these metrics. Some units might already have this in place. This doesn't mean that everything needs to be implemented at once.

markusfiebig commented 7 months ago

Next version of the data use metrics table.

Intended use: the data use metrics referring to datasets (as opposed to those for websites) can be requested for a user-determined selection ("search") of datasets. Example: downloads of data produced by French facilities from Italy. This can be solved by requesting the data use metrics for French sites (user selection), going to the "by country" metrics, and extracting the numbers for Italy.
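The three-step procedure in this example (select datasets, switch to the "by country" metric, read off one country) could look like the following Python sketch. The `DownloadRecord` fields, and the assumption that each log entry already carries the country of the producing facility, are hypothetical; country codes are written as ISO 3166-1 alpha-2 codes purely by choice.

```python
from dataclasses import dataclass


@dataclass
class DownloadRecord:
    """Hypothetical log entry; field names are illustrative only."""
    facility_country: str   # country of the facility that produced the data
    user_country: str       # country of the requesting IP address
    variable_years: float   # amount of data delivered, in variable years


def by_country(records: list[DownloadRecord]) -> dict[str, float]:
    """'By country' metric: totals resolved by the requester's country."""
    totals: dict[str, float] = {}
    for r in records:
        totals[r.user_country] = totals.get(r.user_country, 0.0) + r.variable_years
    return totals


def downloads_from(records: list[DownloadRecord],
                   facility_country: str, user_country: str) -> float:
    # Step 1: user-determined selection of datasets (e.g. French facilities).
    selected = [r for r in records if r.facility_country == facility_country]
    # Step 2: "by country" metric for that selection only.
    # Step 3: extract the number for one requesting country (0 if absent).
    return by_country(selected).get(user_country, 0.0)
```

With this split, facility country is a dataset selection criterion rather than a new metric, so no extra "by facility country" concept is needed in the vocabulary.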

Top-level category: data production and use metrics

| Metric preferred label | Definition |
| --- | --- |
| variable year | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. |
| variable years download rate | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| variable years download rate, by country | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. |
| experiment dataset download rate | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset download rate, by country | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers or airborne vehicle flight missions. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| landing page visit rate | Landing pages of data products are accessed through their pertaining persistent identifiers. Rate is the number of such visits per time interval. |
| data product visualization rate | Number of graphical visualizations of a data product per time interval. |
| user visit number rate | Number of user visits to the web interface per time, individual IP addresses, regardless of origin. |
| user visit number rate, by country | Number of user visits to the web interface per time, individual IP addresses. By country means that the rate is resolved by the country of the requesting IP address. |
siiptuo commented 7 months ago

It looks to me as if these metrics cannot provide the ACTRIS DC KPI: Number of different users downloading ACTRIS data (Different users are identified by different IPs / Mean number of different users in the previous 5 years). The statistics API should already provide the metrics: Yearly unique IPs and Monthly unique IPs.
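A minimal Python sketch of the unique-IP counts mentioned here, assuming download events are available as (date, IP address) pairs. How the "mean number of different users in the previous 5 years" is averaged is not spelled out in the KPI; the sketch assumes the mean of the five yearly unique-IP counts, with years without data counted as zero. All function names are hypothetical.

```python
from collections.abc import Callable, Iterable
from datetime import date


def unique_ips_per_period(events: Iterable[tuple[date, str]],
                          period: Callable[[date], object]) -> dict:
    """Count distinct IP addresses per period.

    events: (day, ip) pairs from download logs (hypothetical format).
    period: maps a date to a period key, e.g. `yearly` or `monthly` below.
    """
    ips: dict[object, set[str]] = {}
    for day, ip in events:
        ips.setdefault(period(day), set()).add(ip)
    return {key: len(s) for key, s in ips.items()}


def yearly(day: date) -> int:
    return day.year


def monthly(day: date) -> tuple[int, int]:
    return (day.year, day.month)


def mean_previous_years(yearly_counts: dict[int, int], year: int, n: int = 5) -> float:
    """Assumed KPI reading: mean of yearly unique-IP counts over the n years before `year`."""
    window = [yearly_counts.get(y, 0) for y in range(year - n, year)]
    return sum(window) / n
```

Monthly counts cannot simply be summed into a yearly figure: an IP active in several months is counted once per month but only once per year, which is one reason the unit (per month or per year) matters.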

As already mentioned, 'user visit' should be defined clearly. For example, if a user visits 5 pages in a row, is that 1 visit or 5 visits?

markusfiebig commented 7 months ago

Next version, taking into account Tuomas' and Benedicte's comments:

Intended use: the data use metrics referring to datasets (as opposed to those for websites) can be requested for a user-determined selection ("search") of datasets. Example: downloads of data produced by French facilities from Italy. This can be solved by requesting the data use metrics for French sites (user selection), going to the "by country" metrics, and extracting the numbers for Italy.

Top-level category: data production and use metrics

| Metric preferred label | Definition |
| --- | --- |
| variable year | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. |
| variable years download rate | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| variable years download rate, by country | A variable year is defined as one year's worth of data for one variable. The time resolution of the data product is defined by the ACTRIS data management plan (DMP). The instrument has to be operational for at least 75% of the nominal operation time as defined in the DMP. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers. |
| experiment dataset download rate | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers. Rate is the number of events per time. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| experiment dataset download rate, by country | Worth of data produced by one experiment, including data provided by all instruments involved in the experiment. Examples of experiments are runs of atmospheric simulation chambers. Rate is the number of events per time. By country means that the rate is resolved by the country of the requesting IP address. Download includes file download and streaming, counted including fractions of whole variable years. Includes download through interfaces for humans and machines. |
| data download user rate, by IP | User rate is the number of users, as identified by different IP addresses, per time interval. The data download user rate is the user rate for any data download service. |
| data product visualization rate | Number of graphical visualizations of a data product per time interval. |
| visit | If a visitor enters a website or application for the first time, or visits a page or takes any tracked action more than 30 minutes after the last action/visit, this is counted as a new visit. |
| visit number rate | Number of user visits to a website per time, individual IP addresses, regardless of origin. |
| visit number rate, by country | Number of user visits to a website per time, individual IP addresses. By country means that the rate is resolved by the country of the requesting IP address. |
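Applied to the earlier question about five pages in a row, the 30-minute rule in the "visit" definition means that consecutive page views, each within 30 minutes of the previous action, form a single visit. A minimal Python sketch, assuming visitors are keyed by IP address (in line with "individual IP addresses" in the visit rate definitions) and that tracked actions are available as (visitor, timestamp) pairs; `count_visits` is a hypothetical name.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(actions: list[tuple[str, datetime]]) -> dict[str, int]:
    """Count visits per visitor from tracked actions.

    A visit starts with a visitor's first action, or with any action taken
    more than 30 minutes after that visitor's previous action.
    """
    last_seen: dict[str, datetime] = {}
    visits: dict[str, int] = {}
    # Process actions in time order so gaps are measured between neighbours.
    for visitor, ts in sorted(actions, key=lambda a: a[1]):
        prev = last_seen.get(visitor)
        if prev is None or ts - prev > VISIT_TIMEOUT:
            visits[visitor] = visits.get(visitor, 0) + 1
        last_seen[visitor] = ts
    return visits
```

Following the wording "more than 30 minutes", an action exactly 30 minutes after the previous one continues the current visit; only a longer gap starts a new one.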
markusfiebig commented 6 months ago

Accepted by ET, will be included in next version.

markusfiebig commented 6 months ago

Included in next vocabulary version.

pgumaclaramunt commented 5 months ago

Hey, we have a question about "data download user rate, by IP". What does "The data download user rate is the user rate for any data download service." mean, exactly? This is what we have: https://data.earlinet.org/api/services/restapi/download/stats?dimensions=yearMonth,uniqueIps Is this it? Or are we missing something?

markusfiebig commented 5 months ago

Hi,

the agreed definition would be here:

https://vocabulary.actris.nilu.no/actris_vocab/datadownloaduserrate-byIP

From what I can see in your link, this would match the definition, provided it includes IPs occurring at any data download service you offer. However, you should state a unit for your numbers. I can only guess that it is IPs per month?
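One reading of the requirement that the count cover IPs from any data download service is that the IP sets of all services are merged before counting, so an address that used two services in the same month counts once rather than twice. The Python sketch below assumes per-service sets of IPs keyed by a "YYYY-MM" month string; this structure and the function name are hypothetical and are not the response format of the EARLINET endpoint linked above. The resulting unit is unique IPs per month.

```python
def data_download_user_rate(services: dict[str, dict[str, set[str]]]) -> dict[str, int]:
    """Unique IPs per month across all download services.

    services: service name -> {"YYYY-MM" -> set of requesting IPs}
    (hypothetical structure). Sets are merged before counting, so an IP
    that used two services in the same month is counted once.
    """
    merged: dict[str, set[str]] = {}
    for per_month in services.values():
        for month, ips in per_month.items():
            merged.setdefault(month, set()).update(ips)
    return {month: len(ips) for month, ips in sorted(merged.items())}
```

Summing the per-service counts instead would give an upper bound that double-counts users of more than one service.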