gbif / tech-docs

This is an integrated technical documentation site for GBIF.org
https://techdocs.gbif.org

Document download activity report and citations #37

Open ManonGros opened 1 year ago

ManonGros commented 1 year ago

This is not the first time we have received this type of question on Helpdesk. Here are the latest questions we had:

Activity downloads

Citation/Literature downloads

MattBlissett commented 1 year ago

Activity: that is this method (JSON) and the one following (TSV): https://tech-docs.gbif-dev.org/en/openapi/v1/occurrence#/Occurrence%20download%20statistics/getDownloadedStatistics

The API documentation could be improved here, but I think we need some information somewhere else too — many users looking at this report won't be API users.

Literature: discovered/published/added in the response schema of https://tech-docs.gbif-dev.org/en/openapi/v1/literature#/Literature/getLiteratureById

The values for CitationType aren't described, I'd need to check with Daniel.

ManonGros commented 1 year ago

FYI, after checking some details with Daniel, here was my answer:

  • The activity download report corresponds to the occurrences included in a download. It doesn’t take into account searches and views of a record.
  • The total_records corresponds to the number of records from a given dataset that were downloaded that month. Some downloads may include only one or two records from a dataset. Several downloads can include the same record, in which case the record is counted once per download in total_records (twice if it appears in two downloads).
  • The number_downloads is the number of downloads containing at least one record from the dataset concerned.

See this typical example, where a user looked for a species (Cenchrus setaceus (Forssk.) Morrone) and generated a download for all the corresponding occurrences. In this case, the download contained only one record from the University of Alberta Vascular Plant Herbarium (ALTA-VP) dataset. This download will add 1 to number_downloads and 1 to total_records for August 2023. Does it make sense?
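
To make the counting rules above concrete, here is a minimal sketch in Python. It is not GBIF's implementation: the function name `monthly_dataset_stats` and the input shape (one tuple per download, holding the month and the number of records taken from each dataset) are assumptions made only for illustration.

```python
from collections import defaultdict


def monthly_dataset_stats(downloads):
    """Aggregate per-dataset download activity by month.

    `downloads` is a hypothetical input: an iterable of
    (month, {dataset_name: records_from_that_dataset}) tuples,
    one tuple per download.
    """
    stats = defaultdict(lambda: {"number_downloads": 0, "total_records": 0})
    for month, records_per_dataset in downloads:
        for dataset, record_count in records_per_dataset.items():
            if record_count < 1:
                continue  # a download only counts if it holds at least one record
            entry = stats[(month, dataset)]
            entry["number_downloads"] += 1            # one more download touching this dataset
            entry["total_records"] += record_count    # records are counted once per download
    return dict(stats)


# Two downloads in August 2023 that each contain the same single ALTA-VP record.
downloads = [
    ("2023-08", {"ALTA-VP": 1, "Some other dataset": 250}),
    ("2023-08", {"ALTA-VP": 1}),
]
result = monthly_dataset_stats(downloads)
print(result[("2023-08", "ALTA-VP")])
```

In this sketch the ALTA-VP dataset ends up with number_downloads = 2 and total_records = 2 for the month, even though only one distinct record was involved, because the record is counted once for each download that includes it.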

For citations, a lot of the work we do is still manual; some of the process is described here.

  • The published date is when the piece of literature citing the data (article, book chapter, etc.) was published. The discovered date is when we found this piece of literature. The added date is when it was added to our literature index.
  • About the citation type field:
    • The value “DOI” in the citation type field means that the piece of literature cited one or several GBIF dataset, derived dataset and/or download DOIs.
    • The “generic” value means that the piece of literature didn’t specifically cite a DOI but only referred to GBIF (for example: “[…] used the Global Biodiversity Information Facility (GBIF)”). Daniel likely tracked the specific download after the publication was found.
    • The “generic_DOIs_provided” means that the paper used a generic citation, but the authors later provided a DOI and/or updated the paper with a DOI.
  • The country coverage is added manually by looking at the scope of the publication and data downloaded.
  • The topic tags are added manually by Daniel.