bioimage-io / bioimage.io

Website for the BioImage Model zoo -- a model zoo for bioimage analysis.
https://bioimage.io
MIT License

About how zenodo counts download numbers #353

Open oeway opened 1 year ago

oeway commented 1 year ago

The BioImage Model Zoo uses Zenodo for storage, so it is natural to use its download statistics. However, we don't know exactly how Zenodo counts downloads. The Zenodo documentation about user statistics only vaguely describes the difference between a download and a unique download:

What is a download? A user (human or machine) downloading a file from a record, excluding double-clicks and robots. If a record has multiple files and you download all files, each file counts as one download.

What is a unique download? A unique download is defined as one or more file downloads from files of a single record by a user within a 1-hour time-window. This means that if one or more files of the same record were downloaded multiple times by the same user within the same time-window, we consider it as one unique download.

This description seems to distinguish three categories: humans, machines, and robots. Humans and machines are grouped together as "users", so the download statistics include downloads by humans and machines but not by robots.

Together with @FynnBe, we dug a bit deeper and found out what Zenodo actually does under the hood (open source rocks here!).

Here is what we found:

This means that any download made through our bioimageio.core library or the bioimage.io downloader, for example, is counted in the download statistics. It also means that our CI downloads are counted!

However, by setting the user agent to a robot (e.g. User-Agent=bot) in the bioimageio.core or bioimageio.spec library, we can easily label our CI scripts as robots so that they are excluded from the download statistics. Here is an example showing how to set the user agent in Python.
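A minimal sketch of setting the user agent with Python's standard library, assuming (as described above) that a user agent containing "bot" is classified as a robot. The user-agent string, the `build_request` helper and the URL are hypothetical placeholders, not part of bioimageio.core or bioimageio.spec.

```python
import urllib.request

# Hypothetical user-agent string; the "bot" token is what should mark it as a robot.
ROBOT_USER_AGENT = "bioimageio-ci-bot"


def build_request(url: str, user_agent: str = ROBOT_USER_AGENT) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; replace it with the actual file URL of a record.
request = build_request("https://example.org/model.zip")

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))

# To actually download (not executed here):
# data = urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the header easy to inspect without touching the network.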

cc @fjug @akreshuk @FynnBe @constantinpape

FynnBe commented 1 year ago

We can automatically label bioimageio.core as a robot by detecting the CI environment. For that, there seems to be a common "CI" env var: https://stackoverflow.com/a/75223617
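A rough sketch of that idea, assuming the CI system sets the "CI" environment variable to a non-empty, truthy value. The function names and both user-agent strings are hypothetical and not taken from bioimageio.core.

```python
import os
from typing import Mapping

# Hypothetical user-agent strings, for illustration only.
DEFAULT_USER_AGENT = "bioimageio-core"
CI_USER_AGENT = "bioimageio-core-ci-bot"


def is_ci(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the "CI" variable is set to something other than empty, "0" or "false"."""
    value = environ.get("CI", "")
    return value.strip().lower() not in ("", "0", "false")


def pick_user_agent(environ: Mapping[str, str] = os.environ) -> str:
    """Use the robot-labelled user agent inside CI and the default one elsewhere."""
    return CI_USER_AGENT if is_ci(environ) else DEFAULT_USER_AGENT


print(pick_user_agent())
```

Passing the environment as a parameter makes the detection testable without modifying the real process environment.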