Data publishers of the Global Biodiversity Information Facility (GBIF) apply a wide range of licenses to their datasets. This is problematic:
We want to get an overview of the characteristics of the licenses used in all GBIF registered datasets.
make data/generated/datasets.csv
make data/licenses.csv
make data/generated/datasets-annotated.csv
make analysis
gh-pages branch.† You can easily transform the UUID keys to working URLs as follows:
"http://www.gbif.org/dataset/" & key
: http://www.gbif.org/dataset/66f6192f-6cc0-45fd-a2d1-e76f5ae3eab2
"http://www.gbif.org/publisher/" & owningOrganizationKey
: http://www.gbif.org/publisher/1989b627-2a61-44db-83e4-392efc5da0a9

These are the requirements for running the analysis:
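The key-to-URL transformation described above (a base URL concatenated with the UUID key) can also be done in a script. The following is a minimal sketch in Python, not part of the repository's workflow. The function names and the example `row` dictionary are hypothetical. Only the base URLs and the `key` and `owningOrganizationKey` fields come from the text above.

```python
# Base URLs as given in the formulas above.
DATASET_BASE = "http://www.gbif.org/dataset/"
PUBLISHER_BASE = "http://www.gbif.org/publisher/"


def dataset_url(key: str) -> str:
    """Return the dataset page URL for a dataset UUID key."""
    return DATASET_BASE + key


def publisher_url(owning_organization_key: str) -> str:
    """Return the publisher page URL for an owningOrganizationKey."""
    return PUBLISHER_BASE + owning_organization_key


# Hypothetical row, using the two example keys shown above.
row = {
    "key": "66f6192f-6cc0-45fd-a2d1-e76f5ae3eab2",
    "owningOrganizationKey": "1989b627-2a61-44db-83e4-392efc5da0a9",
}

print(dataset_url(row["key"]))
print(publisher_url(row["owningOrganizationKey"]))
```

Plain concatenation is enough here because UUID keys contain only hexadecimal characters and hyphens, so no URL escaping is needed.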
These are the libraries used for the charts:
This work (especially the manual interpretation of the licenses) is subject to error. We hope to mitigate this by opening up our workflow in this repository (such as our guidelines), but we disclaim any liability for all uses of this work. Because new and updated datasets are published to GBIF all the time, our list of datasets (replaced with each analysis) and our list of licenses (extended with new licenses at each analysis) will become outdated. Check the last commit timestamp of these files to see how recent they are.
Want to use this work in a scholarly publication? You can cite this repository as:
Desmet P, Aelterman B (2013) Interpreting licenses of GBIF registered data. https://github.com/Datafable/gbif-data-licenses (accessed yyyy-mm-dd)