Open millerjeremya opened 3 years ago
Thanks @millerjeremya. This doesn't really address your point, but you can access the metadata of a dataset by using the datasetKey. The datasetKey is the UUID in the dataset URLs (and can also be used in the registry API).
For example, the metadata for "50c9509d-22c7-4a22-a47d-8c48425ef4a7" is available
The same goes for the publishingOrgKey except the URL is a bit different: https://www.gbif.org/publisher/28eb1a3f-1c15-4a95-931a-4af90ecb574d
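For anyone who wants to script this lookup, here is a minimal sketch using the registry API mentioned above. It assumes the public `https://api.gbif.org/v1` root with its `dataset` and `organization` endpoints, and that each returned JSON record carries a `title` field. The helper names `registry_url` and `fetch_title` are made up for illustration.

```python
import json
import urllib.request

# Assumed root of the public GBIF API (v1).
API_ROOT = "https://api.gbif.org/v1"


def registry_url(kind, key):
    """Build the registry URL for a dataset or organization key (hypothetical helper)."""
    if kind not in ("dataset", "organization"):
        raise ValueError("kind must be 'dataset' or 'organization'")
    return f"{API_ROOT}/{kind}/{key}"


def fetch_title(kind, key, timeout=30):
    """Fetch the registry record and return its title, or None if it has none.

    Needs network access; assumes the response is a JSON object with a 'title' field.
    """
    with urllib.request.urlopen(registry_url(kind, key), timeout=timeout) as resp:
        record = json.load(resp)
    return record.get("title")
```

With network access, `fetch_title("dataset", "50c9509d-22c7-4a22-a47d-8c48425ef4a7")` and `fetch_title("organization", "28eb1a3f-1c15-4a95-931a-4af90ecb574d")` should return the names behind the two keys used as examples above.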
Data source is very important when analyzing GBIF data. But determining the source from data downloaded from GBIF is currently more difficult than it needs to be, in my opinion. If one downloads a selection of data as a CSV file, there is a datasetKey field and a publishingOrgKey field, but no obvious way to look up which institutions or databases they represent. If one instead chooses to download the data as DarwinCore, the situation is a little better but still quite labor intensive. The archive contains a series of XML files that appear to correspond to these key values. These can be opened and inspected individually in a text editor, and from this one can discover the institution or database that provided the data. It may be that there is a smarter way to view DarwinCore data, but I am not aware of it.

For my part, I am interested in classifying the GBIF data that I download into a few categories based on their source. In my experience, the major categories of data in GBIF are natural history collections databases, observation networks, DNA sequence databases, and data extracted from taxonomic literature (i.e., Plazi). I would appreciate efforts to make this quicker and easier to accomplish.
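As a rough illustration of the classification step described above, the sketch below tallies the records in a download by source category. It assumes the download is tab-delimited with a `datasetKey` column, and that the mapping from keys to categories has been built by hand after identifying each dataset. The function name `count_by_category`, the `"unclassified"` label, and the category names in the usage note are hypothetical.

```python
import csv
from collections import Counter


def count_by_category(lines, categories, delimiter="\t"):
    """Count occurrence records per source category.

    lines      -- an open text file (or any iterable of lines) with a header row
    categories -- dict mapping datasetKey -> category label, built by hand
    delimiter  -- field separator; tab is an assumption about the download format
    """
    # QUOTE_NONE: treat quote characters inside fields as ordinary text.
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        key = row.get("datasetKey", "")
        # Keys missing from the mapping are grouped under one fallback label.
        counts[categories.get(key, "unclassified")] += 1
    return counts
```

For example, opening the download with `open(path, newline="", encoding="utf-8")` and passing a mapping such as `{"50c9509d-22c7-4a22-a47d-8c48425ef4a7": "observation network"}` (the category here is only a placeholder, not a statement about that dataset) gives a per-category record count, with everything not yet mapped counted as `"unclassified"`.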