Data source in GBIF downloads

Knowing where records come from is essential when analyzing GBIF data, but determining the source from a GBIF download is currently harder than it needs to be, in my opinion. If one downloads a selection of data as a CSV file, there is a datasetKey field and a publishingOrgKey field, but no obvious way to look up which institutions or databases those keys represent.

If one instead downloads the data as DarwinCore, the situation is a little better but still quite labor intensive. The download includes a series of XML text files that appear to correspond to these key codes. These can be opened and inspected individually in a text editor, and from them one can discover the institution or database that provided the data. There may be a smarter way to view DarwinCore data, but I am not aware of it.

For my part, I am interested in classifying the GBIF data that I download into a few categories based on their source. In my experience, the major categories of data in GBIF are natural history collection databases, observation networks, DNA sequence databases, and data extracted from taxonomic literature (i.e., Plazi). I would appreciate efforts to make this quicker and easier to accomplish.
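
One possible workaround in the meantime is to resolve the keys through the public GBIF registry API, which serves dataset and organization records at `https://api.gbif.org/v1/dataset/{key}` and `https://api.gbif.org/v1/organization/{key}`. The following is only a minimal sketch under stated assumptions: the function names `fetch_json`, `read_rows`, and `resolve_sources` are made up for illustration, and the download is assumed to be a tab-delimited file with `datasetKey` and `publishingOrgKey` columns.

```python
import csv
import json
from urllib.request import urlopen

API = "https://api.gbif.org/v1"  # public GBIF API base URL


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def read_rows(path):
    """Yield rows of a downloaded occurrence file as dicts.

    Assumption: the file is tab-delimited and unquoted.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        yield from reader


def resolve_sources(rows, fetch=fetch_json):
    """Map each distinct datasetKey to its dataset and publisher titles.

    `fetch` is injectable so the lookup can be tested without network access.
    """
    cache = {}    # (kind, key) -> registry record, so each key is fetched once
    sources = {}  # datasetKey -> human-readable source information

    def get(kind, key):
        if (kind, key) not in cache:
            cache[(kind, key)] = fetch(f"{API}/{kind}/{key}")
        return cache[(kind, key)]

    for row in rows:
        dataset_key = row["datasetKey"]
        if dataset_key in sources:
            continue
        dataset = get("dataset", dataset_key)
        # Prefer the key in the row; fall back to the dataset record.
        org_key = row.get("publishingOrgKey") or dataset.get(
            "publishingOrganizationKey"
        )
        org = get("organization", org_key) if org_key else {}
        sources[dataset_key] = {
            "dataset": dataset.get("title"),
            "datasetType": dataset.get("type"),
            "publisher": org.get("title"),
        }
    return sources
```

With a hypothetical file name, `resolve_sources(read_rows("occurrences.csv"))` would return one entry per dataset. This only turns keys into names: the `type` field of a dataset record does not distinguish collections, observation networks, sequence databases, and literature sources, so sorting datasets into those categories would still have to be done by hand or with one's own rules.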


Data source in GBIF downloads #3381