EDIorg / ecocomDP

A dataset design pattern and R package for ecological community data.
https://ediorg.github.io/ecocomDP/

find package ids for popler imports #34

Closed mobb closed 3 years ago

mobb commented 6 years ago

The popler folks did not get all their datasets from the repository. Many (most) were downloaded from individual sites' websites and did not include the repository packageId. These are labeled "NA" in the popler_knbid.csv file.

However, for every dataset I've looked for (approx 10, manually), a packageId exists. These are already in the list called L0_metacommunities.

Possible solutions for filling in the "NA"s:
A. Continue manually (eew).
B. Scrape the URL and look for more info, e.g., a DOI or packageId that was missed.
C. Query titles in PASTA.

Will start with option C: many sites now use the same title, even if they are not displaying a PASTA packageId.
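
A rough sketch of what the title matching in option C could look like, assuming the "NA" rows and a list of (packageId, title) pairs pulled from the repository are already in memory. The field names `title` and `package_id`, the function names, and the example ids are all hypothetical; they are not taken from popler_knbid.csv or from any PASTA API. Titles are compared after normalizing case and punctuation, and an id is filled in only when exactly one repository title matches, so ambiguous cases stay "NA" for manual review.

```python
import re


def normalize_title(title):
    """Lowercase a title and reduce every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def fill_missing_package_ids(popler_rows, repo_titles):
    """Return copies of popler_rows with "NA" ids filled from unique title matches.

    popler_rows: list of dicts with hypothetical keys "title" and "package_id".
    repo_titles: list of (package_id, title) tuples from the repository.
    """
    # Index repository package ids by normalized title.
    index = {}
    for package_id, title in repo_titles:
        index.setdefault(normalize_title(title), []).append(package_id)

    filled = []
    for row in popler_rows:
        row = dict(row)  # copy so the input is not modified
        if row["package_id"] == "NA":
            matches = index.get(normalize_title(row["title"]), [])
            if len(matches) == 1:  # skip missing or ambiguous titles
                row["package_id"] = matches[0]
        filled.append(row)
    return filled


# Made-up example data, for illustration only.
popler_rows = [
    {"title": "Plant Cover: Annual Survey", "package_id": "NA"},
    {"title": "Bird counts", "package_id": "NA"},
    {"title": "Fish surveys", "package_id": "example-site.3.1"},
]
repo_titles = [
    ("example-site.1.1", "plant cover - annual survey"),
    ("example-site.2.1", "Bird Counts"),
    ("example-site.2.2", "bird counts"),
]
filled = fill_missing_package_ids(popler_rows, repo_titles)
```

Exact matching on normalized titles is deliberately conservative. Titles that differ by more than case or punctuation would need a fuzzier comparison, with the results checked by hand.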

Popler (Aldo) is aware of this shortcoming in their process, and may come up with a way to gather DOIs instead of the URLs they currently use to link out to metadata (as URLs are already breaking).

mobb commented 6 years ago

Here is the file: https://github.com/EDIorg/ecocomDP/blob/master/documentation/processing_queue/popler_knbid.csv

clnsmth commented 6 years ago

@mobb, the link above does not resolve.

clnsmth commented 5 years ago

Close this issue @mobb?

clnsmth commented 3 years ago

This falls within the scope of popler, not ecocomDP.