EDIorg / ecocomDP

A dataset design pattern and R package for ecological community data.
https://ediorg.github.io/ecocomDP/

merge datasets in L0_metacommunities.txt, popler-known-ids into processing queue, prioritize #33

Closed mobb closed 6 years ago

mobb commented 6 years ago

priorities according to WG status:

  1. "in prep" (so they can use asap)
  2. "complete" (they have already made their L3, but if there is an update to L0, they may want it)
  3. "lets talk" (they have not yet decided whether or how to use this dataset - it will become either "in prep" or "rejected")
  4. "rejected" (not used for this project, but maybe for some other)
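
The ordering above can be written down as a small lookup, which may help if the queue is ever sorted by status. This is only a sketch: the function name `status_rank` is hypothetical, and sending unknown statuses to the end is an assumption, not something stated in this issue.

```shell
# Map a working-group status to its priority rank (1 = process first).
# status_rank is a hypothetical helper, not part of ecocomDP.
status_rank() {
  case "$1" in
    "in prep")   echo 1 ;;
    "complete")  echo 2 ;;
    "lets talk") echo 3 ;;
    "rejected")  echo 4 ;;
    *)           echo 9 ;;  # assumption: unknown statuses sort last
  esac
}
```

Calling `status_rank "in prep"` prints the rank, which could be prepended to a row before a numeric sort.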
mobb commented 6 years ago

process: build a simple list from the popler csv; include only the datasets with known ids (see issue #33 for handling the unknowns, which have ids NA or 'not available'):

grep knbid popler_knbid.json | grep -v NA | grep -v 'not avail' | cut -d":" -f2 | sed 's/^ "//' | sed 's/",$//' | sort > popler_known_ids.txt

construct csv rows, add fixed fields:

sed '1,103s/^/edi,/' popler_known_ids.txt | sed '1,103s/$/,,,,from popler/' > popler_new_rows.csv
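
The line ranges `1,103` and `1,50` are tied to the current length of each list. A possible variant that applies to every non-blank line, whatever the file length, is sketched below. The function name `make_rows` is hypothetical, and skipping blank lines is an added assumption.

```shell
# Prefix each id with a fixed first field and append fixed trailing fields.
# make_rows is a hypothetical helper. Usage: make_rows PREFIX SUFFIX < ids.txt
make_rows() {
  prefix=$1
  suffix=$2
  # NF is non-zero only for non-blank lines, so blank lines are skipped.
  awk -v prefix="$prefix" -v suffix="$suffix" 'NF { print prefix "," $0 suffix }'
}
```

For the popler list this might be run as `make_rows edi ',,,,from popler' < popler_known_ids.txt > popler_new_rows.csv`.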

do the same for the metacommunities list.

sed '1,50s/^/edi,/' ../incoming/L0_metacommunities.txt | sed '1,50s/$/,,,,,metacommunities requested/' > metacommunities_new_rows.csv

put them together and have a look. some entries from these new lists will already be in the queue, perhaps underway or even completed.

cat metacommunities_new_rows.csv popler_new_rows.csv ../data_processing_queue.csv | sort > queue_to_be_examined.csv
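
To narrow down what needs a manual look, the rows whose id occurs more than once can be pulled out of the combined file. The sketch below assumes the id is the second comma-separated field in every row (true for the rows built above, but only an assumption for the existing queue), and that ids match exactly; `show_duplicate_ids` is a hypothetical name.

```shell
# Print every row whose id (assumed to be field 2) appears more than once.
# show_duplicate_ids is a hypothetical helper.
# Usage: show_duplicate_ids < queue_to_be_examined.csv
show_duplicate_ids() {
  awk -F',' '
    { count[$2]++; rows[NR] = $0; ids[NR] = $2 }   # remember each row and its id
    END {
      for (i = 1; i <= NR; i++)                    # keep the original row order
        if (count[ids[i]] > 1) print rows[i]
    }
  '
}
```

The notes on the matching rows would still have to be merged by hand.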
mobb commented 6 years ago

evaluated duplicates manually, merged notes. queue is now 145 entries.

mobb commented 6 years ago

The list contains the most recent revision (as known at the time of the edit). The datasets marked "from popler" were recommended by popler, but popler usually has an older, usable version (as does metacommunities). So even though these are currently being used, they might not be considered 'high priority' in our list.