mobb closed 6 years ago
process: build a simple list from the popler csv; include only the datasets with known ids (see issue #33 for handling the unknowns, which have ids NA or 'not available'):
grep knbid popler_knbid.json | grep -v NA | grep -v 'not avail' | cut -d":" -f2 | sed 's/^ "//' | sed 's/",$//' | sort > popler_known_ids.txt
construct csv rows, add fixed fields:
sed '1,103s/^/edi,/' popler_known_ids.txt | sed '1,103s/$/,,,,from popler/' > popler_new_rows.csv
do the same for the metacommunities list.
cat ../incoming/L0_metacommunities.txt | sed '1,50s/^/edi,/' | sed '1,50s/$/,,,,,metacommunities requested/' > metacommunities_new_rows.csv
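The two row-building steps above hard-code the line ranges (1,103 and 1,50), so they would need editing if either list changes length. A minimal sketch of the same idea without line ranges follows; `make_rows` is a hypothetical helper name, and the ids in the demo are made up, not real dataset ids. The trailing-field string is passed in as written, so the number of commas still has to match the queue's column layout.

```shell
# Hypothetical helper: wrap every id in the fixed leading and trailing fields.
# $1 = file with one id per line
# $2 = trailing fields to append (must not contain "/" or "&",
#      since it is spliced into a sed replacement)
make_rows() {
    sed -e 's/^/edi,/' -e "s/\$/$2/" "$1"
}

# Demo on made-up ids in a scratch directory
tmp_rows=$(mktemp -d)
printf 'example-id.1\nexample-id.2\n' > "$tmp_rows/ids.txt"
make_rows "$tmp_rows/ids.txt" ',,,,from popler' > "$tmp_rows/new_rows.csv"
cat "$tmp_rows/new_rows.csv"
```

Because the substitutions carry no address range, they apply to every line of the input, whatever its length.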
put them together, have a look. some entries from these new lists will already be in the queue, perhaps underway or even completed.
cat metacommunities_new_rows.csv popler_new_rows.csv ../data_processing_queue.csv | sort > queue_to_be_examined.csv
evaluated duplicates manually, merged notes. queue is now 145 entries.
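The manual duplicate check could be narrowed down first by listing ids that occur more than once in the combined file. The sketch below assumes the id is the second comma-separated field in every row, including the rows from the existing queue (that layout is an assumption, not something confirmed here); `list_duplicate_ids` is a hypothetical helper, and the sample rows are made up.

```shell
# Hypothetical helper: print "<count> <id>" for each id seen more than once.
# Assumes the id sits in field 2 of every row.
list_duplicate_ids() {
    awk -F',' '{ count[$2]++ }
        END { for (id in count) if (count[id] > 1) print count[id], id }' "$1" |
        sort -k2
}

# Demo on made-up rows in a scratch directory
tmp_queue=$(mktemp -d)
cat > "$tmp_queue/combined.csv" <<'EOF'
edi,example-id.1,,,,from popler
edi,example-id.2,,,,from popler
edi,example-id.2,,,,,metacommunities requested
edi,example-id.3,,,,already in queue
EOF

dups=$(list_duplicate_ids "$tmp_queue/combined.csv")
echo "$dups"
```

This only flags candidates; merging the notes for each duplicated id would still be done by hand.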
List contains the most recent revision (as was known at time of edit). The datasets marked "from popler" were recommended by popler, but popler usually has an older, usable version (as does metacommunities). So even though these are currently being used, they might not be considered 'high priority' in our list.
priorities according to WG status: