This repository provides metadata about publications for the Rich Context knowledge graph, which links publications to datasets.
The links and other metadata that are represented here originate from manually-curated documents provided by our community of researchers, agencies, and other data providers.
Updates arrive in multiple drops, and the manual curation is performed in a separate repo prior to commits: https://github.com/NYU-CI/RichContextMetadata/tree/master/metadata
Also, before working in this repo you must set up your pre-commit hooks for Git:
chmod +x .githooks/pre-commit
bash .githooks/one-time-hook-setup.sh
Create a new branch with the same name as your metadata/ subdirectory.
Example:
git checkout -b 20190717_usda_wic
Identify the CSV within your metadata/ subdirectory. There may be multiple sheets in the original spreadsheet provided by the partner, so make sure you've selected the one created by someone on our team.
If your CSV lists a publication with a DOI but no URL, construct a URL
in a new column in the CSV before proceeding: https://www.doi.org/<doi>
Excel code: ="https://www.doi.org/" & <doi_cell>
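If you would rather not edit the spreadsheet by hand, the same DOI-to-URL step can be sketched in Python. This is only an illustration: the column names doi and url and the helper add_doi_urls are assumptions, not names taken from this repo, so adjust them to match your CSV.

```python
import csv
import io

DOI_PREFIX = "https://www.doi.org/"


def add_doi_urls(rows, doi_field="doi", url_field="url"):
    """Fill in a missing URL from the DOI, mirroring the Excel formula.

    The field names are hypothetical defaults; pass the real column
    names used in your CSV.
    """
    updated = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left unchanged
        doi = (row.get(doi_field) or "").strip()
        url = (row.get(url_field) or "").strip()
        if doi and not url:
            row[url_field] = DOI_PREFIX + doi
        updated.append(row)
    return updated


# Small in-memory CSV standing in for a real metadata file.
sample = io.StringIO(
    "title,doi,url\n"
    "Paper A,10.1000/xyz123,\n"
    "Paper B,,https://example.org/b\n"
)
rows = add_doi_urls(csv.DictReader(sample))
for row in rows:
    print(row["title"], row["url"])
```

Rows that already have a URL are left as they are, and rows with neither a DOI nor a URL stay empty so that they can be caught when you check the required fields.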
Finally, your CSV file should have the minimum required fields:
title -- title of the publication
dataset -- a list of links from https://github.com/NYU-CI/RCDatasets/datasets.json
original -- full metadata extracted from the CSV
Remove any entries that don't have these fields.
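As a rough sketch of that filtering step, the snippet below keeps only the entries in which title, dataset, and original are all present and non-empty. It assumes each entry has already been loaded as a Python dict keyed by those three names; the helper names and the sample values are hypothetical.

```python
REQUIRED_FIELDS = ("title", "dataset", "original")


def has_required_fields(entry):
    """Return True when every required field is present and non-empty."""
    return all(entry.get(field) for field in REQUIRED_FIELDS)


def drop_incomplete(entries):
    """Keep only the entries that carry all of the required fields."""
    return [entry for entry in entries if has_required_fields(entry)]


# Made-up entries for illustration; real values come from your CSV.
entries = [
    {"title": "Paper A", "dataset": ["dataset-001"], "original": {"Title": "Paper A"}},
    {"title": "Paper B", "dataset": [], "original": {"Title": "Paper B"}},
    {"title": "", "dataset": ["dataset-002"], "original": {"Title": ""}},
]
kept = drop_incomplete(entries)
print([entry["title"] for entry in kept])
```

An empty list or empty string counts as missing here, which is a design choice for the sketch rather than a rule stated by this repo.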
Use the scripts/publications_export_template.py script to generate a JSON file to add to the partitions/ directory.
Navigate to your subdirectory in RichContextMetadata/metadata where your CSV is stored.
Copy the name of the directory where your CSV is located and the file name of the CSV you want to export.
Execute python scripts/publications_export_template.py <directory_name> <csv_file_name> in the terminal or in your favorite IDE.
If you want to specify your own filename for the JSON partition, add it as a third argument, e.g.:
python scripts/publications_export_template.py <directory_name> <csv_file_name> <json_file_name>
/partitions
If you run into any problems with the template, post a GitHub issue on this repo.
Check the RCPublications/partitions subdirectory after the script has finished running without errors, to make sure that the JSON file has the required fields and was exported properly.
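One way to spot-check an exported partition is sketched below. It assumes the partition is a JSON array of objects keyed by title, dataset, and original, which you should confirm against a real file in partitions/; the function find_incomplete and the sample file name are hypothetical and not part of this repo's scripts.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("title", "dataset", "original")


def find_incomplete(path):
    """Return (index, missing_fields) pairs for entries lacking required fields.

    Assumes the file holds a JSON array of objects.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    problems = []
    for index, entry in enumerate(entries):
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems.append((index, missing))
    return problems


# Demo on a throwaway file; point find_incomplete at your own partition instead.
sample = [
    {"title": "Paper A", "dataset": ["dataset-001"], "original": {"Title": "Paper A"}},
    {"title": "Paper B", "dataset": [], "original": {}},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_publications.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    problems = find_incomplete(path)

for index, missing in problems:
    print(f"entry {index} is missing: {', '.join(missing)}")
```

This does not replace the unit tests in test.py; it is only a quick look at the file before you commit.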
Since our team is generally working on different partitions in parallel, you'll often need to rebase prior to creating a pull request.
In other words,
git rebase master
git push -f origin
Sometimes there may be merge conflicts, which you'll need to fix manually before you can continue. See this Git rebase tutorial for more details.
Run the unit tests on your new JSON file partition prior to commit:
python test.py partitions/20190717_usda_wic_publications.json