CenterForOpenScience / scrapi

A data processing pipeline that schedules and runs content harvesters, normalizes their data, and outputs that normalized data to a variety of output streams. This is part of the SHARE project, and will be used to create a free and open dataset of research (meta)data. Data collected can be explored at https://osf.io/share/, and viewed at https://osf.io/api/v1/share/search/. Developer docs can be viewed at https://osf.io/wur56/wiki
Apache License 2.0

Document Type Support #521

Open wearpants opened 8 years ago

wearpants commented 8 years ago

Adds preliminary support for a documentType field to the crossref and plos harvesters.

coveralls commented 8 years ago

Coverage increased (+0.2%) to 93.642% when pulling 536cd97b4c62507e9e19b3bc0d00c063d6d7ce2b on wearpants:feature/document-type into 05282578751f52e3c955d0b190c5628dd221389a on CenterForOpenScience:develop.

wearpants commented 8 years ago

This pull request adds initial support for a documentType field to the crossref and plos harvesters. See SHARE-294 in Jira. /cc @chrisseto

The field supports the following values: "article", "abstract", "dataset", "book", "book-chapter", "dissertation", "correction", "preprint", "source-code", "clinical-trial", "reference-entry", "monograph".
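For anyone consuming the field, the allowed values could be checked with something like the sketch below. The names DOCUMENT_TYPES and validate_document_type are hypothetical and are not taken from the scrapi code; only the twelve values come from this pull request.

```python
# Allowed documentType values, copied from the pull request description.
DOCUMENT_TYPES = frozenset([
    "article", "abstract", "dataset", "book", "book-chapter",
    "dissertation", "correction", "preprint", "source-code",
    "clinical-trial", "reference-entry", "monograph",
])


def validate_document_type(value):
    """Return value unchanged if it is an allowed documentType.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    if value not in DOCUMENT_TYPES:
        raise ValueError("unsupported documentType: {!r}".format(value))
    return value
```

A frozenset keeps the membership check cheap and prevents the list from being modified by accident at runtime.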

To update production, we'll need to renormalize after merging: invoke migrate renormalize -s 'crossref,plos' --no-dry. I estimate this will take about 23 machine hours. Using more celery tasks may give some speedup, but I'm not 100% sure where the bottleneck is. Be sure to merge the branch on all harvester machines before running this migration.
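As a rough back-of-the-envelope check on the estimate, the sketch below computes the best-case wall-clock time if the 23 machine hours split evenly across workers. That even split is an assumption: as noted, the bottleneck is unknown, so real speedup may be much smaller. The function name is hypothetical.

```python
def ideal_wall_clock_hours(machine_hours, workers):
    """Best-case elapsed hours, assuming work divides evenly across workers.

    This ignores any shared bottleneck (database, network, broker), so it is
    a lower bound on elapsed time rather than a prediction.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return machine_hours / float(workers)


# Lower-bound elapsed time for a few worker counts.
for n in (1, 2, 4, 8):
    print(n, ideal_wall_clock_hours(23, n))
```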

@alexschiller if you'd like to start adding frontend support, check out the feature/document-type branch from my fork: https://github.com/wearpants/scrapi/. You can either renormalize an existing database or re-run the crossref and plos harvesters to generate new data. The OSF web server should pass the documentType field through in share_search API results.

coveralls commented 8 years ago

Coverage increased (+0.2%) to 93.642% when pulling f0941bd28685b105c7455fda25600cc9351011aa on wearpants:feature/document-type into 05282578751f52e3c955d0b190c5628dd221389a on CenterForOpenScience:develop.