CenterForOpenScience / scrapi

A data processing pipeline that schedules and runs content harvesters, normalizes their data, and outputs that normalized data to a variety of output streams. This is part of the SHARE project, and will be used to create a free and open dataset of research (meta)data. Data collected can be explored at https://osf.io/share/, and viewed at https://osf.io/api/v1/share/search/. Developer docs can be viewed at https://osf.io/wur56/wiki
Apache License 2.0
41 stars 45 forks

[dataONE] #477

Closed MerlinZhang closed 8 years ago

MerlinZhang commented 8 years ago

Fixed the dataONE harvester so it stops harvesting results with no information. Added a conditional statement that collects only metadata records, not data or resource records, to fix the problem.
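For illustration, here is a minimal sketch of the kind of conditional described above. It is not the code from this PR. It assumes each harvested record is a dict carrying a `formatType` field with values such as `METADATA`, `DATA`, or `RESOURCE`; the function name `keep_metadata_only` and the sample records are hypothetical.

```python
# Hypothetical sketch, not the actual scrapi harvester code.
# Assumption: each record is a dict with a "formatType" key whose value is
# "METADATA", "DATA", or "RESOURCE".
from typing import Dict, Iterable, List


def keep_metadata_only(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return only the records whose formatType is METADATA."""
    return [
        record
        for record in records
        # Records with a missing formatType are dropped as well.
        if record.get("formatType", "").upper() == "METADATA"
    ]


# Made-up sample records for demonstration only.
sample = [
    {"id": "doc-1", "formatType": "METADATA", "title": "Soil survey"},
    {"id": "doc-2", "formatType": "DATA"},
    {"id": "doc-3", "formatType": "RESOURCE"},
]

kept = keep_metadata_only(sample)
print([record["id"] for record in kept])
```

Filtering at harvest time like this keeps empty data and resource entries from ever reaching normalization; documents harvested before the change are not affected by it.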

erinspace commented 8 years ago

A note: we'll have to renormalize everything from dataONE to put this into full effect, which will cut down quite a bit on the total number of documents in SHARE.

We'll deploy this now so that it applies to all future harvests.