cern-sis / issues-scoap3

APS and Hindawi records reprocessing and force pulling #119

Open ErnestaP opened 1 year ago

ErnestaP commented 1 year ago

The API publishers are harvested by date. Unlike the SFTP-source publishers, the batches of articles we get back vary a lot, depending on the date range requested. Dilemma: how should we force-harvest records? One by one, using DOIs? The APS and Hindawi APIs can harvest a single article by DOI, but not a group of DOIs. And how should we reprocess records that have already been downloaded?
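To make the "one by one" option concrete, here is a minimal sketch of a force pull over a list of DOIs. It is only an illustration: `force_pull_by_dois` and `fetch_one` are hypothetical names, and `fetch_one` stands in for whatever single-DOI call the APS or Hindawi client would make (no real endpoint or client signature is assumed here).

```python
from typing import Callable, Dict, Iterable, List, Tuple


def force_pull_by_dois(
    dois: Iterable[str],
    fetch_one: Callable[[str], dict],
) -> Tuple[List[dict], Dict[str, str]]:
    """Fetch records one DOI at a time.

    `fetch_one` is a hypothetical callable wrapping the publisher's
    single-DOI request. Returns the fetched records plus a mapping of
    DOI -> error message for the ones that failed.
    """
    records: List[dict] = []
    failures: Dict[str, str] = {}
    seen = set()
    for raw in dois:
        doi = raw.strip()
        # Skip blanks and duplicates so each DOI is requested only once.
        if not doi or doi in seen:
            continue
        seen.add(doi)
        try:
            records.append(fetch_one(doi))
        except Exception as exc:
            # One bad DOI should not abort the whole force pull.
            failures[doi] = str(exc)
    return records, failures
```

Collecting failures instead of raising lets a DAG run report which DOIs need a retry while the rest go on to processing.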

Currently, we save the original API response to a file (XML or JSON), then split it into separate records and send them to the file-processing DAG. The separate records are NOT saved as individual files.
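Since only the whole response is stored, one way to reprocess already-downloaded records would be to re-read the saved file, split it again, and keep just the DOIs we want. A rough sketch for the JSON case follows. The layout is an assumption: the records are taken to sit in a list under a top-level `data` key, with the DOI in a top-level `doi` field. Both key names and both function names are hypothetical and would need adjusting to the real response.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List, Optional


def iter_saved_records(path: str, records_key: str = "data") -> Iterator[dict]:
    """Yield individual records from a saved JSON API response.

    Assumes (hypothetically) that the records are a list under `records_key`.
    """
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    yield from payload.get(records_key, [])


def select_for_reprocessing(
    path: str,
    dois: Optional[Iterable[str]] = None,
    records_key: str = "data",
    doi_key: str = "doi",
) -> List[dict]:
    """Return the saved records to send for processing again.

    With `dois=None` every record in the file is returned; otherwise only
    records whose DOI is in `dois`. DOIs are compared case-insensitively.
    """
    wanted = None if dois is None else {d.strip().lower() for d in dois}
    selected = []
    for record in iter_saved_records(path, records_key):
        doi = str(record.get(doi_key, "")).lower()
        if wanted is None or doi in wanted:
            selected.append(record)
    return selected
```

The selected records could then be handed to the file-processing DAG the same way as after a fresh harvest. An XML response would need its own splitter, which this sketch does not cover.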