OUP harvesting - Githubissues

Harvesting: Add a DAG that fetches OUP publisher files. Important information for solving the task:

Harvesting steps:

1. Fetch data from FTP (pdf, xml, pdfa). Each type is in a separate zip. P.S. This is really similar to Springer and IOP.
2. Save it in s3.
3. Download from s3.
4. Split XML. A downloaded XML might consist of more than one article; splitting means one smaller XML per article.
5. Trigger runs of the processing DAG. One run has to be triggered per article.

The response is XML; use ElementTree as the parsing lib.

Expected behavior:
Input: XML doc, which might consist of more than one article.
Output: triggered runs of the file-processing DAG, one per article.
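The split and trigger steps could be sketched roughly as follows, using ElementTree from the standard library. This is only a minimal sketch: the `article` tag name, the wrapper element in the sample, and the `split_articles` and `trigger_runs` helpers are assumptions for illustration, since the real OUP XML layout is not described here. The `trigger` argument is a hypothetical callable standing in for whatever starts one run of the processing DAG.

```python
import xml.etree.ElementTree as ET


def split_articles(xml_text, article_tag="article"):
    """Return one serialized XML string per article element.

    Assumes every article is an element named `article_tag`
    (hypothetical; adjust to the real OUP schema).
    """
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so a single-article
    # document (root == article) is handled by the same path.
    return [
        ET.tostring(element, encoding="unicode").strip()
        for element in root.iter(article_tag)
    ]


def trigger_runs(xml_text, trigger, article_tag="article"):
    """Call `trigger` once per article and return the number of calls.

    `trigger` is a placeholder for the code that starts one run
    of the processing DAG with a single article as its input.
    """
    articles = split_articles(xml_text, article_tag)
    for article_xml in articles:
        trigger(article_xml)
    return len(articles)
```

In a test, `trigger` can simply be `list.append` on an empty list, which makes it easy to check that one run is requested per article without a running scheduler.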

Tests are mandatory.

Important: verify with Anne.

cern-sis / issues-scoap3

OUP harvesting #93