cern-sis / issues-scoap3


Migrate OUP parser to Airflow: parsing #96

Open ErnestaP opened 1 year ago

ErnestaP commented 1 year ago

Add a DAG that processes OUP publisher files. Important information for solving this task:

- The DAG run has to be triggered by a single article coming from the fetching DAG (https://github.com/SCOAP3/scoap3-next/issues/378).
- For XML parsing we will use ElementTree.
- As an example, we can use the previous version.
- While developing the parser, we also have to consider the following issue: https://github.com/cern-sis/issues-scoap3/issues/14

Expected behavior:
- Input: XML documents
- Output: multiple JSON files, one per article.
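A minimal sketch of the per-article parsing step using Python's standard-library `xml.etree.ElementTree`. It assumes OUP delivers JATS-style XML. The element paths, the output field names (`dois`, `title`, `authors`) and the helpers `parse_article` / `parse_task` are hypothetical, not taken from the real OUP files or the previous parser. `parse_task` mimics an Airflow task body that receives one article through `dag_run.conf`, which is one way the fetching DAG could hand it over (for example via the `conf` argument of `TriggerDagRunOperator`).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: assumes JATS-style XML. Element names are
# illustrative only, not copied from real OUP deliveries.
SAMPLE_XML = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1093/example/abc123</article-id>
      <title-group><article-title>An example title</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def _text(node):
    """Return the stripped text of an element and its children, or None."""
    if node is None:
        return None
    text = "".join(node.itertext()).strip()
    return text or None


def parse_article(xml_string):
    """Parse one article's XML into a JSON-serializable dict (hypothetical schema)."""
    root = ET.fromstring(xml_string)
    meta = root.find("./front/article-meta")
    if meta is None:
        raise ValueError("missing <article-meta>")
    authors = [
        {
            "surname": _text(contrib.find("./name/surname")),
            "given_names": _text(contrib.find("./name/given-names")),
        }
        for contrib in meta.findall("./contrib-group/contrib[@contrib-type='author']")
    ]
    doi = _text(meta.find("./article-id[@pub-id-type='doi']"))
    return {
        "dois": [doi] if doi else [],
        "title": _text(meta.find("./title-group/article-title")),
        "authors": authors,
    }


def parse_task(conf):
    """Airflow-style task body: `conf` stands in for `dag_run.conf` holding one article."""
    record = parse_article(conf["article"])
    return json.dumps(record)
```

Because each DAG run handles exactly one article, writing each returned string to its own file would give the expected one-JSON-per-article output.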

Acceptance

Generate valid record output and give the results to Anne so she can confirm that this is what we need. Tests are mandatory.
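For the acceptance step, a small validation helper plus tests could catch incomplete records before they are handed over for review. This is only a sketch: `validate_record`, the `REQUIRED_FIELDS` tuple and the record shape are hypothetical placeholders, and the tests are plain `assert`-based functions in pytest style, assuming pytest is the project's test runner.

```python
# Hypothetical required fields; the real list should come out of the
# acceptance review, not from this sketch.
REQUIRED_FIELDS = ("dois", "title", "authors")


def validate_record(record):
    """Return a list of problems found in a parsed record (empty list if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat both a missing key and an empty/None value as a problem.
        if not record.get(field):
            problems.append(f"missing or empty field: {field}")
    for i, author in enumerate(record.get("authors") or []):
        if not author.get("surname"):
            problems.append(f"author {i} has no surname")
    return problems


def test_complete_record_is_valid():
    record = {
        "dois": ["10.1093/example/abc123"],
        "title": "An example title",
        "authors": [{"surname": "Doe", "given_names": "Jane"}],
    }
    assert validate_record(record) == []


def test_missing_title_is_reported():
    record = {
        "dois": ["10.1093/example/abc123"],
        "title": None,
        "authors": [{"surname": "Doe", "given_names": "Jane"}],
    }
    assert "missing or empty field: title" in validate_record(record)


def test_author_without_surname_is_reported():
    record = {"dois": ["x"], "title": "t", "authors": [{"surname": ""}]}
    assert validate_record(record) == ["author 0 has no surname"]
```

Returning a list of problems rather than raising on the first one makes it easier to send a complete report for each record along with the results.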