Currently, the Springer publisher workflows don't work properly: the test data structure does not match the real data, and the processing is triggered on a schedule as a cron job, rather than being triggered by the pull_sftp DAG...
As a result, we cannot use the Springer workflows to harvest files in QA.
Solutions:
[x] Trigger Springer file processing only from the Springer pull_ftp_dag. Currently, the processing DAG is triggered independently of the pull_ftp_dag
[x] QA: Pull data from the real Springer SFTP, rather than from our local SFTP service
[x] Change the structure of the files in the test data. The folder structure has to look like this: springer/EPJC, springer/JHEP. Currently, all files are taken from one springer folder.
[x] Adapt tests and code to the new file structure mentioned above
[x] ~Use a known_hosts file to connect to Springer SFTP: adapt code, upload, and copy known_hosts file~ We agreed to leave it as it is. We may revisit it later, when we move on to more detailed improvements
[x] Fix the forced pull from SFTP: it currently triggers the processing twice when we force a pull
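
The folder layout and the "trigger processing once per pulled file" behaviour described above could be sketched roughly as follows. This is only an illustration in plain Python, not the actual DAG code: `JOURNAL_DIRS`, `files_by_journal`, `pull_and_trigger`, and the `trigger` callback are hypothetical names, and the file names in the usage example are made up.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, List, Set

# Assumed layout from the checklist: one sub-folder per journal.
JOURNAL_DIRS = ("springer/EPJC", "springer/JHEP")


def files_by_journal(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by journal folder, ignoring files outside JOURNAL_DIRS."""
    grouped: Dict[str, List[str]] = {PurePosixPath(d).name: [] for d in JOURNAL_DIRS}
    for path in paths:
        for journal_dir in JOURNAL_DIRS:
            if path.startswith(journal_dir + "/"):
                grouped[PurePosixPath(journal_dir).name].append(path)
                break
    return grouped


def pull_and_trigger(
    paths: Iterable[str],
    already_processed: Set[str],
    trigger: Callable[[str], None],
    force: bool = False,
) -> List[str]:
    """Call trigger() at most once per file.

    Without force, files in already_processed are skipped.
    With force, they are re-triggered, but still only once each.
    """
    triggered: List[str] = []
    seen: Set[str] = set()
    for files in files_by_journal(paths).values():
        for path in files:
            if path in seen:
                continue  # guard against triggering the same file twice
            if force or path not in already_processed:
                trigger(path)  # hypothetical hook standing in for the processing DAG run
                triggered.append(path)
            seen.add(path)
    return triggered
```

For example, with `paths = ["springer/EPJC/a.zip", "springer/JHEP/b.zip", "springer/other.zip"]` and `already_processed = {"springer/EPJC/a.zip"}`, a normal pull triggers only `springer/JHEP/b.zip`, while a forced pull triggers both journal files exactly once. The file directly under `springer/` is ignored in both cases.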