[x] Do not reprocess articles that are already in the repo (moved to a different task: requires discussion)
[x] Add functionality to reprocess already downloaded files by triggering the 'trigger_files_processing' task with the S3 keys that correspond to the articles (moved to a different task: requires discussion)
[x] Hindawi wrongly parses the affiliation country for the USA, but not for other countries
[x] ~publication_info - new workflows don't have publication_info.page_start, but we have it in hepcrawl (however, not for other publishers in the workflow). Cannot find the exact place in the code; it looks like it is taken from <article-id pub-id-type="publisher-id">, and where is the pubnote?~ (moved to a different issue: https://github.com/cern-sis/issues-scoap3/issues/124)
[x] copyright.year: string or int? For some publishers (such as APS) it is an int; for Hindawi it is a string.
[x] no collections
[x] ~do we need raw_name in authors?~ (moved to different issue)
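For the copyright.year item above, one possible way to make the type consistent across publishers is to normalize the value in a single helper. This is only a sketch: the function name `normalize_copyright_year` is hypothetical, and it assumes the target type is int, which the issue itself leaves open.

```python
def normalize_copyright_year(value):
    """Return the copyright year as an int, or None if it cannot be parsed.

    Hypothetical helper: assumes int is the desired type for copyright.year.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        # isdecimal() only accepts characters that int() can convert.
        if text.isdecimal():
            return int(text)
    return None
```

With this, an APS-style `2023` and a Hindawi-style `"2023"` would both end up as the same int, and anything unparsable becomes `None` instead of raising.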
Wanted functionality: re-harvest and reprocess already downloaded articles after errors or code changes...
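A rough sketch of how the two reprocessing items could fit together: skip articles that are already in the repo by default, but allow a forced reprocess of selected S3 keys. Everything here is an assumption made for discussion. `select_keys_to_process`, `already_processed` and `force` are hypothetical names, and the sketch does not show how 'trigger_files_processing' is actually invoked.

```python
def select_keys_to_process(s3_keys, already_processed, force=False):
    """Pick the S3 keys that should be handed to the processing task.

    Hypothetical helper:
    - s3_keys: keys of downloaded files that correspond to articles
    - already_processed: set of keys whose articles are already in the repo
    - force: if True, reprocess everything (e.g. after a code change)
    """
    if force:
        return list(s3_keys)
    # Default behaviour: do not reprocess articles already in the repo.
    return [key for key in s3_keys if key not in already_processed]
```

The returned list would then be passed to whatever triggers 'trigger_files_processing'; whether skipping should be the default is part of the discussion these items were moved to.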