cern-sis / issues-scoap3


Add option to harvest article by DOI #277

Open ErnestaP opened 5 months ago

ErnestaP commented 5 months ago

Add the option to harvest and re-harvest an article by DOI. In the old SCOAP3 we quite often face situations where a specific article needs to be harvested or re-harvested, but this option is not supported. The Hindawi and APS APIs offer a way to get an article by DOI. The situation is more complicated for publishers harvested from FTP/SFTP: their articles could be re-harvested by DOI, but not harvested by DOI in the first place, because they arrive inside zip archives; when we unzip them, each article is saved in a separate file.

ErnestaP commented 5 months ago

Details and examples.

APIs for harvesting by DOI (you only need to pass the DOI in the URL):

- Hindawi: https://www.hindawi.com/oai-pmh/oai.aspx?verb=getrecord&identifier=oai:hindawi.com:10.1155/2023/8127604&metadataprefix=oai_dc
- APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.231901
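The two API-based publishers can be sketched as simple URL builders. The endpoint shapes come from the examples above; the function names are ours, not part of any existing SCOAP3 code:

```python
from urllib.parse import quote

def hindawi_url(doi: str) -> str:
    """Build the Hindawi OAI-PMH GetRecord URL for a given DOI."""
    return (
        "https://www.hindawi.com/oai-pmh/oai.aspx"
        "?verb=getrecord"
        f"&identifier=oai:hindawi.com:{quote(doi, safe='/')}"
        "&metadataprefix=oai_dc"
    )

def aps_url(doi: str) -> str:
    """Build the APS harvest URL for a given DOI."""
    return f"https://harvest.aps.org/v2/journals/articles/{doi}"
```

Either URL can then be fetched with the harvester's usual HTTP client; no archive handling is needed for these two publishers.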

Elsevier: harvested from SFTP. The files there are zip and tar archives. Harvesting by DOI: IF THE ARTICLE IS ON SFTP, we can read the content of the zip/tar and take only the article we need. This should not be difficult, since Elsevier provides a mapping of where the articles are located inside the zip/tar files. IF THE ARTICLE IS NOT ON SFTP (older zips/tars are deleted), we can re-process articles that we already have in our s3 but that, for some reason, are not in the repo. We need to verify whether the naming of the saved articles reflects (or can reflect) the DOI.
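The "article is still on SFTP" path could look like the sketch below: given the publisher's mapping from DOI to the article's path inside the archive (represented here as a plain dict, which is an assumption about how the parsed mapping would be held in memory), we extract only the one member we need instead of unpacking the whole archive:

```python
import io
import zipfile

def extract_article(archive_bytes: bytes, doi: str, mapping: dict) -> bytes:
    """Return the raw content of a single article from a zip archive.

    `mapping` maps DOI -> member path inside the archive; a KeyError
    means the DOI is not in this archive, so the caller should try the
    next one (or fall back to re-processing from s3).
    """
    member = mapping[doi]
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(member)
```

The same idea works for tar files via `tarfile.TarFile.extractfile`; only the archive-reading call changes.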

OUP: harvested from FTP. The files there are zip archives. They should be deleted from the SFTP after harvesting, because OUP uploads updates under the same names, so the new files would overwrite the old ones with the changes (new articles, updates of previous articles, etc.). Harvesting by DOI: we should re-process articles that we already have in our s3 but that, for some reason, are not in the repo, since the archives are deleted from the SFTP after the first harvest. If the articles were never harvested before, they should still be on the SFTP; if they are not there, ask OUP to upload them.
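The OUP decision flow above reduces to three outcomes depending on where the article can still be found. A minimal sketch (names are hypothetical, not existing SCOAP3 code):

```python
from enum import Enum

class Action(Enum):
    REPROCESS_FROM_S3 = "re-process the copy already stored in s3"
    HARVEST_FROM_SFTP = "harvest the zip still present on the SFTP"
    ASK_PUBLISHER = "ask OUP to re-upload the archive"

def oup_reharvest_action(in_s3: bool, in_sftp: bool) -> Action:
    """Pick the re-harvest path for one DOI per the rules above."""
    if in_s3:
        return Action.REPROCESS_FROM_S3
    if in_sftp:
        return Action.HARVEST_FROM_SFTP
    return Action.ASK_PUBLISHER
```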

IOP: harvested from SFTP. The files there are zip archives. Harvesting by DOI: IF THE ARTICLE IS ON SFTP, then, like Elsevier, IOP also lists the locations of all files in a mapping; this time, however, the mapping is a txt file. We can read the mapping and download only the articles we need. IF THE ARTICLE IS NOT ON SFTP (older zips/tars are deleted), we can re-process articles that we already have in our s3. We need to verify whether the naming of the saved articles reflects (or can reflect) the DOI.
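Reading the IOP txt mapping could be a small parser. The exact layout of the file is not shown in this issue, so the tab-separated `doi<TAB>path-in-archive` format below is purely an assumption for illustration:

```python
def parse_iop_mapping(text: str) -> dict:
    """Parse a txt mapping of DOI -> location inside the zip.

    Assumed format (one entry per line): "doi<TAB>path-in-archive".
    Blank lines are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        doi, path = line.split("\t", 1)
        mapping[doi] = path
    return mapping

def locate_article(mapping_text: str, doi: str):
    """Return the archive path for a DOI, or None if it is not listed."""
    return parse_iop_mapping(mapping_text).get(doi)
```

Once the path is known, only that archive (and only that member of it) needs to be downloaded, as in the Elsevier case.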

Springer: harvested from SFTP. Harvesting by DOI: Springer doesn't provide any mapping. If we don't have the article at all, we will need to harvest from the Springer SFTP all the zips that are not yet in our s3. If we already have the article in s3 but, for some reason, it is not in the repo, we can re-process it.
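Since Springer has no mapping, the "never harvested" case comes down to a set difference between the archive names on the SFTP and those already in s3. A minimal sketch, assuming we can list archive names on both sides:

```python
def zips_to_fetch(sftp_names: set, s3_names: set) -> set:
    """Archives present on the Springer SFTP but missing from our s3.

    Each of these must be downloaded and unpacked, then the articles
    checked for the wanted DOI, because no DOI-to-archive mapping exists.
    """
    return sftp_names - s3_names
```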