seasidesparrow closed 11 months ago
I would discuss this with @aaccomazzi, since he was the one who explained to me that the input list needs to contain both arXiv and earth science records. Your design distinguishes between these, so you have to have one list for arXiv and a separate list for earth science for them to get processed.
I'm pretty sure we had this conversation a few months back and from my perspective nothing has changed, so let me restate what I think we should do:
If I missed or misunderstood something, let's have a chat.
@aaccomazzi I did not know that the classic format can correctly read arXiv metadata. In that case we should remove the arXiv parser and use the classic reader for all input files; that way there is no need to identify what kind of metadata file the input is. That should solve the issue. Thank you. @seasidesparrow
@seasidesparrow I have time today, so I can verify that the classic parser can read the arXiv file and extract all the information that the arXiv parser extracts. Let me know if you want me to do that.
I would need to make some changes to the classic side to have it output paths to the .abs files instead of to the xml files and test it, so this isn't something I would want to deploy before next Tuesday at the earliest.
@seasidesparrow Let me check it out. I will let you know what I find.
@seasidesparrow I went ahead and implemented this and made a release, if you would please check it out. https://github.com/adsabs/ADSDocMatchPipeline/releases/tag/v3.1.4 Now you can include earth science records among the arXiv records and submit them to the pipeline to get matched with publications, or include them among the pub records and submit them to get matched with arXiv records. You can also submit them separately: for example, if earth science records come in as eprints during the weekend, you can create an eprint.input list and process them then. Likewise, if they come in as pub records at any time, you can create pub.input and process them at once. Very flexible. Please let me know if there is any issue. Thank you. @aaccomazzi
Closed: superseded
I think it comes down to deciding whether to have the parser doing system-level logic, or whether the parser acts based upon what the controlling code tells it to do. In this case, the pathnames are definite only at this moment in time, but we're in a transition period where the underlying system architecture or flow control may change. If we use a calling option like what I've implemented to specify that logic, we're not reliant on the current architecture, and that would be my preference.
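To make the trade-off above concrete, here is a minimal sketch of the two approaches side by side. It is not the pipeline's actual code: `SourceType`, `guess_source_type`, `parse_record`, and the `/ArXiv/` path convention are all hypothetical names made up for illustration. The point is only that when the caller passes the record type explicitly, the parser never has to look at the pathname.

```python
from enum import Enum
from typing import Optional


class SourceType(Enum):
    """Hypothetical record kinds; the real pipeline may name these differently."""
    EPRINT = "eprint"
    PUB = "pub"


def guess_source_type(path: str) -> SourceType:
    # System-level logic inside the parser: this relies on an assumed
    # directory convention ("/ArXiv/" in the path) and breaks if the
    # underlying layout changes.
    return SourceType.EPRINT if "/ArXiv/" in path else SourceType.PUB


def parse_record(path: str, source_type: Optional[SourceType] = None) -> dict:
    # Preferred route: the controlling code says what kind of record this is.
    # The path-based guess is only a fallback when nothing was specified.
    if source_type is None:
        source_type = guess_source_type(path)
    # A real parser would read the file here; this sketch only reports
    # which type was selected.
    return {"path": path, "source_type": source_type.value}
```

With the explicit option, a record stored outside the assumed arXiv directory can still be processed as an eprint, for example `parse_record("/data/other/x.abs", SourceType.EPRINT)`, whereas the path-based fallback would treat the same file as a pub record.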