adsabs / ADSDocMatchPipeline

Pipeline to match publisher document with preprint counterpart and vice versa
MIT License

Content/oracledb: ADS curated `matches.kill` file cannot be added to oracle #35

Open seasidesparrow opened 4 months ago

seasidesparrow commented 4 months ago

The file abstracts/sources/ArXiv/published/matches.kill is used by classic to indicate that a specific preprint and published paper should not be combined. In oracle db, this relationship is given a score of -1 ("incorrect").

There are a handful of cases where we list multiple published papers for one specific preprint, for example:

2012arXiv1206.2395F     2012xrb..confE..17F
2012arXiv1206.2395F     2013hcxa.confE..79F

If you try to add this file to oracledb with the following command, you get a 400 error:

python3 run.py -mf /proj/ads/abstracts/sources/ArXiv/published/matches.kill -as incorrect
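To see how many entries are affected, here is a minimal sketch that lists every preprint appearing with more than one published bibcode. It assumes the file is two whitespace-separated columns (preprint bibcode, then published bibcode), as in the example above. The function name `find_multi_matches` is hypothetical and not part of the pipeline, and the sketch does not touch oracledb or explain the 400 error itself.

```python
from collections import defaultdict


def find_multi_matches(lines):
    """Return {preprint: [published, ...]} for preprints listed with
    more than one distinct published bibcode.

    Assumes each line holds two whitespace-separated bibcodes:
    preprint first, published second. Lines with fewer than two
    fields are skipped.
    """
    matches = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        preprint, published = fields[0], fields[1]
        # Keep first-seen order and ignore exact repeats of the same pair.
        if published not in matches[preprint]:
            matches[preprint].append(published)
    return {p: pubs for p, pubs in matches.items() if len(pubs) > 1}


# The two lines quoted in this issue.
sample = [
    "2012arXiv1206.2395F     2012xrb..confE..17F",
    "2012arXiv1206.2395F     2013hcxa.confE..79F",
]

for preprint, published in find_multi_matches(sample).items():
    print(preprint, published)
```

To check the real file, pass an open file handle instead of `sample`, for example `find_multi_matches(open(path))` with `path` set to the matches.kill location given above.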