adsabs / ADSIngestParser

Curation parser library
MIT License

no output generated #74

Closed: csgrant00 closed this issue 1 year ago

csgrant00 commented 1 year ago

Describe the bug: Running the parser on a Copernicus record produces no output.

To Reproduce: python run.py -p "/proj/ads/abstracts/data/EGU/EGU.101623/wes-8-1625-2023.xml" -t jats -f copernicus.110323

Additional context: The log file reports the error 'NoneType' object has no attribute 'find':

{"asctime": "2023-11-03T18:32:31.150Z", "name": "manual-parser", "processName": "MainProcess", "filename": "run.py", "funcName": "main", "levelname": "WARNING", "lineno": 145, "module": "run", "threadName": "MainThread", "message": "Error parsing record (/proj/ads/abstracts/data/EGU/EGU.101623/wes-8-1625-2023.xml): 'NoneType' object has no attribute 'find'", "timestamp": "2023-11-03T18:32:31.150Z", "hostname": "adsnest.cfa.harvard.edu"}

seasidesparrow commented 1 year ago

The problem here is primarily with ADSManualParser. One issue is that the current production directory, ingest/ADSManualParser, is behind the HEAD of main on GitHub. Another is that the content type for this file should be copernicus (-t copernicus) rather than jats. This will be resolved when I rebuild the production directory's virtual environment (Monday, Nov 6, 2023).
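For anyone hitting the same message: the sketch below shows one common way that a 'NoneType' object has no attribute 'find' error can arise when a file is read with lookups written for a different XML layout. It is an illustration only and is not taken from ADSIngestParser or ADSManualParser. It uses the standard library's xml.etree.ElementTree, and the two sample documents and the function names jats_style_title and safe_title are hypothetical, heavily simplified stand-ins for the real schemas and code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified documents (NOT the real JATS or
# Copernicus schemas); they only differ in where the title lives.
JATS_LIKE = (
    "<article><front><article-meta><title-group>"
    "<article-title>T</article-title>"
    "</title-group></article-meta></front></article>"
)
COPERNICUS_LIKE = "<article><title>T</title></article>"


def jats_style_title(xml_text):
    """Chained lookup that assumes a <front> element is always present."""
    root = ET.fromstring(xml_text)
    front = root.find("front")  # returns None if there is no <front>
    # If front is None, the next call raises AttributeError:
    # 'NoneType' object has no attribute 'find'
    return front.find("article-meta/title-group/article-title").text


def safe_title(xml_text):
    """Same lookup, but a missing element yields None instead of raising."""
    root = ET.fromstring(xml_text)
    node = root.find("front/article-meta/title-group/article-title")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(jats_style_title(JATS_LIKE))
    try:
        jats_style_title(COPERNICUS_LIKE)
    except AttributeError as err:
        print("Error parsing record:", err)
    print(safe_title(COPERNICUS_LIKE))
```

The point of the sketch is only that a lookup returning None, followed by a chained .find, produces this exact AttributeError text; passing the matching content type avoids applying the wrong set of lookups in the first place.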

seasidesparrow commented 1 year ago

You can now get a record using python run.py -p "/proj/ads/abstracts/data/EGU/EGU.101623/wes-8-1625-2023.xml" -t copernicus -f copernicus.110323. Note that the pagination is not being captured correctly, which is a bug in the copernicus parser.