It is important to closely monitor the state of ingest-related data stores, especially Solr. This repo holds code that runs daily to gather the list of canonical bibcodes and the bibcodes currently in Solr, then computes what is missing, what is new, what has been deleted, and so on.
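The comparison described above comes down to set differences. The following is a minimal sketch of that idea, not the code in run.py: the function name `compute_state`, the category names, and the use of a previous Solr snapshot to decide what counts as "new" or "deleted" are all assumptions made for illustration.

```python
def compute_state(canonical, solr, previous_solr=()):
    """Compare bibcode lists using set differences.

    All names and category definitions here are hypothetical:
      missing       - canonical bibcodes that are not in Solr
      not_canonical - Solr bibcodes that are not in the canonical list
      new           - Solr bibcodes absent from the previous snapshot
      deleted       - previous-snapshot bibcodes no longer in Solr
    """
    canonical = set(canonical)
    solr = set(solr)
    previous_solr = set(previous_solr)
    return {
        "missing": canonical - solr,
        "not_canonical": solr - canonical,
        "new": solr - previous_solr,
        "deleted": previous_solr - solr,
    }
```

Sets make each category a single expression and keep the comparison roughly linear in the number of bibcodes, which matters when the lists are large.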
To gather all the needed data and compute state:
python run.py --gather --compute
The error categories are defined in the config file.
Results change only if the pipeline has processed all.links since the last AIR:
/proj/ads/abstracts/config/links/fulltext/all.links
This directory structure needs to exist for files to be stored:
data
└── ft
    ├── Errno_2_No_such_file_or_directory
    ├── extraction_failed_for_bibcode
    ├── format_not_currently_supported_for_extraction
    ├── is_linked_to_a_non_existent_file
    └── is_linked_to_a_zero_byte_size_file
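The tree above can be created ahead of time with a few lines of Python. This is a sketch only: the helper name `ensure_dirs` and the default base path `data/ft` (relative to the current working directory) are assumptions, and the directory names are copied from the listing above rather than read from the config file.

```python
from pathlib import Path

# Directory names copied from the tree above; in the real setup these
# presumably correspond to the error categories in the config file.
ERROR_DIRS = [
    "Errno_2_No_such_file_or_directory",
    "extraction_failed_for_bibcode",
    "format_not_currently_supported_for_extraction",
    "is_linked_to_a_non_existent_file",
    "is_linked_to_a_zero_byte_size_file",
]


def ensure_dirs(base="data/ft"):
    """Create base and one subdirectory per error category (hypothetical helper)."""
    base = Path(base)
    created = []
    for name in ERROR_DIRS:
        path = base / name
        # parents=True also creates data/ and data/ft; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_dirs()` from the repo root before running the gather and compute steps should leave the expected layout in place; calling it again is harmless.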
Matthew Templeton, NASA ADS
Originally written by Steve McDonald, NASA ADS