Trying to process articles with scripts/cron/cron.py from daily updates. Last added: pubmed23n1223.xml.gz.
2023-02-22 13:24:39.000 | INFO | classifier_pipeline.pubmed:prediction_print_spy:307 - Identified 99 hits from 26534 tested (0.373%);mean probability: 0.994 --- pmid: 34561789; prob=0.996
2023-02-22 13:24:39.000 | INFO | classifier_pipeline.pubmed:prediction_print_spy:307 - Identified 100 hits from 26784 tested (0.373%);mean probability: 0.994 --- pmid: 34989308; prob=0.996
2023-02-22 13:24:39.000 | INFO | classifier_pipeline.pubmed:_pmc_supplement_transfomer:277 - Retrieving 34 PMC IDs
Traceback (most recent call last):
  File "cron.py", line 56, in <module>
    pipeline = as_pipeline(
  File "/home/baderlab/Documents/dev/classifier-pipeline/classifier_pipeline/utils.py", line 15, in as_pipeline
    generator = step(generator)
  File "/home/baderlab/Documents/dev/classifier-pipeline/classifier_pipeline/utils.py", line 126, in exhaust
    deque(generator, maxlen=0)
  File "/home/baderlab/Documents/dev/classifier-pipeline/classifier_pipeline/utils.py", line 118, in _db_loader
    for item in items:
  File "/home/baderlab/Documents/dev/classifier-pipeline/classifier_pipeline/pubmed.py", line 279, in _pmc_supplement_transfomer
    for pubmed_chunk in pubmed_chunks:
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/ncbiutils.py", line 158, in get_citations
    citations = self._parse_response(response.content)
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/ncbiutils.py", line 143, in _parse_response
    return self._parse_xml(data)
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/ncbiutils.py", line 138, in _parse_xml
    return list(records)
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/pmcxmlparser.py", line 162, in parse
    journal = self._get_journal(pmc_article)
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/pmcxmlparser.py", line 118, in _get_journal
    iso_abbreviation = self._get_iso_abbreviation(pmc_article)
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/pmcxmlparser.py", line 64, in _get_iso_abbreviation
    text = _collect_element_text(isoabbrev)
  File "/home/baderlab/miniconda3/envs/pipeline/lib/python3.8/site-packages/ncbiutils/xml.py", line 50, in _collect_element_text
    return ' '.join(element.xpath('string()').split())
AttributeError: 'NoneType' object has no attribute 'xpath'
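
The last two frames suggest that the ISO abbreviation lookup returned None for at least one PMC article, and that _collect_element_text then called .xpath on that None. Below is a minimal sketch of a None-safe text collector, under stated assumptions: the function name collect_element_text, the XML fragments, and the XPath expression are all illustrative and are not taken from ncbiutils. To stay self-contained it uses the standard library's ElementTree and itertext() in place of lxml's xpath('string()').

```python
import xml.etree.ElementTree as ET
from typing import Optional


def collect_element_text(element: Optional[ET.Element]) -> Optional[str]:
    """Return whitespace-normalized text, or None when the element is missing."""
    if element is None:
        # find() returns None when nothing matches, so guard before use.
        return None
    return ' '.join(''.join(element.itertext()).split())


# Hypothetical fragments: one record carries an ISO abbreviation, one does not.
with_abbrev = ET.fromstring(
    '<journal-meta>'
    '<journal-id journal-id-type="iso-abbrev">Nat  Methods</journal-id>'
    '</journal-meta>'
)
without_abbrev = ET.fromstring(
    '<journal-meta>'
    '<journal-id journal-id-type="nlm-ta">Nat Methods</journal-id>'
    '</journal-meta>'
)

# Assumed lookup; the real parser's expression may differ.
ISO_ABBREV_PATH = "./journal-id[@journal-id-type='iso-abbrev']"

found = collect_element_text(with_abbrev.find(ISO_ABBREV_PATH))
missing = collect_element_text(without_abbrev.find(ISO_ABBREV_PATH))
```

With a guard like this, a record lacking the abbreviation would yield None for that field instead of aborting the whole batch. Whether None is an acceptable value downstream (for example in _db_loader) is a separate question that this sketch does not answer.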