biologger / speciesprimer

The SpeciesPrimer pipeline is intended to help researchers find specific primer pairs for the detection and quantification of bacterial species in complex ecosystems.
GNU General Public License v3.0

AnnotRptUrl DTD Error #30

Status: Closed (vupadhyay-code closed this issue 2 months ago)

vupadhyay-code commented 3 months ago

Hey, I'm running SpeciesPrimer on a server. I've gotten it to work in the past, but they did an OS update, so I had to reinstall it, and I now have to use Apptainer. Basically, I create a writable overlay and then bind the relevant volumes to launch the container. Everything looks right, but then I run into the following error:

Start searching primer for Anaerostipes_hadrus

Warning synonyms for this species were found...
['Eubacterium hadrum', 'Anaerostipes sp. 5/1/63FAA', 'Clostridiales bacterium SSC/2', 'butyrate-producing bacterium SS2/1', 'butyrate-producing bacterium SSC/2']
Adding synonyms to exception in config.json.

fatal error while working on Anaerostipes_hadrus check logfile /new_speciesprimer/speciesprimer_2024_08_30.log
Failed to find tag 'AnnotRptUrl' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
Traceback (most recent call last):
  File "/pipeline/speciesprimer.py", line 4168, in main
    run_pipeline_for_target(target, config)
  File "/pipeline/speciesprimer.py", line 4054, in run_pipeline_for_target
    newconfig = DataCollection(config).collect()
  File "/pipeline/speciesprimer.py", line 650, in collect
    self.get_ncbi_links(taxid)
  File "/pipeline/speciesprimer.py", line 276, in get_ncbi_links
    genomedata = collect_genomedata(taxid, email)
  File "/pipeline/speciesprimer.py", line 223, in collect_genomedata
    assembly_records = Entrez.read(assembly_efetch)
  File "/usr/local/lib/python3.5/dist-packages/Bio/Entrez/__init__.py", line 478, in read
    record = handler.read(handle)
  File "/usr/local/lib/python3.5/dist-packages/Bio/Entrez/Parser.py", line 334, in read
    self.parser.ParseFile(handle)
  File "../Modules/pyexpat.c", line 414, in StartElement
  File "/usr/local/lib/python3.5/dist-packages/Bio/Entrez/Parser.py", line 571, in startElementHandler
    raise ValidationError(tag)
Bio.Entrez.Parser.ValidationError: Failed to find tag 'AnnotRptUrl' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
Error report: for target Anaerostipes_hadrus
Error 1: fatal error while working on Anaerostipes_hadrus check logfile /new_speciesprimer/speciesprimer_2024_08_30.log

Any suggestions?

biologger commented 3 months ago

Hi,

This seems to be linked to Issue 27: NCBI changed parts of the Entrez database interface. Currently, only v2.1.3 of the pipeline and the corresponding Docker container include a fix for this issue and can be used to download genomes from NCBI.

As of today, the "latest" tag for the Docker container points to v2.1.3. Alternatively, use

docker pull biologger/speciesprimer:v2.1.3
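
For context, the error message above distinguishes strict parsing (an unknown tag such as 'AnnotRptUrl' aborts the read) from parsing with validate=False (unknown tags are skipped). The sketch below illustrates that difference using only the Python standard library. It is not Biopython's parser and not the fix shipped in v2.1.3; the names KNOWN_TAGS, UnknownTagError, parse_summary and SAMPLE, as well as the sample XML values, are hypothetical and chosen only for illustration.

```python
import xml.parsers.expat

# Hypothetical stand-in for the set of tags a DTD would declare.
KNOWN_TAGS = {"DocumentSummary", "AssemblyAccession", "FtpPath_GenBank"}

# Made-up record; only the tag name 'AnnotRptUrl' comes from the error message.
SAMPLE = (
    "<DocumentSummary>"
    "<AssemblyAccession>GCF_000000000.1</AssemblyAccession>"
    "<AnnotRptUrl>https://example.invalid/report</AnnotRptUrl>"
    "<FtpPath_GenBank>ftp://example.invalid/genomes</FtpPath_GenBank>"
    "</DocumentSummary>"
)


class UnknownTagError(ValueError):
    """Raised in strict mode when a tag is not in KNOWN_TAGS."""


def parse_summary(xml_text, validate=True):
    """Return (record, skipped): text of known tags and names of skipped tags.

    With validate=True an unknown tag raises UnknownTagError;
    with validate=False it is recorded in `skipped` and its text is ignored.
    """
    record = {}
    skipped = []
    stack = []  # tags that are currently open

    def start(tag, attrs):
        if tag not in KNOWN_TAGS:
            if validate:
                raise UnknownTagError("Failed to find tag %r in the DTD." % tag)
            skipped.append(tag)
        stack.append(tag)

    def end(tag):
        stack.pop()

    def chars(data):
        # Keep text only when the innermost open tag is a known one.
        if stack and stack[-1] in KNOWN_TAGS and data.strip():
            record[stack[-1]] = record.get(stack[-1], "") + data.strip()

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each text node in one piece
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return record, skipped
```

Calling parse_summary(SAMPLE) raises UnknownTagError at 'AnnotRptUrl', while parse_summary(SAMPLE, validate=False) returns the two known fields and lists 'AnnotRptUrl' as skipped. Whether lenient parsing alone would be enough for the pipeline is not established in this thread; the recommended route remains the v2.1.3 container.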