EBI-Metagenomics / EukCC

Tool to estimate genome quality of microbial eukaryotes
GNU General Public License v3.0

ValueError: 13349 taxid not found #31

Closed mkellom closed 2 years ago

mkellom commented 2 years ago

What could be causing this error? Here is the log output, thank you:

09-03-2022 12:18:52: EukCC version 2.0
09-03-2022 12:18:52: Found 182 bins
09-03-2022 12:39:44: Searching for marker genes in base database
09-03-2022 12:39:45: No placement marker genes found.
09-03-2022 12:39:45: Searching for marker genes in base database
09-03-2022 12:39:46: No placement marker genes found.
09-03-2022 12:39:46: Searching for marker genes in base database
09-03-2022 12:39:47: Found 4 marker genes, placing them in the tree using epa-ng
Traceback (most recent call last):
  File "/global/homes/m/mkellom/.conda/envs/assemble/bin/eukcc", line 10, in <module>
    sys.exit(main())
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/__main__.py", line 413, in main
    eukcc_folder(args)
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/refine.py", line 65, in eukcc_folder
    refine(state)
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/refine.py", line 153, in refine
    bins.append(bin(state, wd, path, protein=True))
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/bin.py", line 22, in __init__
    self.run_eukcc()
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/bin.py", line 42, in run_eukcc
    clade = E.determine_subdb()
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/eukcc.py", line 359, in determine_subdb
    lng = tax_LCA(tree, info, places)
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/treehandler.py", line 138, in tax_LCA
    info = load_tax_info(taxinfo)
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/eukcc/base.py", line 60, in load_tax_info
    d[row[1]] = [str(x) for x in ncbi.get_lineage(row[0])]
  File "/global/homes/m/mkellom/.conda/envs/assemble/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 238, in get_lineage
    raise ValueError("%s taxid not found" %taxid)
ValueError: 13349 taxid not found

thackl commented 2 years ago

Same problem here...

23-03-2022 05:29:50:  Found 18 marker genes, placing them in the tree using epa-ng
Traceback (most recent call last):
  File "/home/thackl/software/anaconda3/envs/eukcc/bin/eukcc", line 10, in <module>
    sys.exit(main())
  File "/home/thackl/software/anaconda3/envs/eukcc/lib/python3.9/site-packages/eukcc/__main__.py", line 416, in main
    run_eukcc(args)
  File "/home/thackl/software/anaconda3/envs/eukcc/lib/python3.9/site-packages/eukcc/__main__.py", line 74, in run_eukcc
    clade = E.determine_subdb()
  File "/home/thackl/software/anaconda3/envs/eukcc/lib/python3.9/site-packages/eukcc/eukcc.py", line 359, in determine_subdb
    lng = tax_LCA(tree, info, places)
  File "/home/thackl/software/anaconda3/envs/eukcc/lib/python3.9/site-packages/eukcc/treehandler.py", line 138, in tax_LCA
    info = load_tax_info(taxinfo)
  File "/home/thackl/software/anaconda3/envs/eukcc/lib/python3.9/site-packages/eukcc/base.py", line 60, in load_tax_info
    d[row[1]] = [str(x) for x in ncbi.get_lineage(row[0])]
  File "/home/thackl/software/anaconda3/envs/eukcc/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 241, in get_lineage
    raise ValueError("%s taxid not found" %taxid)
ValueError: 2478750 taxid not found

openpaul commented 2 years ago

This seems to be related to issues in ETE, so I assume that providing an ETE database will solve this for everyone. I will move this higher on my to-do list.

Thank you for reporting.

In the meantime, this might be solved by updating the NCBI taxonomy database as outlined here: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html
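
For reference, the update described in the ETE tutorial comes down to a couple of calls. This is a minimal sketch, assuming `ete3` is installed in the same environment as EukCC and that the machine can download the NCBI taxdump. The helper names `refresh_taxonomy` and `taxid_known` are made up for this example and are not part of EukCC or ete3:

```python
def refresh_taxonomy(ncbi=None):
    """Rebuild ete3's local NCBI taxonomy database from a fresh taxdump.

    `ncbi` is only a parameter so the function can be exercised with a
    stand-in object; normally it is left as None.
    """
    if ncbi is None:
        # Imported lazily so this module can be loaded without ete3.
        from ete3 import NCBITaxa
        ncbi = NCBITaxa()
    # Downloads the current taxdump from NCBI and rebuilds the local database.
    ncbi.update_taxonomy_database()
    return ncbi


def taxid_known(ncbi, taxid):
    """Return True if `taxid` resolves to a lineage in the local database."""
    try:
        ncbi.get_lineage(taxid)
        return True
    except ValueError:
        # ete3 raises ValueError("<taxid> taxid not found") for unknown ids,
        # as seen in the tracebacks above.
        return False
```

Running `ncbi = refresh_taxonomy()` once and then checking `taxid_known(ncbi, 13349)` and `taxid_known(ncbi, 2478750)` would show whether the two taxids from the tracebacks resolve after the update.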

This error is not raised by EukCC itself, but I guess EukCC should handle it more gracefully.
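
One way "more gracefully" could look is to catch the `ValueError` per taxid instead of letting it abort the whole run. This is a hypothetical sketch only, not EukCC's actual code or fix: the function name `load_tax_info_tolerant` and the `(taxid, label)` row layout are assumptions based on the `d[row[1]] = ... ncbi.get_lineage(row[0])` line in the traceback.

```python
import logging


def load_tax_info_tolerant(rows, ncbi):
    """Map each label to its lineage, skipping taxids the database lacks.

    rows: iterable of (taxid, label) pairs (assumed layout).
    ncbi: any object with a get_lineage(taxid) method, e.g. ete3's NCBITaxa.
    Returns (lineages, missing), where `missing` lists the skipped taxids.
    """
    lineages = {}
    missing = []
    for taxid, label in rows:
        try:
            lineages[label] = [str(x) for x in ncbi.get_lineage(taxid)]
        except ValueError:
            # Unknown taxid: record it and continue instead of crashing.
            logging.warning(
                "taxid %s not found in the local NCBI taxonomy database; "
                "the database may be outdated", taxid
            )
            missing.append(taxid)
    return lineages, missing
```

Whether skipping is acceptable depends on how the lineages are used downstream; a caller could equally inspect `missing` and stop with a clearer message that suggests updating the database.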

openpaul commented 2 years ago

Please see this comment for a solution: https://github.com/Finn-Lab/EukCC/issues/30#issuecomment-1086230673

I am testing a new release with an ETE database provided by me.