DessimozLab / read2tree

a tool for inferring species tree from sequencing reads
MIT License

(Error) KeyError: 'BETVV' #18

Closed shelkmike closed 1 year ago

shelkmike commented 1 year ago

I encountered an error when running Read2tree. How to reproduce:

1) Download the gene set for Pentapetalae from the OMA Browser, setting "Minimum fraction of covered species" to 0.8 and "Maximum nr of markers" to -1.
2) Run Read2tree 0.1.4 with the command:

read2tree --standalone_path marker_genes/ --output_path Read2tree_output --reference --threads 50

Read2tree terminates in the middle, producing the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/mschelkunov/Work/Tools/Conda/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mschelkunov/Work/Tools/Conda/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/mschelkunov/Work/Tools/Conda/lib/python3.9/site-packages/read2tree-0.1.3-py3.9.egg/read2tree/Aligner.py", line 292, in _align_worker
    align.dna = self._get_translated_alignment(codons, alignment)
  File "/home/mschelkunov/Work/Tools/Conda/lib/python3.9/site-packages/read2tree-0.1.3-py3.9.egg/read2tree/Aligner.py", line 216, in _get_translated_alignment
    codon = codons[rec.id]
KeyError: 'BETVV'
"""
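The failing statement in the traceback is `codons[rec.id]`, which raises `KeyError` when an alignment record has no matching entry in the DNA mapping. A minimal sketch of a pre-check for that situation, assuming the alignment can be reduced to a list of record ids and the DNA sequences are held in a dict keyed by the same ids. The function `find_missing_dna` and the sample data are hypothetical and are not part of read2tree:

```python
def find_missing_dna(record_ids, codons):
    """Return the record ids that have no entry in the codon (DNA) mapping."""
    return [rec_id for rec_id in record_ids if rec_id not in codons]


# Hypothetical sample data: 'BETVV' appears in the protein alignment,
# but no DNA sequence was retrieved for it.
record_ids = ["ARATH", "BETVV", "VITVI"]
codons = {"ARATH": "ATGGCT", "VITVI": "ATGGCA"}

missing = find_missing_dna(record_ids, codons)
print(missing)
```

Running a check like this before the lookup would report the affected ids instead of aborting the worker on the first missing key.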

sinamajidian commented 1 year ago

Dear @shelkmike, sorry for the late reply. I just installed read2tree from the newest commit on GitHub. I downloaded the marker genes for Pentapetalae from the OMA Browser with "Minimum fraction of covered species" set to 0.8 and "Maximum nr of markers" set to -1. Then I ran the following:

read2tree --standalone_path marker_genes/ --output_path Read2tree_output --reference
--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---
Loading files for pre-filter: 100%|█████████████████████████████████████████████████████████████████████████| 4297/4297 [00:39<00:00, 108.52 OGs/s]
2023-02-08 13:55:39,476 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api ---
Loading OGs:  66%|███████████████████████████████████████████████████████████▉                               | 2833/4297 [40:18<22:10,  1.10 OGs/s]2023-02-08 14:36:58,425 - read2tree.OGSet - WARNING - This OG OG844937 did not have any DNA
Loading OGs:  86%|██████████████████████████████████████████████████████████████████████████████▎            | 3696/4297 [53:52<08:18,  1.21 OGs/s]
2023-02-08 14:58:13,904 - read2tree.OGSet - INFO - : Gathering of DNA seq for 4297 OGs took 3754.426855325699.
--- Generating reference for mapping ---
Loading records: 100%|█████████████████████████████████████████████████████████████████████████████████| 4297/4297 [00:00<00:00, 33471.24 record/s]
2023-02-08 14:58:14,033 - read2tree.ReferenceSet - INFO - : Extracted 43 reference species form 4297 ogs took 0.12882447242736816
--- Alignment of 4297 OGs ---

But I didn't get this error. Could you try it again? It would be great if you could send us the full output along with the log file. I would also suggest running the test example first.

Regards, Sina
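The log above contains a warning for an OG for which no DNA sequence was found. When comparing runs, it may help to collect all such warnings from a saved log. A minimal sketch, assuming the log was saved as text and the warning wording matches the line shown above. The helper `ogs_without_dna` is hypothetical and is not part of read2tree:

```python
import re

# Matches the warning text seen in the log above and captures the OG identifier.
WARNING_PATTERN = re.compile(r"This OG (\S+) did not have any DNA")


def ogs_without_dna(log_text):
    """Return the OG identifiers reported as having no DNA sequence."""
    return WARNING_PATTERN.findall(log_text)


# Sample line copied from the log above (the progress bar is abbreviated).
sample_log = (
    "Loading OGs:  66%| 2833/4297 [40:18<22:10,  1.10 OGs/s]"
    "2023-02-08 14:36:58,425 - read2tree.OGSet - WARNING - "
    "This OG OG844937 did not have any DNA\n"
)

print(ogs_without_dna(sample_log))
```

The pattern is searched anywhere in the text, so it still matches when the progress bar and the warning end up on the same line, as they do in the output above.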

shelkmike commented 1 year ago

I tried Read2tree on two computers, one with CentOS 6.5 and one with Ubuntu 22.04, and got the same error. I attached the output (output.txt).

sinamajidian commented 1 year ago

Thanks for sending the output. I was able to reproduce the error and we'll update you soon.

sinamajidian commented 1 year ago

I'm so sorry for the inconvenience. There was an issue with the API call for downloading the cDNA. My colleague @alpae worked on it, and the error we faced previously is now resolved. The following command finishes successfully:

read2tree --standalone_path marker_genes/ --output_path Read2tree_output --reference

I would also like to mention that read2tree produces an MSA that includes the species from the input sequencing reads in addition to the species of the marker genes. This MSA is then used by the tree inference method (IQ-TREE). Note that the more marker genes you have, the higher the resolution you might get (up to a saturation point), but IQ-TREE will be slower.

I would be happy to hear that the issue is resolved on your side as well.

shelkmike commented 1 year ago

This solved the issue. Thank you.