DessimozLab / read2tree

a tool for inferring species tree from sequencing reads
MIT License
138 stars 18 forks

multiprocessing.pool.RemoteTraceback: KeyError: 'X0022' #25

Closed by yuanf28 1 year ago

yuanf28 commented 1 year ago

Hi, I'm trying to work with coronavirus. But after it creates the reference folder 01 - 03, it gives me the error I'm attaching:

read2tree --standalone_path /home/yuanfa/cov_tree/marker_genes/ --output_path /home/yuanfa/cov_tree/tree_mapping/ --reference

--- Load OGs with min 0 species from oma /home/yuanfa/cov_tree/marker_genes - mode = marker_genes ---
Loading files for pre-filter: 100%|███████████████████████████| 8/8 [00:00<00:00, 6103.03 OGs/s]
2023-05-06 04:13:48,081 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api ---
Loading OGs: 0%| | 0/8 [00:00<?, ? OGs/s]
2023-05-06 04:13:49,729 - read2tree.OGSet - WARNING - This OG OG20 did not have any DNA
Loading OGs: 12%|█████▊ | 1/8 [00:01<00:11, 1.65s/ OGs]
2023-05-06 04:13:51,349 - read2tree.OGSet - WARNING - This OG OG1 did not have any DNA
Loading OGs: 25%|███████████▌ | 2/8 [00:03<00:09, 1.63s/ OGs]
2023-05-06 04:13:52,816 - read2tree.OGSet - WARNING - This OG OG14 did not have any DNA
Loading OGs: 38%|█████████████████▎ | 3/8 [00:04<00:07, 1.56s/ OGs]
2023-05-06 04:13:54,176 - read2tree.OGSet - WARNING - This OG OG57 did not have any DNA
Loading OGs: 50%|███████████████████████ | 4/8 [00:06<00:05, 1.48s/ OGs]
2023-05-06 04:13:56,280 - read2tree.OGSet - WARNING - This OG OG93 did not have any DNA
Loading OGs: 62%|████████████████████████████▊ | 5/8 [00:08<00:05, 1.70s/ OGs]
2023-05-06 04:13:58,239 - read2tree.OGSet - WARNING - This OG OG28 did not have any DNA
Loading OGs: 75%|██████████████████████████████████▌ | 6/8 [00:10<00:03, 1.79s/ OGs]
2023-05-06 04:13:59,660 - read2tree.OGSet - WARNING - This OG OG25 did not have any DNA
Loading OGs: 88%|████████████████████████████████████████▎ | 7/8 [00:11<00:01, 1.67s/ OGs]
2023-05-06 04:14:01,120 - read2tree.OGSet - WARNING - This OG OG84 did not have any DNA
Loading OGs: 100%|██████████████████████████████████████████████| 8/8 [00:13<00:00, 1.63s/ OGs]
2023-05-06 04:14:01,121 - read2tree.OGSet - INFO - : Gathering of DNA seq for 8 OGs took 13.039867639541626.
--- Generating reference for mapping ---
Loading records: 100%|████████████████████████████████████| 8/8 [00:00<00:00, 34204.31 record/s]
2023-05-06 04:14:01,123 - read2tree.ReferenceSet - INFO - : Extracted 11 reference species form 8 ogs took 0.0012955665588378906
--- Alignment of 8 OGs ---
/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 292, in _align_worker
    align.dna = self._get_translated_alignment(codons, alignment)
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 216, in _get_translated_alignment
    codon = codons[rec.id]
KeyError: 'X0022'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yuanfa/miniconda3/envs/r2t/bin/read2tree", line 16, in <module>
    main(sys.argv[1:], exe_name=exe_name(), desc=desc)
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 291, in main
    alignments = Aligner(args, ogset.ogs, load=True)
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 51, in __init__
    self.alignments = self._align(og_set)
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 330, in _align
    res_align = p.map(self._align_worker, og_chunks)
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/yuanfa/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'X0022'

Does this have something to do with this specific line in the Aligner.py script?

sinamajidian commented 1 year ago

Dear @yuanf28

Thanks for reaching out.

I assume that you used the Corona OMA Browser to download the marker genes using this link, as mentioned in the wiki. (If this is not the case, please let us know how you did that.)

After this step, the cDNA of the genes should be downloaded from here and unzipped. Then, one can run read2tree with

read2tree --standalone_path marker_genes --output_path output --dna_reference viruses.cdna.fa --reference

(Some background information: when --dna_reference is not given, read2tree tries to download the cDNA of each protein in the marker_genes folder from the main OMA Browser using its REST API. However, the Corona OMA Browser is a separate instance, and read2tree does not use the Corona API. Thus, the user should provide the cDNA file as input.)
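For readers who want to check their inputs before running, here is a minimal sketch (not part of read2tree) that lists protein IDs found in the marker_genes folder but absent from the cDNA file. It assumes that both are FASTA files, that the marker files end in .fa, and that an ID is the first whitespace-delimited token after ">". read2tree's own ID matching may differ; the function names fasta_ids and missing_cdna_ids are hypothetical.

```python
from pathlib import Path


def fasta_ids(path):
    """Return the set of record IDs (first token after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def missing_cdna_ids(marker_dir, cdna_fasta, pattern="*.fa"):
    """Map each marker file name to the protein IDs absent from the cDNA file.

    Assumption: IDs are compared as exact strings; this is an illustrative
    check, not the matching logic read2tree itself uses.
    """
    cdna = fasta_ids(cdna_fasta)
    missing = {}
    for marker_file in sorted(Path(marker_dir).glob(pattern)):
        absent = fasta_ids(marker_file) - cdna
        if absent:
            missing[marker_file.name] = sorted(absent)
    return missing
```

Calling missing_cdna_ids("marker_genes", "viruses.cdna.fa") returns an empty dict when every marker ID has a cDNA entry under these assumptions. A non-empty result points at IDs that could plausibly trigger a KeyError like the one reported above.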

Sorry for the inconvenience, I should have mentioned this in the wiki before. And thanks to your comment, we just updated the wiki.

Please let us know whether this works for you or not.

Best regards, Sina

yuanf28 commented 1 year ago

Thank you for your prompt response. I will test it as soon as possible to see if it works properly.

yuanf28 commented 1 year ago

@sinamajidian This works for me. Thanks a lot.