DessimozLab / read2tree

A tool for inferring species trees from sequencing reads
MIT License

merge-all-mappings fails #65

Closed by evo-eco-gen 1 week ago

evo-eco-gen commented 2 weeks ago

Hi,

I have a dataset of 35 rodents plus OMA markers for 5 rodents. Individual mapping works fine, but --merge_all_mappings then fails. After successfully adding 24 of my samples, the log suddenly reports adding a sample called "M". This makes no sense, as all my samples have generic GenBank SRA names (SRR and ERR). I have no sample or any other file called "M", yet M_all_sc.txt and M_all_cov.txt are generated. Then I get the error below (full log attached). Any suggestions would be welcome!

##############
2024-07-10 12:27:57,507 - read2tree.main - INFO - --- Addition of M to all ogs ---
--- Retrieve mapped consensus sequences ---
Loading consensus read mappings : 100%|████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 287.19 species/s]
--- Add inferred mapped sequence back to OGs ---
Adding mapped seq to OG: 100%|███████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 46510.36 OGs/s]
2024-07-10 12:27:57,957 - read2tree.OGSet - INFO - merge: Appending 500 reconstructed sequences to present OG took 0.013076066970825195.
--- Add inferred mapped sequence back to alignment ---
Adding mapped seq to alignments: 0%| | 0/500 [00:00<?, ? alignments/s]
Traceback (most recent call last):
  File "/media/data1/kozakk/Microtus/r2t/bin/read2tree", line 4, in <module>
    __import__('pkg_resources').run_script('read2tree==0.1.5', 'read2tree')
  File "/media/data1/kozakk/Microtus/r2t/lib/python3.10/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/media/data1/kozakk/Microtus/r2t/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1726, in run_script
    exec(script_code, namespace, namespace)
  File "/media/data1/kozakk/Microtus/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/EGG-INFO/scripts/read2tree", line 16, in <module>
  File "/media/data1/kozakk/Microtus/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/main.py", line 389, in main
  File "/media/data1/kozakk/Microtus/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/Aligner.py", line 168, in add_mapped_seq
  File "/media/data1/kozakk/Microtus/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/Aligner.py", line 106, in _add_mapseq_align
AttributeError: 'NoneType' object has no attribute 'seq'

############

mplog.log

evo-eco-gen commented 2 weeks ago

A comment after some more sleuthing on my own: the label "M" must be the result of the original GenBank submitter messing something up in a file. Anyway, deleting all files and folders (04, 05, etc.) related to this mystery M sample resolved the problem. (I also threw out two samples with just a handful of genes mapped.)
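For anyone hitting the same thing, a stray label like "M" can be spotted before merging with a small pre-flight check. The sketch below is not part of read2tree. It only assumes that each mapped sample leaves behind files named `<label>_all_sc.txt` and `<label>_all_cov.txt` somewhere under the output directory (as the M_all_sc.txt and M_all_cov.txt files above suggest). The function name `find_unexpected_labels` and the example accessions are made up for illustration.

```python
from pathlib import Path

# File-name suffixes taken from the files mentioned in this thread.
SUFFIXES = ("_all_sc.txt", "_all_cov.txt")


def find_unexpected_labels(out_dir, expected):
    """Return sample labels found on disk that are not in `expected`.

    A label is whatever precedes `_all_sc.txt` or `_all_cov.txt` in a
    file name. The search is recursive because the exact location of
    these files inside the output directory is an assumption here.
    """
    expected = set(expected)
    found = set()
    for path in Path(out_dir).rglob("*_all_*.txt"):
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                found.add(path.name[: -len(suffix)])
    return sorted(found - expected)


if __name__ == "__main__":
    # Hypothetical accessions; replace with the real sample list.
    my_samples = ["SRR0000001", "ERR0000002"]
    for label in find_unexpected_labels("output", my_samples):
        print(f"unexpected sample label on disk: {label}")
```

Any label it prints that is not one of your SRR/ERR accessions is a candidate for the kind of cleanup described above.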

sinamajidian commented 1 week ago

Hi @evo-eco-gen

It seems you managed to solve the issue. Please let us know if you have any other questions.

Best, Sina

evo-eco-gen commented 1 week ago

Hi, I did have some more problems with merging (specifically with results from different nodes). One suggestion: it would be helpful if the merge routine reported which sample is causing the problem, e.g. when a file needed for merging is missing.
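To illustrate the kind of message meant here, a rough sketch follows. It is not read2tree's actual code: the `records` dictionary, the `MergeInputError` class, and the `require_mapped_record` helper are all hypothetical. The idea is only that a missing record is turned into an error naming the sample, instead of surfacing later as `'NoneType' object has no attribute 'seq'`.

```python
class MergeInputError(Exception):
    """Raised when a sample lacks data needed for merging (hypothetical)."""


def require_mapped_record(records, og_name, sample):
    """Return the mapped record for `og_name`, or fail naming the sample.

    `records` is assumed to be a dict mapping OG names to sequence
    records, where a missing or None entry means nothing was mapped.
    """
    record = records.get(og_name)
    if record is None:
        raise MergeInputError(
            f"sample {sample!r}: no mapped sequence for {og_name!r}; "
            "check that the mapping output for this sample is complete"
        )
    return record
```

With a check like this, the failure in the log above would have pointed directly at sample "M" rather than at an internal attribute lookup.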