Closed kodingkoning closed 8 months ago
Hi @kodingkoning Thanks for using read2tree.
I'm wondering whether you tested your installation with the example dataset that is available in our GitHub repository. It would be great if you could provide us with the full command line that you ran. Could you also tell us how you obtained the marker genes?
Best regards, Sina
Hi @sinamajidian
I'm having the same trouble. My guess is that the comparison cannot be completed because the cDNA of the OGs can't be downloaded from the OMA browser, since I also got these warnings:
--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---
2023-11-06 14:36:45,808 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api ---
2023-11-06 14:37:22,040 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE22870_OG843400. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa81bb81f0>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
2023-11-06 14:37:32,067 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE20480_OG1137105. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa8131fb80>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
2023-11-06 14:37:42,087 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE08776_OG961357. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa81bb8e20>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
2023-11-06 14:37:52,108 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE19700_OG1242666. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa81bb9780>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
How do I fix this?
Best wishes, Sly
Dear Sly, thanks for contacting us.
I think this case is a bit different. I'm sorry, we had an issue with our server yesterday. The read2tree code tries to open this link via the API. It is working now. Could you give it another try?
You could also concatenate all the fna files in your marker_gene folder and pass the result with the option --dna_reference. Then read2tree won't download them.
cat marker_gene/*.fna > ref_dna.fa
read2tree .. --dna_reference ref_dna.fa
Please let us know the results.
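The concatenation step can be wrapped in a small check so that an empty reference is caught before starting a long run. This is only a sketch: build_dna_reference is a hypothetical helper name, and it assumes the marker gene folder holds one .fna file per OG.

```shell
# Hypothetical helper (not part of read2tree): concatenate every .fna file
# in a marker-gene folder and print how many FASTA records were written.
#   $1 = folder holding the .fna files
#   $2 = combined output file, e.g. ref_dna.fa
build_dna_reference() {
    cat "$1"/*.fna > "$2" || return 1
    # Count FASTA header lines; a count of 0 means nothing usable was found.
    grep -c '^>' "$2"
}
```

For example, `build_dna_reference marker_gene ref_dna.fa` prints the number of sequences in ref_dna.fa; if it prints 0, the folder probably does not contain the expected files.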
Hi @sinamajidian Thank you for your reply. I tried again with the test data and successfully obtained the results. However, when I used my own data for analysis, I received some new errors:
--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---
Loading files for pre-filter: 100%|████████████████████| 2196/2196 [00:03<00:00, 614.43 OGs/s]
2023-11-08 09:22:44,673 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api ---
Loading OGs: 100%|████████████████████| 2196/2196 [1:02:48<00:00, 1.72s/ OGs]
2023-11-08 10:25:33,136 - read2tree.OGSet - INFO - 44: Gathering of DNA seq for 2196 OGs took 3768.461127281189.
--- Generating reference for mapping ---
Loading records: 100%|████████████████████| 2196/2196 [00:00<00:00, 118696.01 record/s]
2023-11-08 10:25:33,156 - read2tree.ReferenceSet - INFO - 44: Extracted 6 reference species form 2196 ogs took 0.019290685653686523
--- Alignment of 2196 OGs ---
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
[the two RuntimeWarnings above are printed 10 times in total]
2023-11-08 10:25:42,693 - read2tree.wrappers.aligners.mafft - WARNING - is MAFFT_BINARIES set correctly:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 287, in _align_worker
    alignment = mafft_wrapper()
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/wrappers/aligners/mafft.py", line 107, in __call__
    raise WrapperError('Mafft did not compute any alignments. StdErr: {}'.format(error))
read2tree.wrappers.WrapperError: Mafft did not compute any alignments. StdErr: outputhat23=16
treein = 0
compacttree = 0
Warning: Only 0 sequence found.
minimumweight = 0.000010
autosubalignment = 0.000000
nthread = 0
randomseed = 0
blosum 62 / kimura 200
poffset = 0
niter = 16
sueff_global = 0.100000
nadd = 16
Warning: Only 0 sequence found.
Strategy: L-INS-i (Probably most accurate, very slow) Iterative refinement method (<16) with LOCAL pairwise alignment information
If unsure which option to use, try 'mafft --auto input > output'. For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct). It tends to insert more gaps into gap-rich regions than previous versions. To disable this change, add the --leavegappyregion option.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Share/user/sly20/micromamba/envs/r2t/bin/read2tree", line 16, in
Strategy: L-INS-i (Probably most accurate, very slow) Iterative refinement method (<16) with LOCAL pairwise alignment information
If unsure which option to use, try 'mafft --auto input > output'. For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct). It tends to insert more gaps into gap-rich regions than previous versions. To disable this change, add the --leavegappyregion option.
Nice, we have some progress. Read2tree uses MAFFT to align the marker genes (before using the sequencing reads), and it seems that there is a problem there. I noticed that you have 6 reference species and 2196 OGs. We usually use far fewer marker genes. When you download them from the OMA browser you can select how many, and 150-400 would be a good choice. I would suggest re-downloading them and setting "Maximum nr of markers".
It would be great if you could tell us the set of species that you used, so I can try to reproduce the error.
You could also check the mplog.log log file to see for which OG the error happens.
This is a sample log:
2023-11-04 17:36:04,452 - read2tree.ReferenceSet - INFO - : Extracted 21 reference species form 200 ogs took 0.0019800662994384766
2023-11-04 17:36:04,666 - read2tree.Aligner - DEBUG - aligning OG OG1234111 with 18 proteins
2023-11-04 17:36:04,667 - read2tree.wrappers.abstract_cli - DEBUG - Running following command: /software/miniconda/envs/r2t_3.10.8b/bin/mafft --auto --maxiterate 1000 --amino /tmp/34917137/tmpc6q1p2fa
2023-11-04 17:36:06,463 - read2tree.Aligner - DEBUG - aligning OG OG1195247 with 17 proteins
2023-11-04 17:36:06,463 - read2tree.wrappers.abstract_cli - DEBUG - Running following command: /software/miniconda/envs/r2t_3.10.8b/bin/mafft --auto --maxiterate 1000 --amino /tmp/34917137/tmpjfvxvdur
Then you could try running MAFFT separately for that specific OG:
mafft --auto --maxiterate 1000 --amino 01_ref_ogs_aa/OG1135874.fa
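To find which OG to rerun, one option is to pull the last "aligning OG" entry out of the log. This is a rough sketch that assumes mplog.log contains DEBUG lines in the same form as the sample above; last_aligned_og is a hypothetical helper name. With several threads, the last OG logged is not guaranteed to be the one that failed, so treat the result as a starting point.

```shell
# Hypothetical helper: print the OG ID from the last "aligning OG" line
# of a read2tree log file given as $1.
last_aligned_og() {
    grep 'aligning OG' "$1" | tail -n 1 \
        | sed -E 's/.*aligning OG ([^ ]+) with.*/\1/'
}
```

Calling `last_aligned_og mplog.log` prints an ID such as OG1195247, which can then be substituted into the mafft command above.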
The MAFFT error says that there are 0 sequences. Probably one of the OG files is empty. You could try:
ls -alhtS marker_genes | tail -n 3
ls -alhtS 01_ref_ogs_aa/ | tail -n 3
ls -alhtS 01_ref_ogs_dna/ | tail -n 3
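The ls commands above sort by size, so empty files show up last. As an alternative sketch, the loop below lists every file in a folder that contains no FASTA header line at all, which also catches files that are non-empty but hold no sequence. list_empty_fasta is a hypothetical helper name, and the folder names are the ones used in this thread.

```shell
# Hypothetical helper: print each regular file in folder $1 that has
# no FASTA record (no line starting with ">").
list_empty_fasta() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if ! grep -q '^>' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_empty_fasta 01_ref_ogs_aa` prints nothing when every OG file holds at least one sequence.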
Anyway, the easiest way would be to use fewer OGs.
Hi @sinamajidian
Thank you for your reply. When I adjusted the number of OGs to 200, in single-species mode, I got these results without error.
When I used the multi-species mode for the analysis, no species tree file was generated, and I got some errors.
The command is as follows:
read2tree --standalone_path marker_genes/ --output_path output --reference
read2tree --standalone_path marker_genes/ --dna_reference ref_dna.fa --thread 30 --output_path output --reads /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Hnjt/Hnjt_1.clean.fq /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/44_Hnjt/Hnjt_2.clean.fq
read2tree --standalone_path marker_genes/ --dna_reference ref_dna.fa --thread 30 --output_path output --reads /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Ddtl/Ddtl_1.clean.fq.gz /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Ddtl/Ddtl_2.clean.fq.gz
read2tree --standalone_path marker_genes/ --dna_reference ref_dna.fa --thread 30 --output_path output --reads /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Mhtl/Mhtl_1.clean.fq.gz /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Mhtl/Mhtl_2.clean.fq.gz
read2tree --standalone_path marker_genes/ --output_path output/ --reference --merge_all_mappings --tree
error message:
Loading records: 0%| | 0/200 [00:00<?, ? record/s]
Loading records: 100%|██████████| 200/200 [00:00<00:00, 47782.00 record/s]
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
Re-loading files: 0 OGs [00:00, ? OGs/s] Re-loading files: 200 OGs [00:00, 3559.64 OGs/s]
Re-loading references for mapping from folder: 0%| | 0/6 [00:00<?, ? species/s] Re-loading references for mapping from folder: 100%|██████████| 6/6 [00:00<00:00, 297.90 species/s]
Loading alignments : 0 Alignment [00:00, ? Alignment/s] Loading alignments : 172 Alignment [00:00, 1713.75 Alignment/s] Loading alignments : 200 Alignment [00:00, 1702.60 Alignment/s]
Mapping reads to species: 0%| | 0/6 [00:00<?, ? species/s]
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/DIACI_OGs.fa.bam'
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:206: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:163: RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:198: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Mapping reads to species: 17%|█▋ | 1/6 [58:28<4:52:22, 3508.58s/ species][E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/ACYPI_OGs.fa.bam'
Mapping reads to species: 33%|███▎ | 2/6 [1:57:36<3:55:27, 3531.77s/ species][E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/ZOONE_OGs.fa.bam'
Mapping reads to species: 50%|█████ | 3/6 [2:47:43<2:44:36, 3292.06s/ species][E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/CIMLE_OGs.fa.bam'
Mapping reads to species: 67%|██████▋ | 4/6 [3:39:08<1:47:00, 3210.26s/ species][E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/PEDHC_OGs.fa.bam'
Mapping reads to species: 83%|████████▎ | 5/6 [4:34:46<54:16, 3256.24s/ species] [E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/RHOPR_OGs.fa.bam'
Mapping reads to species: 100%|██████████| 6/6 [5:27:07<00:00, 3217.16s/ species] Mapping reads to species: 100%|██████████| 6/6 [5:27:07<00:00, 3271.22s/ species]
Adding mapped seq to alignments: 0%| | 0/200 [00:00<?, ? alignments/s] Adding mapped seq to alignments: 100%|██████████| 200/200 [00:00<00:00, 1186507.50 alignments/s]
Adding mapped seq to OG: 0%| | 0/200 [00:00<?, ? OGs/s] Adding mapped seq to OG: 100%|██████████| 200/200 [00:00<00:00, 1246449.93 OGs/s]
Adding mapped seq to OG: 0%| | 0/200 [00:00<?, ? OGs/s] Adding mapped seq to OG: 100%|██████████| 200/200 [00:00<00:00, 42694.46 OGs/s]
Adding mapped seq to alignments: 0%| | 0/200 [00:00<?, ? alignments/s]
Adding mapped seq to alignments: 0%| | 0/200 [00:00<?, ? alignments/s]
Traceback (most recent call last):
File "/Share/user/sly20/micromamba/envs/r2t/bin/read2tree", line 16, in
What should I do to fix this error?
Best wishes sly
Thanks, Sly, for the response. You mentioned that you "got these results without error." Was that for the same species for which the error happens?
Could you please share the full mplog.log file with us? It seems that you are overwriting the files. It would be best to start the analysis in a new folder. Otherwise, the alignment results of the two runs (the 200 marker genes and the full set) get mixed.
Hi @sinamajidian
Thank you for your reply. I've figured out the reason for the error. First, I didn't name the reads files exactly as you specified (species_R1.fastq/species_R2.fastq); my file names were as follows: 38.物种名_1.clean.fq.gz/38.物种名_2.clean.fq.gz (物种名 means "species name"). Second, my reads file names included Chinese characters. I adjusted the reads file naming for all species and it succeeded.
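For anyone hitting the same problem, a quick pre-flight check on the reads file names might look like the sketch below. It is based only on what worked in this thread, not on documented read2tree behaviour: check_read_name is a hypothetical helper, and both the ASCII-only rule and the accepted suffixes (including the optional .gz) are assumptions.

```shell
# Hypothetical helper: succeed only if the file name in $1 is plain
# printable ASCII and ends in _R1/_R2 followed by .fastq or .fastq.gz.
check_read_name() {
    name=$(basename "$1")
    # Delete printable ASCII (octal 040-176); anything left is a
    # non-ASCII or control byte, such as UTF-8 encoded Chinese characters.
    leftover=$(printf '%s' "$name" | LC_ALL=C tr -d '\040-\176')
    [ -z "$leftover" ] || return 1
    case "$name" in
        *_R1.fastq|*_R2.fastq|*_R1.fastq.gz|*_R2.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}
```

A loop such as `for f in reads/*; do check_read_name "$f" || echo "rename: $f"; done` then lists the files that would need renaming (the reads/ folder is a placeholder).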
--- Tree inference ---
2023-11-17 22:29:17,754 - read2tree.TreeInference - INFO - merge: Tree inference took 32.54426026344299.
(39Ddtl_R1:0.0673121808,38Mhtl_R1:0.0878371145,(44Hnjt_R1:0.0881589205,((((CIMLE:0.1771613970,RHOPR:0.1652477443):0.1578374932,(ACYPI:0.3422935192,DIACI:0.3035898292):0.0628138094):0.0824901814,PEDHC:0.2940334757):0.0896128651,ZOONE:0.1238414304):0.0517699561):0.0181870257);
This work is completed!
Thank you again. sly
That's awesome! Glad to hear that you resolved the issue.
Best regards, Sina
Just one thing, you may need to re-root the tree based on the outgroup species.
When running read2tree with a small input test, I'm getting a KeyError. This follows SSLError on each "Loading OGs", so I am curious whether it is related, or a fully separate error.
Is this an error likely caused by my input (if so, how do I handle it), or is this an issue with read2tree?
The error: