DessimozLab / read2tree

a tool for inferring species tree from sequencing reads
MIT License

KeyError in Alignment #35

Closed kodingkoning closed 8 months ago

kodingkoning commented 12 months ago

When running read2tree on a small test input, I get a KeyError. It follows an SSLError warning on every "Loading OGs" step, so I am curious whether the two are related or fully separate.

Is this an error likely caused by my input (if so, how do I handle it), or is this an issue with read2tree?

The error:

Loading OGs:  98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍  | 197/200 [01:31<00:01,  2.14 OGs/s]2023-07-28 15:53:21,904 - read2tree.OGSet - WARNING - DNA not found probably for MIXOS00798_OG1135201. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
Loading OGs:  99%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 198/200 [01:32<00:00,  2.14 OGs/s]2023-07-28 15:53:22,367 - read2tree.OGSet - WARNING - DNA not found probably for MIXOS00271_OG1148015. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
Loading OGs: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏| 199/200 [01:32<00:00,  2.15 OGs/s]2023-07-28 15:53:22,829 - read2tree.OGSet - WARNING - DNA not found probably for MIXOS01050_OG681627. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
Loading OGs: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [01:33<00:00,  2.14 OGs/s]
2023-07-28 15:53:22,831 - read2tree.OGSet - INFO - Coemansia_Mojavensis: Gathering of DNA seq for 200 OGs took 93.32299399375916.
--- Generating reference for mapping ---
Loading records: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 92375.38 record/s]
2023-07-28 15:53:22,835 - read2tree.ReferenceSet - INFO - Coemansia_Mojavensis: Extracted 4 reference species form 200 ogs took 0.003381967544555664
--- Alignment of 200 OGs ---
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/Aligner.py", line 292, in _align_worker
    align.dna = self._get_translated_alignment(codons, alignment)
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/Aligner.py", line 216, in _get_translated_alignment
    codon = codons[rec.id]
KeyError: 'MIXOS'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/erkonin/mambaforge/envs/r2t/bin/read2tree", line 4, in <module>
    __import__('pkg_resources').run_script('read2tree==0.1.5', 'read2tree')
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/pkg_resources/__init__.py", line 720, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1570, in run_script
    exec(script_code, namespace, namespace)
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/EGG-INFO/scripts/read2tree", line 16, in <module>
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/main.py", line 291, in main
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/Aligner.py", line 51, in __init__
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/Aligner.py", line 330, in _align
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/erkonin/mambaforge/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'MIXOS'

sinamajidian commented 12 months ago

Hi @kodingkoning, thanks for using read2tree.

I'm wondering whether you tested your installation with the example dataset that is available in our GitHub repository. It would be great if you could give us the full command line that you ran. Could you also tell us how you obtained the marker genes?

Best regards, Sina

sly724 commented 8 months ago

Hi @sinamajidian

I'm having the same trouble. My guess is that the alignment cannot be completed because the cDNA of the OGs can't be downloaded from the OMA browser, since I also got these warnings:

--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---
2023-11-06 14:36:45,808 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api ---
2023-11-06 14:37:22,040 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE22870_OG843400. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa81bb81f0>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
2023-11-06 14:37:32,067 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE20480_OG1137105. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa8131fb80>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
2023-11-06 14:37:42,087 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE08776_OG961357. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa81bb8e20>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))
2023-11-06 14:37:52,108 - read2tree.OGSet - WARNING - DNA not found probably for CIMLE19700_OG1242666. The reason is HTTPSConnectionPool(host='omabrowser.org', port=443): Max retries exceeded with url: /api/protein/bulk_retrieve/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2afa81bb9780>: Failed to resolve 'omabrowser.org' ([Errno -2] Name or service not known)"))

How do I fix this?

Best wishes, Sly

sinamajidian commented 8 months ago

Dear Sly, thanks for contacting us.
I think this case is a bit different. I'm sorry, yesterday we had an issue with our server. Read2tree retrieves the DNA sequences from omabrowser.org through its API. It is working again now. Could you give it another try?

You could also concatenate all the .fna files in your marker_gene folder and pass the result with the option --dna_reference. Then read2tree won't download them.

cat marker_gene/*.fna > ref_dna.fa
read2tree .. --dna_reference  ref_dna.fa
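
For anyone who prefers to script this step, here is a rough Python equivalent of the `cat` command above. It is only a sketch: the function name `build_dna_reference` and the `pattern` argument are made up for illustration and are not part of read2tree. The one deliberate difference from `cat` is that it adds a newline after a file that lacks one, so that a sequence line cannot run into the next header.

```python
from pathlib import Path


def build_dna_reference(marker_dir, out_path, pattern="*.fna"):
    """Concatenate every file matching `pattern` in `marker_dir` into `out_path`.

    Hypothetical helper (not a read2tree function). Returns the number of
    input files that were written to the output.
    """
    marker_dir = Path(marker_dir)
    out_path = Path(out_path)
    # Sort for a reproducible order; never read the output file as an input.
    inputs = [
        p for p in sorted(marker_dir.glob(pattern))
        if p.resolve() != out_path.resolve()
    ]
    with out_path.open("w") as out:
        for fasta in inputs:
            text = fasta.read_text()
            out.write(text)
            # Unlike plain `cat`, terminate files that do not end in a newline.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(inputs)
```

Calling `build_dna_reference("marker_gene", "ref_dna.fa")` would then produce the file that is passed to --dna_reference.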

Please let us know the results.

sly724 commented 8 months ago

Hi @sinamajidian, thank you for your reply. I tried again with the test data and obtained the results successfully. However, when I analysed my own data, I got some new errors:

--- Load OGs with min 0 species from oma marker_genes - mode = marker_genes ---
Loading files for pre-filter: 100%|██████████| 2196/2196 [00:03<00:00, 614.43 OGs/s]
2023-11-08 09:22:44,673 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api ---
Loading OGs: 100%|██████████| 2196/2196 [1:02:48<00:00, 1.72s/ OGs]
2023-11-08 10:25:33,136 - read2tree.OGSet - INFO - 44: Gathering of DNA seq for 2196 OGs took 3768.461127281189.
--- Generating reference for mapping ---
Loading records: 100%|██████████| 2196/2196 [00:00<00:00, 118696.01 record/s]
2023-11-08 10:25:33,156 - read2tree.ReferenceSet - INFO - 44: Extracted 6 reference species form 2196 ogs took 0.019290685653686523
--- Alignment of 2196 OGs ---
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
[... the same pair of RuntimeWarnings is repeated several more times ...]
2023-11-08 10:25:42,693 - read2tree.wrappers.aligners.mafft - WARNING - is MAFFT_BINARIES set correctly:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 287, in _align_worker
    alignment = mafft_wrapper()
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/wrappers/aligners/mafft.py", line 107, in __call__
    raise WrapperError('Mafft did not compute any alignments. StdErr: {}'.format(error))
read2tree.wrappers.WrapperError: Mafft did not compute any alignments. StdErr: outputhat23=16
treein = 0
compacttree = 0
Warning: Only 0 sequence found.
minimumweight = 0.000010
autosubalignment = 0.000000
nthread = 0
randomseed = 0
blosum 62 / kimura 200
poffset = 0
niter = 16
sueff_global = 0.100000
nadd = 16
Warning: Only 0 sequence found.

Strategy: L-INS-i (Probably most accurate, very slow) Iterative refinement method (<16) with LOCAL pairwise alignment information

If unsure which option to use, try 'mafft --auto input > output'. For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct). It tends to insert more gaps into gap-rich regions than previous versions. To disable this change, add the --leavegappyregion option.

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Share/user/sly20/micromamba/envs/r2t/bin/read2tree", line 16, in <module>
    main(sys.argv[1:], exe_name=exe_name(), desc=desc)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 291, in main
    alignments = Aligner(args, ogset.ogs, load=True)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 51, in __init__
    self.alignments = self._align(og_set)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 330, in _align
    res_align = p.map(self._align_worker, og_chunks)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
read2tree.wrappers.WrapperError: Mafft did not compute any alignments. StdErr: outputhat23=16
treein = 0
compacttree = 0
Warning: Only 0 sequence found.
minimumweight = 0.000010
autosubalignment = 0.000000
nthread = 0
randomseed = 0
blosum 62 / kimura 200
poffset = 0
niter = 16
sueff_global = 0.100000
nadd = 16
Warning: Only 0 sequence found.

Strategy: L-INS-i (Probably most accurate, very slow) Iterative refinement method (<16) with LOCAL pairwise alignment information

If unsure which option to use, try 'mafft --auto input > output'. For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct). It tends to insert more gaps into gap-rich regions than previous versions. To disable this change, add the --leavegappyregion option.

sinamajidian commented 8 months ago

Nice, we have some progress. Read2tree uses MAFFT to align the marker genes (before using the sequencing reads), and it seems that the problem is there. I noticed that you have 6 reference species and 2196 OGs. We usually use far fewer marker genes: when you download them from the OMA browser you can choose how many, and 150-400 would be a good choice. I would suggest re-downloading them with "Maximum nr of markers" set.

It would be great if you could tell us the set of species that you used, so I can try to reproduce the error.

You could also check the mplog.log file to see for which OG the error happens. This is a sample log:

2023-11-04 17:36:04,452 - read2tree.ReferenceSet - INFO - : Extracted 21 reference species form 200 ogs took 0.0019800662994384766
2023-11-04 17:36:04,666 - read2tree.Aligner - DEBUG - aligning OG OG1234111 with 18 proteins
2023-11-04 17:36:04,667 - read2tree.wrappers.abstract_cli - DEBUG - Running following command: /software/miniconda/envs/r2t_3.10.8b/bin/mafft --auto --maxiterate 1000 --amino /tmp/34917137/tmpc6q1p2fa
2023-11-04 17:36:06,463 - read2tree.Aligner - DEBUG - aligning OG OG1195247 with 17 proteins
2023-11-04 17:36:06,463 - read2tree.wrappers.abstract_cli - DEBUG - Running following command: /software/miniconda/envs/r2t_3.10.8b/bin/mafft --auto --maxiterate 1000 --amino /tmp/34917137/tmpjfvxvdur

Then you could try running mafft separately for that specific OG:

mafft --auto --maxiterate 1000 --amino  01_ref_ogs_aa/OG1135874.fa
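
To pick the OG to test, the DEBUG lines in mplog.log can be scanned with a few lines of Python. This is a sketch that assumes the log format shown in the sample above ("aligning OG <name> with <N> proteins"); the function names and the `min_proteins` threshold are illustrative and not part of read2tree. Because the alignment runs in several worker processes, the last OG in the log is only a hint and not necessarily the one that failed.

```python
import re

# Matches DEBUG lines such as:
# "... - read2tree.Aligner - DEBUG - aligning OG OG1234111 with 18 proteins"
ALIGN_LINE = re.compile(r"aligning OG (\S+) with (\d+) proteins")


def aligned_ogs(log_lines):
    """Return (og_name, n_proteins) for every alignment start found in the log."""
    found = []
    for line in log_lines:
        match = ALIGN_LINE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def suspicious_ogs(log_lines, min_proteins=1):
    """Names of OGs handed to the aligner with fewer than `min_proteins` sequences."""
    return [og for og, n in aligned_ogs(log_lines) if n < min_proteins]
```

Reading the file with `open("mplog.log").readlines()` and passing the result to `aligned_ogs` gives the OGs in the order they were started; `suspicious_ogs` narrows that down to OGs reported with no proteins, if the log contains any.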

The MAFFT error says that there are 0 sequences, so probably one of the OG files is empty. You could check with:

ls -alhtS marker_genes | tail -n 3

ls -alhtS 01_ref_ogs_aa/ | tail -n 3
ls -alhtS 01_ref_ogs_dna/ | tail -n 3
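
The `ls` commands above only show the smallest files. As a complementary check, the sketch below counts FASTA headers (lines starting with ">") in each file of a folder and lists the files that contain none. It assumes the OG files are plain-text FASTA; the function names are hypothetical and not part of read2tree.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting the header lines that start with '>'."""
    with open(path, "r", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def files_without_sequences(folder):
    """Return the sorted names of regular files in `folder` with no FASTA record."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and count_fasta_records(p) == 0
    )
```

For example, `files_without_sequences("01_ref_ogs_aa")` would list any OG file for which MAFFT would see 0 sequences.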

Anyway, the easiest way would be to use fewer OGs.

sly724 commented 8 months ago

Hi @sinamajidian

Thank you for your reply. When I reduced the number of OGs to 200, I got results without errors in single-species mode.

When I used the multi-species mode for the analysis, no species tree file was generated, and I got some errors.

The command is as follows:

read2tree --standalone_path marker_genes/ --output_path output --reference
read2tree --standalone_path marker_genes/ --dna_reference ref_dna.fa --thread 30 --output_path output --reads /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Hnjt/Hnjt_1.clean.fq /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/44_Hnjt/Hnjt_2.clean.fq
read2tree --standalone_path marker_genes/ --dna_reference ref_dna.fa --thread 30 --output_path output --reads /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Ddtl/Ddtl_1.clean.fq.gz /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Ddtl/Ddtl_2.clean.fq.gz
read2tree --standalone_path marker_genes/ --dna_reference ref_dna.fa --thread 30 --output_path output --reads /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Mhtl/Mhtl_1.clean.fq.gz /Share/user/sly20/X101SC22050937-Z03-J004/00.CleanData/Mhtl/Mhtl_2.clean.fq.gz
read2tree --standalone_path marker_genes/ --output_path output/ --reference --merge_all_mappings --tree

error message:

Loading records: 0%| | 0/200 [00:00<?, ? record/s]
Loading records: 100%|██████████| 200/200 [00:00<00:00, 47782.00 record/s]
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)

Re-loading files: 0 OGs [00:00, ? OGs/s]
Re-loading files: 200 OGs [00:00, 3559.64 OGs/s]

Re-loading references for mapping from folder: 0%| | 0/6 [00:00<?, ? species/s]
Re-loading references for mapping from folder: 100%|██████████| 6/6 [00:00<00:00, 297.90 species/s]

Loading alignments : 0 Alignment [00:00, ? Alignment/s]
Loading alignments : 172 Alignment [00:00, 1713.75 Alignment/s]
Loading alignments : 200 Alignment [00:00, 1702.60 Alignment/s]

Mapping reads to species: 0%| | 0/6 [00:00<?, ? species/s]
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/DIACI_OGs.fa.bam'
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:206: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:163: RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/numpy/core/_methods.py:198: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)

Mapping reads to species: 17%|█▋ | 1/6 [58:28<4:52:22, 3508.58s/ species]
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/ACYPI_OGs.fa.bam'

Mapping reads to species: 33%|███▎ | 2/6 [1:57:36<3:55:27, 3531.77s/ species]
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/ZOONE_OGs.fa.bam'

Mapping reads to species: 50%|█████ | 3/6 [2:47:43<2:44:36, 3292.06s/ species]
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/CIMLE_OGs.fa.bam'

Mapping reads to species: 67%|██████▋ | 4/6 [3:39:08<1:47:00, 3210.26s/ species]
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/PEDHC_OGs.fa.bam'

Mapping reads to species: 83%|████████▎ | 5/6 [4:34:46<54:16, 3256.24s/ species]
[E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_t1zngzz7/RHOPR_OGs.fa.bam'

Mapping reads to species: 100%|██████████| 6/6 [5:27:07<00:00, 3217.16s/ species]
Mapping reads to species: 100%|██████████| 6/6 [5:27:07<00:00, 3271.22s/ species]

Adding mapped seq to alignments: 0%| | 0/200 [00:00<?, ? alignments/s]
Adding mapped seq to alignments: 100%|██████████| 200/200 [00:00<00:00, 1186507.50 alignments/s]

Adding mapped seq to OG: 0%| | 0/200 [00:00<?, ? OGs/s]
Adding mapped seq to OG: 100%|██████████| 200/200 [00:00<00:00, 1246449.93 OGs/s]

Adding mapped seq to OG: 0%| | 0/200 [00:00<?, ? OGs/s]
Adding mapped seq to OG: 100%|██████████| 200/200 [00:00<00:00, 42694.46 OGs/s]

Adding mapped seq to alignments: 0%| | 0/200 [00:00<?, ? alignments/s]
Adding mapped seq to alignments: 0%| | 0/200 [00:00<?, ? alignments/s]
Traceback (most recent call last):
  File "/Share/user/sly20/micromamba/envs/r2t/bin/read2tree", line 16, in <module>
    main(sys.argv[1:], exe_name=exe_name(), desc=desc)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 363, in main
    alignments.add_mapped_seq(ogset.mapped_ogs)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 168, in add_mapped_seq
    self.mapped_aligns[name_og].aa = self._add_mapseq_align(align_filt.aa, map_record_aa[0], ref_species, species_name)
  File "/Share/user/sly20/micromamba/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 106, in _add_mapseq_align
    new = [map_record[placement_dic[i]] if '-' not in v else '-' for i, v in enumerate(list(ref_rec.seq))]
AttributeError: 'NoneType' object has no attribute 'seq'

What should I do to fix this error?

Best wishes sly

sinamajidian commented 8 months ago

Thanks, Sly, for the response. You mentioned that you "got these results without error." Was that for the same species for which the error now happens?

Could you please share the full mplog.log file with us? It seems that you are overwriting the files of an earlier run. It would be best to start the analysis in a new folder; otherwise, the alignment results of the two runs (the 200 marker genes and the full set) get mixed.

sly724 commented 8 months ago

Hi @sinamajidian

Thank you for your reply. I've figured out the reason for the error. First, I didn't name the reads files exactly as you specified (species_R1.fastq/species_R2.fastq); my file names were of the form 38.物种名_1.clean.fq.gz/38.物种名_2.clean.fq.gz (物种名 means "species name"). Second, my reads file names included Chinese characters. I renamed the reads files for all species and the run succeeded.
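
A quick pre-flight check of the reads file names can catch both problems before a long mapping run. The sketch below is based only on the naming reported in this comment (species_R1.fastq/species_R2.fastq); the optional ".gz" suffix, the restriction of the species part to letters and digits, and the function name are assumptions made for illustration, not documented read2tree rules.

```python
import re

# Assumed pattern: <species>_R1.fastq or <species>_R2.fastq, optionally gzipped.
# The ".gz" suffix and the alphanumeric species part are assumptions.
READ_NAME = re.compile(r"[A-Za-z0-9]+_R[12]\.fastq(\.gz)?")


def read_name_problems(filename):
    """Return a list of human-readable problems found in a reads file name."""
    problems = []
    if not filename.isascii():
        problems.append("contains non-ASCII characters")
    if not READ_NAME.fullmatch(filename):
        problems.append("does not match <species>_R1.fastq / <species>_R2.fastq")
    return problems
```

An empty list means the name passes both checks; for example, `read_name_problems("38.物种名_1.clean.fq.gz")` reports both problems.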

--- Tree inference ---
2023-11-17 22:29:17,754 - read2tree.TreeInference - INFO - merge: Tree inference took 32.54426026344299.
(39Ddtl_R1:0.0673121808,38Mhtl_R1:0.0878371145,(44Hnjt_R1:0.0881589205,((((CIMLE:0.1771613970,RHOPR:0.1652477443):0.1578374932,(ACYPI:0.3422935192,DIACI:0.3035898292):0.0628138094):0.0824901814,PEDHC:0.2940334757):0.0896128651,ZOONE:0.1238414304):0.0517699561):0.0181870257);

This work is completed!

Thank you again. sly

sinamajidian commented 8 months ago

That's awesome! Glad to hear that you resolved the issue.

Best regards, Sina

sinamajidian commented 8 months ago

Just one thing, you may need to re-root the tree based on the outgroup species.