Open sci-study opened 1 year ago
For additional information, here is an example of a protein sequence from an OMA group and its corresponding CDS (taken from the single file containing all CDS):
CAG2184331.1 unnamed protein product, partial [oppiella_nova_GCA_905397405]
CEKCDGKCVICDSYVRPSTLVRICDECNYGSYQGRCVICGGPGVSDAYYCKECTIQEKDRDGCPKIVNLGSSKTDLFYERKKYGFKKR

CAG2184331.1 unnamed protein product, partial [oppiella_nova_GCA_905397405]
TGCGAGAAGTGCGACGGGAAGTGCGTTATCTGCGACTCCTATGTCCGGCCCTCGACTTTGGTCCGCATCTGCGATGAGTGCAACTATGGCTCATATCAGGGCCGGTGTGTCATCTGCGGTGGTCCCGGGGTTAGTGACGCCTACTATTGCAAGGAGTGTACGATTCAGGAGAAGGACAGGGATGGCTGTCCCAAGATTGTCAACTTGGGCTCCAGTAAAACGGATCTCTTTTATGAGCGCAAGAAGTATGGCTTCAAAAAGAGGTGA
Apologies for commenting so much on my own post.
It appears the issue was similar to https://github.com/DessimozLab/read2tree/issues/20, where manually deleting all underscores ("_") fixed the problem.
Program is currently running, will update when complete.
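For anyone who would rather script the underscore removal than do it by hand, here is a minimal Python sketch. It assumes the underscores only need to go from the FASTA header lines (those starting with ">") and that the same cleaning is applied to both the marker files and the CDS file so the headers still match. The function names are made up for illustration and are not part of Read2Tree.

```python
from pathlib import Path


def strip_header_underscores(fasta_text: str) -> str:
    """Return FASTA text with every "_" removed from header lines only."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Only headers are touched; sequence lines are kept as they are.
            line = line.replace("_", "")
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


def clean_file(src: Path, dst: Path) -> None:
    """Write a cleaned copy of src to dst (hypothetical helper)."""
    dst.write_text(strip_header_underscores(src.read_text()))


# Small demonstration on a header taken from this thread.
example = (
    ">CAG2184331.1 unnamed protein product, partial "
    "[oppiella_nova_GCA_905397405]\n"
    "CEKCDGKC\n"
)
print(strip_header_underscores(example), end="")
```

Writing to a new file rather than editing in place keeps the original headers available in case the underscores turn out to matter elsewhere.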
I've subset 69 OMA groups (selected because they include sequences from all genomes of interest), computed from 22 genomes using the OMA standalone package. I've also made a FASTA file with the corresponding CDS sequences, using the same headers as in the OMA groups. However, I'm running into issues that I'm finding hard to overcome.
Formatting examples:

(Marker gene)
Protein 1 [Animal 1]
DVAEKCRVL
Protein 1 [Animal 2]
DVAEKCRVL

(Corresponding CDS file)
Protein 1 [Animal 1]
ATCGATCGATCG
Protein 1 [Animal 2]
ATCGATCGATCG
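Before running Read2Tree, it may help to confirm that every protein header has a CDS record with exactly the same header and a plausible length. The sketch below is a standalone check written for illustration, not Read2Tree's own consistency check. `read_fasta` and `check_pairs` are hypothetical names, and the rule that a CDS is either 3 nt per amino acid or that plus one stop codon is an assumption.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def check_pairs(proteins, cds):
    """Return a list of (header, problem) for proteins without a usable CDS."""
    problems = []
    for header, aa in proteins.items():
        dna = cds.get(header)
        if dna is None:
            problems.append((header, "no CDS with this header"))
            continue
        # Assumption: the CDS may or may not include the terminal stop codon.
        if len(dna) not in (3 * len(aa), 3 * len(aa) + 3):
            problems.append(
                (header, f"length mismatch: {len(aa)} aa vs {len(dna)} nt")
            )
    return problems


# Toy data: the second protein has no matching CDS header.
proteins = read_fasta(">Protein 1 [Animal 1]\nMK\n>Protein 1 [Animal 2]\nMK\n")
cds = read_fasta(">Protein 1 [Animal 1]\nATGAAATAA\n")
for header, problem in check_pairs(proteins, cds):
    print(header, "->", problem)
```

Headers are compared as whole strings here, so any difference in spacing or punctuation between the two files shows up as a missing CDS.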
However, I run into a problem when I start Read2Tree with the command below (all files and the test_markers folder are in the directory from which I run read2tree):
read2tree --reference --standalone ./test_markers --output_path output_v1 --dna_reference total_orths_cds.fa
I get the error:
--- Load OGs with min 0 species from oma test_markers - mode = marker_genes ---
Loading files for pre-filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 69/69 [00:00<00:00, 2053.57 OGs/s]
2023-07-12 15:42:14,120 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from total_orths_cds.fa ---
2023-07-12 15:42:14,121 - read2tree.OGSet - INFO - Loading total_orths_cds.fa into memory. This might take a while . . .
Loading OGs: 0%| | 0/69 [00:00<?, ? OGs/s]
Loading OGs: 0%| | 0/69 [06:01<?, ? OGs/s]
Traceback (most recent call last):
File "/home/youseuf/miniconda3/envs/read2tree2/bin/read2tree", line 4, in
__import__('pkg_resources').run_script('read2tree==0.1.4', 'read2tree')
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 720, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1570, in run_script
exec(script_code, namespace, namespace)
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/EGG-INFO/scripts/read2tree", line 16, in
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/main.py", line 289, in main
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 79, in __init__
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 192, in _load_ogs
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 337, in _check_dna_aa_length_consistency
File "/home/youseuf/miniconda3/envs/read2tree2/lib/python3.8/site-packages/read2tree-0.1.4-py3.8.egg/read2tree/OGSet.py", line 337, in
AttributeError: 'NoneType' object has no attribute 'id'
When I look into the mplog.log file, I see:
2023-07-12 15:42:14,120 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from total_orths_cds.fa ---
2023-07-12 15:42:14,121 - read2tree.OGSet - INFO - Loading total_orths_cds.fa into memory. This might take a while . . .
2023-07-12 15:42:14,146 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:42:14,200 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:42:14,202 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:43:14,326 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:43:14,329 - read2tree.OGSet - DEBUG - DNA not found for XP_046914939.1_OG24421.
2023-07-12 15:43:14,331 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:43:14,384 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:43:14,387 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:44:14,524 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:44:14,526 - read2tree.OGSet - DEBUG - DNA not found for XP_027206261.1_OG24421.
2023-07-12 15:44:14,529 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:44:14,583 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:44:14,586 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:45:14,724 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:45:14,727 - read2tree.OGSet - DEBUG - DNA not found for XP_029824739.1_OG24421.
2023-07-12 15:45:14,935 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:45:14,988 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:45:14,991 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:46:15,132 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:46:15,135 - read2tree.OGSet - DEBUG - DNA not found for XP_054162837.1_OG24421.
2023-07-12 15:46:15,137 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): omabrowser.org:80
2023-07-12 15:46:15,190 - urllib3.connectionpool - DEBUG - http://omabrowser.org:80 "GET /api/protein/XP/ HTTP/1.1" 301 162
2023-07-12 15:46:15,193 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): omabrowser.org:443
2023-07-12 15:47:15,314 - urllib3.connectionpool - DEBUG - https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160
2023-07-12 15:47:15,317 - read2tree.OGSet - DEBUG - DNA not found for XP_053212400.1_OG24421.
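One thing that stands out in this log is that every request goes to /api/protein/XP/, while the sequences reported as missing have IDs such as XP_046914939.1_OG24421. The Python sketch below pulls both pieces of information out of mplog.log text so they can be compared side by side. The regular expressions are based only on the log lines shown here, and `summarize_log` and `first_token` are hypothetical helper names. That the queried name equals the text before the first underscore is an observation from this log, not a confirmed description of how Read2Tree builds its lookup key.

```python
import re

# Patterns derived from the log lines in this thread (an assumption about
# the general log format, not taken from Read2Tree's source).
NOT_FOUND = re.compile(r"DNA not found for (\S+)\.")
API_PATH = re.compile(r'"GET (/api/protein/\S*) HTTP')


def summarize_log(log_text):
    """Return (IDs reported as missing, set of requested API paths)."""
    missing = NOT_FOUND.findall(log_text)
    paths = set(API_PATH.findall(log_text))
    return missing, paths


def first_token(seq_id):
    """Text before the first underscore of an ID."""
    return seq_id.split("_", 1)[0]


sample = (
    '2023-07-12 15:43:14,326 - urllib3.connectionpool - DEBUG - '
    'https://omabrowser.org:443 "GET /api/protein/XP/ HTTP/1.1" 504 160\n'
    '2023-07-12 15:43:14,329 - read2tree.OGSet - DEBUG - '
    'DNA not found for XP_046914939.1_OG24421.\n'
)
missing, paths = summarize_log(sample)
for seq_id in missing:
    print(seq_id, "-> prefix before first underscore:", first_token(seq_id))
print("requested paths:", sorted(paths))
```

If the prefix and the requested path agree for every missing ID, that is at least consistent with underscores in the headers interfering with the lookup.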
Any help would be extremely appreciated.