Closed YuWang07 closed 10 months ago
Hello
Thanks for using read2tree. It seems that the fromatting is not correct. I guess the record id of both should be indentical, but I need to double check. Could you please send us a few fasta files of gene markers and their cds?
Best, Sina
On Tue, 26 Dec 2023, 15:29 YuWang, @.***> wrote:
Dear team: Thanks for developing this tool which is really helpful. I am trying to use it for a customer marker gene sets and I reformat the protein sequence and cds sequence as suggested here: [#22 https://github.com/DessimozLab/read2tree/issues/22]
Here is an example of the ids of the markers: marker protein: >snapmasked-CP113692.1-processed-gene-8.84-mRNA-1 | [GCA027563075.1] cds ref: >snapmasked-CP113692.1-processed-gene-8.84-mRNA-1
I also remove all the underscore "_" as previously someone asked, but i still keep getting this error:
Creates the reference folder --- Load OGs with min 0 species from oma /scratch/project/wwvirus/RO3/02_orthogroup/04.marker_reformat/02_sub14/02.AA - mode = marker_genes --- ^MLoading files for pre-filter: 0%| | 0/32 [00:00<?, ? OGs/s]^MLoading files for pre-filter: 100%|██████████| 32/32 [00:00<00:00, 984.64 OGs/s] 2023-12-26 21:55:26,900 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from /scratch/project/wwvirus/RO3/02_orthogroup/04.marker_reformat/02_sub14/01.CDS/dna_ref.fa --- 2023-12-26 21:55:26,900 - read2tree.OGSet - INFO - Loading dna_ref.fa into memory. This might take a while . . . ^MLoading OGs: 0%| | 0/32 [00:00<?, ? OGs/s]^MLoading OGs: 0%| | 0/32 [00:01<?, ? OGs/s] Traceback (most recent call last): File "/home/s4519890/miniconda3/envs/r2t/bin/read2tree", line 16, in
main(sys.argv[1:], exe_name=exe_name(), desc=desc) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 289, in main ogset = OGSet(args, oma_output=oma_output, progress=progress) # Generate the OGs with their DNA sequences File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 79, in init self.ogs = self._load_ogs() File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 186, in _load_ogs ogs[name].dna = self._get_dna_records(ogs[name].aa, File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 361, in _get_dna_records og_cdna.append(self._get_dna_from_fasta(record, db)) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 322, in _get_dna_from_fasta return self._get_dna_from_REST(record) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/OGSet.py", line 282, in _get_dna_from_REST seq = oma_record.json()['cdna'] KeyError: 'cdna' Could you please help on this issue? Any idea about why this happen?
Thanks you very much.
— Reply to this email directly, view it on GitHub https://github.com/DessimozLab/read2tree/issues/54, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJ7LXTBHPWUB4LLQUVLKVILYLK3ZHAVCNFSM6AAAAABBDGWG7SVHI2DSMVQWIX3LMV43ASLTON2WKOZSGA2TMMZUGQ2TEMQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Sina
Thanks for the quick reply. The id should be identical in my maker and cds file. I have attached my maker gene set. Here is the protein fasta file with >id | [species_name]: 02.AA.zip the reference cds fasta file with >id: dna_ref.fa.zip the reference cds fasta file with >id | [species_name]: ref.format.cds.fa.zip
I try both of them but get the same error when creating the reference folder:
read2tree --standalone_path --standalone_path 02.AA/ --output_path ${OUT_DIR} --reference --dna_reference dna_ref.fa
Thanks. Yu
Dear Yu The ID of fasta records should be identical for both aa and dna. I can see some slight differences
$ grep CP113692 02.AA/*
02.AA/OG0012664.fa:>snapmasked-CP113692.1-processed-gene-8.84-mRNA-1 | [GCA027563075.1]
02.AA/OG0012753.fa:>snapmasked-CP113692.1-processed-gene-8.66-mRNA-1 | [GCA027563075.1]
$ grep CP113692 dna_ref.fa
>snapmaskedCP113692.1processedgene8.84mRNA1 | [GCA027563075.1]
>snapmaskedCP113692.1processedgene8.66mRNA1 | [GCA027563075.1]
$ grep CP113692 ref.format.cds.fa
>snap_masked-CP113692.1-processed-gene-8.84-mRNA-1
>snap_masked-CP113692.1-processed-gene-8.66-mRNA-1
$ grep "snapmasked-CP113692.1-processed-gene-8.84-mRNA-1" ref.format.cds.fa
$
Please send me a new version, so I can check whether adding [species]
is enough or not.
Dear Yu,
it seems that now read2tree does ignore the --dna_reference argument (in log: Load logs … from REST-API). I think it checks that the file name contains a .fa or .fasta extension.
so by renaming the dna.cds.da -> dna.cds.fa it should work I assume.
Best Adrian
On 9 Jan 2024, at 05:54, YuWang @.***> wrote:
Hi, Sina
Thank you for the help.
I simplify the gene name for the marker gene set. But when I try to build the reference. Here is some example of the input files:
$ grep ">" dna.cds.da | head
OSNDL11138-RA | [GCA_008403515.1] RNTFV01679-RA | [GCA_014843625.1] NBQJT01402-RA | [GCA_027563095.1] CDDWM01290-RA | [GCA_027563055.1] TNUXI01279-RA | [GCA_027563075.1] TRFAQ09314-RA | [GCA_902703645.1] WBVKE10572-RA | [GCA_000499105.1] LJCVG04613-RA | [GCA_020891285.1] OSNDL10718-RA | [GCA_008403515.1] RNTFV00574-RA | [GCA_014843625.1]
$ grep ">" OG0012668.fa
OSNDL11138-RA | [GCA_008403515.1] RNTFV01679-RA | [GCA_014843625.1] NBQJT01402-RA | [GCA_027563095.1] CDDWM01290-RA | [GCA_027563055.1] TNUXI01279-RA | [GCA_027563075.1] TRFAQ09314-RA | [GCA_902703645.1] WBVKE10572-RA | [GCA_000499105.1] LJCVG04613-RA | [GCA_020891285.1] I get the following error:
read2tree --standalone_path ./00.AA/ --output_path /scratch/project/wwvirus/RO3_24_reanalysis/02_R2T_OUT --reference --dna_reference ./dna.cds.da
--- Load OGs with min 0 species from oma 00.AA - mode = marker_genes --- Loading files for pre-filter: 100%|███████████████████████████████| 32/32 [00:00<00:00, 348.23 OGs/s] 2024-01-09 14:36:46,716 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq using the REST api --- Loading OGs: 0%| | 0/32 [00:00<?, ? OGs/s]2024-01-09 14:36:49,640 - read2tree.OGSet - WARNING - This OG OG0012773 did not have any DNA Loading OGs: 3%|█▌ | 1/32 [00:02<01:30, 2.92s/ OGs]2024-01-09 14:36:51,941 - read2tree.OGSet - WARNING - This OG OG0012825 did not have any DNA Loading OGs: 6%|███▏ | 2/32 [00:05<01:16, 2.56s/ OGs]2024-01-09 14:36:54,196 - read2tree.OGSet - WARNING - This OG OG0012882 did not have any DNA Loading OGs: 9%|████▋ | 3/32 [00:07<01:10, 2.42s/ OGs]2024-01-09 14:36:56,467 - read2tree.OGSet - WARNING - This OG OG0012811 did not have any DNA Loading OGs: 12%|██████▎ | 4/32 [00:09<01:06, 2.36s/ OGs]2024-01-09 14:36:58,766 - read2tree.OGSet - WARNING - This OG OG0012807 did not have any DNA Loading OGs: 16%|███████▊ | 5/32 [00:12<01:03, 2.34s/ OGs]2024-01-09 14:37:01,057 - read2tree.OGSet - WARNING - This OG OG0012748 did not have any DNA Loading OGs: 19%|█████████▍ | 6/32 [00:14<01:00, 2.32s/ OGs]2024-01-09 14:37:03,316 - read2tree.OGSet - WARNING - This OG OG0012756 did not have any DNA Loading OGs: 22%|██████████▉ | 7/32 [00:16<00:57, 2.30s/ OGs]2024-01-09 14:37:05,579 - read2tree.OGSet - WARNING - This OG OG0012760 did not have any DNA Loading OGs: 25%|████████████▌ | 8/32 [00:18<00:54, 2.29s/ OGs]2024-01-09 14:37:07,841 - read2tree.OGSet - WARNING - This OG OG0012812 did not have any DNA Loading OGs: 28%|██████████████ | 9/32 [00:21<00:52, 2.28s/ OGs]2024-01-09 14:37:10,144 - read2tree.OGSet - WARNING - This OG OG0012816 did not have any DNA Loading OGs: 31%|███████████████▎ | 10/32 [00:23<00:50, 2.29s/ OGs]2024-01-09 14:37:12,410 - read2tree.OGSet - WARNING - This OG OG0012668 did not have any DNA Loading OGs: 34%|████████████████▊ | 11/32 [00:25<00:47, 2.28s/ OGs]2024-01-09 14:37:14,680 - read2tree.OGSet - WARNING - This OG OG0012707 did not have any DNA Loading OGs: 38%|██████████████████▍ | 12/32 [00:27<00:45, 2.28s/ OGs]2024-01-09 14:37:16,960 - read2tree.OGSet - WARNING - This OG OG0012732 did not have any DNA Loading OGs: 41%|███████████████████▉ | 13/32 [00:30<00:43, 2.28s/ OGs]2024-01-09 14:37:19,245 - read2tree.OGSet - WARNING - This OG OG0012867 did not have any DNA Loading OGs: 44%|█████████████████████▍ | 14/32 [00:32<00:41, 2.28s/ OGs]2024-01-09 14:37:21,502 - read2tree.OGSet - WARNING - This OG OG0012716 did not have any DNA Loading OGs: 47%|██████████████████████▉ | 15/32 [00:34<00:38, 2.27s/ OGs]2024-01-09 14:37:23,800 - read2tree.OGSet - WARNING - This OG OG0012689 did not have any DNA Loading OGs: 50%|████████████████████████▌ | 16/32 [00:37<00:36, 2.28s/ OGs]2024-01-09 14:37:26,068 - read2tree.OGSet - WARNING - This OG OG0012755 did not have any DNA Loading OGs: 53%|██████████████████████████ | 17/32 [00:39<00:34, 2.28s/ OGs]2024-01-09 14:37:28,334 - read2tree.OGSet - WARNING - This OG OG0012841 did not have any DNA Loading OGs: 56%|███████████████████████████▌ | 18/32 [00:41<00:31, 2.27s/ OGs]2024-01-09 14:37:30,622 - read2tree.OGSet - WARNING - This OG OG0012737 did not have any DNA Loading OGs: 59%|█████████████████████████████ | 19/32 [00:43<00:29, 2.28s/ OGs]2024-01-09 14:37:32,899 - read2tree.OGSet - WARNING - This OG OG0012680 did not have any DNA Loading OGs: 62%|██████████████████████████████▋ | 20/32 [00:46<00:27, 2.28s/ OGs]2024-01-09 14:37:35,149 - read2tree.OGSet - WARNING - This OG OG0012686 did not have any DNA Loading OGs: 66%|████████████████████████████████▏ | 21/32 [00:48<00:24, 2.27s/ OGs]2024-01-09 14:37:37,419 - read2tree.OGSet - WARNING - This OG OG0012878 did not have any DNA Loading OGs: 69%|█████████████████████████████████▋ | 22/32 [00:50<00:22, 2.27s/ OGs]2024-01-09 14:37:39,688 - read2tree.OGSet - WARNING - This OG OG0012690 did not have any DNA Loading OGs: 72%|███████████████████████████████████▏ | 23/32 [00:52<00:20, 2.27s/ OGs]2024-01-09 14:37:41,943 - read2tree.OGSet - WARNING - This OG OG0012888 did not have any DNA Loading OGs: 75%|████████████████████████████████████▊ | 24/32 [00:55<00:18, 2.26s/ OGs]2024-01-09 14:37:44,185 - read2tree.OGSet - WARNING - This OG OG0012692 did not have any DNA Loading OGs: 78%|██████████████████████████████████████▎ | 25/32 [00:57<00:15, 2.26s/ OGs]2024-01-09 14:37:46,422 - read2tree.OGSet - WARNING - This OG OG0012712 did not have any DNA Loading OGs: 81%|███████████████████████████████████████▊ | 26/32 [00:59<00:13, 2.25s/ OGs]2024-01-09 14:37:48,674 - read2tree.OGSet - WARNING - This OG OG0012838 did not have any DNA Loading OGs: 84%|█████████████████████████████████████████▎ | 27/32 [01:01<00:11, 2.25s/ OGs]2024-01-09 14:37:50,941 - read2tree.OGSet - WARNING - This OG OG0012719 did not have any DNA Loading OGs: 88%|██████████████████████████████████████████▉ | 28/32 [01:04<00:09, 2.26s/ OGs]2024-01-09 14:37:53,212 - read2tree.OGSet - WARNING - This OG OG0012854 did not have any DNA Loading OGs: 91%|████████████████████████████████████████████▍ | 29/32 [01:06<00:06, 2.26s/ OGs]2024-01-09 14:37:55,480 - read2tree.OGSet - WARNING - This OG OG0012765 did not have any DNA Loading OGs: 94%|█████████████████████████████████████████████▉ | 30/32 [01:08<00:04, 2.26s/ OGs]2024-01-09 14:37:57,739 - read2tree.OGSet - WARNING - This OG OG0012718 did not have any DNA Loading OGs: 97%|███████████████████████████████████████████████▍ | 31/32 [01:11<00:02, 2.26s/ OGs]2024-01-09 14:38:00,008 - read2tree.OGSet - WARNING - This OG OG0012688 did not have any DNA Loading OGs: 100%|█████████████████████████████████████████████████| 32/32 [01:13<00:00, 2.29s/ OGs] 2024-01-09 14:38:00,012 - read2tree.OGSet - INFO - : Gathering of DNA seq for 32 OGs took 73.2949275970459. --- Generating reference for mapping --- Loading records: 100%|██████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 114422.62 record/s] 2024-01-09 14:38:00,013 - read2tree.ReferenceSet - INFO - : Extracted 8 reference species form 32 ogs took 0.0006504058837890625 --- Alignment of 32 OGs --- /home/s4519890/miniconda3/envs/r2t/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used self.stdout = io.open(c2pread, 'rb', bufsize) /home/s4519890/miniconda3/envs/r2t/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used self.stderr = io.open(errread, 'rb', bufsize) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, *kwds)) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar return list(map(args)) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 292, in _align_worker align.dna = self._get_translated_alignment(codons, alignment) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 216, in _get_translated_alignment codon = codons[rec.id] KeyError: 'OSNDL' """
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "/home/s4519890/miniconda3/envs/r2t/bin/read2tree", line 16, in
main(sys.argv[1:], exe_name=exe_name(), desc=desc) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/main.py", line 291, in main alignments = Aligner(args, ogset.ogs, load=True) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 51, in init self.alignments = self._align(og_set) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/site-packages/read2tree/Aligner.py", line 330, in _align res_align = p.map(self._align_worker, og_chunks) File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 367, in map return self._map_async(func, iterable, mapstar, chunksize).get() File "/home/s4519890/miniconda3/envs/r2t/lib/python3.10/multiprocessing/pool.py", line 774, in get raise self._value KeyError: 'OSNDL' Could you please help on this issue? Best Regards, Yu.
— Reply to this email directly, view it on GitHub https://github.com/DessimozLab/read2tree/issues/54#issuecomment-1882412411, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALAW5QZGPQEYQBRD4AGPODYNTER7AVCNFSM6AAAAABBDGWG7SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBSGQYTENBRGE. You are receiving this because you are subscribed to this thread.
Hi Sian and Adrian,
Thank you for your assistance. I found the error is caused by the file extension and by correcting it to *fa, enabling the script to run smoothly. Regarding the aa and dna ID generation, simplify the ID generated by Maker will make things much easier, all related issues have been solved now.
Furthermore, successful testing with the customer marker set and raw sequencing confirms the flawless performance of 'read2tree'.
I deeply appreciate your development efforts and help. This matter can now be considered resolved as I'll be closing the issue.
Best Regards, Yu
glad to hear that the software is now working for you.
Dear team: Thanks for developing this tool which is really helpful. I am trying to use it for a customer marker gene sets and I reformat the protein sequence and cds sequence as suggested here: [#22]
Here is an example of the ids of the markers: marker protein: >snapmasked-CP113692.1-processed-gene-8.84-mRNA-1 | [GCA027563075.1] cds ref: >snapmasked-CP113692.1-processed-gene-8.84-mRNA-1
I also remove all the underscore "_" as previously someone asked, but i still keep getting this error:
Could you please help on this issue? Any idea about why this happen?
Thanks you very much.