Kinggerm / GetOrganelle

Organelle Genome Assembly Toolkit (Chloroplast/Mitochondrial/ITS)
GNU General Public License v3.0
261 stars, 50 forks

animal_mt: No valid Assembly graph found: an example of --reduce-reads-for-coverage #116

Open Nirmal2310 opened 2 years ago

Nirmal2310 commented 2 years ago

Hi @Kinggerm, I am trying to assemble a mitochondrial genome with GetOrganelle, but I keep getting the error "No valid assembly graph found". I looked through the issues section and applied the suggestions there, such as providing a seed file and removing parentheses from the directory path, but the result is the same. I am attaching the SPAdes and GetOrganelle log files. This is the command I used:

get_organelle_from_reads.py -1 ../ERR194146.R1.fastq -2 ../ERR194146.R2.fastq -o ERR194146 -t 16 -F animal_mt -s /nfs_master/nirmal/raw/GRCh38.primary_assembly.chrM.fa

where GRCh38.primary_assembly.chrM.fa is the current human mitochondrial reference genome. Please take a look when convenient for you.

Attachments: get_org.log.txt, slim.log.txt, slim.log.txt

Kinggerm commented 2 years ago

Please try '--reduce-reads-for-coverage inf --max-reads inf' first; the failure could be caused by an incorrect estimate of the target depth.

Nirmal2310 commented 2 years ago

Hi @Kinggerm, thank you for your quick response. I tried the method you mentioned above, but the run takes approximately 3 to 4 days to complete. Could you suggest values for these parameters that would let the assembly finish sooner? Thank you in advance.

Kinggerm commented 2 years ago

Optimal values cannot be given before a successful run; otherwise they would already be built into the software. Did the 3-4 day run finish with good results? If so, attach the log file so that I can see whether there is room for fine-tuning.

Nirmal2310 commented 2 years ago

Hi @Kinggerm, sorry for the late response; because of a health issue I couldn't complete the task earlier. I have now done what you asked, and I am attaching the command and the log file. Please go through it and let me know whether there is room for fine-tuning.

Command I used:
get_organelle_from_reads.py -1 ../../ERR194146.R1.fastq -2 ../../ERR194146.R2.fastq -o test -t 64 -F animal_mt --reduce-reads-for-coverage inf --max-reads inf

Log file: get_org.log.txt
Time taken: 80 hours 23 minutes 33 seconds
Threads used: 64
RAM: 1 TB
OS: CentOS Linux 7 (Core)

Please ask if you need any more details.

Thank you so much for your help and time.

Kinggerm commented 2 years ago

Thanks for getting back. Hope you are doing well. The log of the run with --reduce-reads-for-coverage inf --max-reads inf shows that the pre-assembly coverage estimate (12286) is quite close to the post-assembly result (12154). That is a good sign, although it is odd that the default parameters did not work.

Anyway, using --reduce-reads-for-coverage 2000 or a similar value should greatly reduce the computational burden.
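As a rough illustration of the saving: if base coverage scales roughly linearly with the number of reads used (an assumption for this sketch; GetOrganelle's own read-reduction logic may differ), capping coverage at 2000 when the estimate is 12286 keeps about one read in six. `fraction_of_reads_kept` is a hypothetical helper, not a GetOrganelle function.

```python
def fraction_of_reads_kept(estimated_coverage: float, coverage_cap: float) -> float:
    """Approximate share of reads kept when coverage is capped at coverage_cap.

    Assumes coverage is proportional to the number of reads (illustrative only).
    A cap of float("inf"), or any cap at or above the estimate, means no reduction.
    """
    if coverage_cap >= estimated_coverage:
        return 1.0
    return coverage_cap / estimated_coverage


# Pre-assembly estimate reported for the unrestricted run: 12286x
print(f"{fraction_of_reads_kept(12286, 2000):.1%}")          # 16.3%
print(f"{fraction_of_reads_kept(12286, float('inf')):.1%}")  # 100.0%
```

Run time need not shrink in exact proportion, since not every step of the pipeline scales linearly with read count.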

Nirmal2310 commented 2 years ago

Hi @Kinggerm, thank you so much for the fast response. I will try this approach and get back to you. The only thing I wanted to confirm is whether to use the --max-reads inf parameter or not. Thank you so much.

Kinggerm commented 2 years ago


--max-reads and --reduce-reads-for-coverage both limit the number of reads used; whichever limit is smaller takes effect. Given that coverage estimation works for your taxon and that --reduce-reads-for-coverage already imposes a strong limit, there is no need to set a value for --max-reads.
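The "whichever is smaller" rule can be sketched as below. This is a hypothetical helper under the same linear-coverage assumption, not GetOrganelle's implementation; its default arguments are the sketch's own, not the tool's defaults, and the read total in the example is made up because the thread does not report one for ERR194146.

```python
import math


def effective_read_limit(total_reads: int, estimated_coverage: float,
                         reduce_reads_for_coverage: float = math.inf,
                         max_reads: float = math.inf) -> float:
    """Hypothetical sketch: reads used = the smallest of the applicable caps.

    The coverage-based cap assumes coverage is proportional to read count.
    """
    if reduce_reads_for_coverage >= estimated_coverage:
        coverage_cap = total_reads  # the coverage cap does not bind
    else:
        coverage_cap = math.ceil(total_reads * reduce_reads_for_coverage / estimated_coverage)
    return min(total_reads, coverage_cap, max_reads)


# Hypothetical input: 10 million reads at an estimated 12286x coverage.
print(effective_read_limit(10_000_000, 12286, reduce_reads_for_coverage=2000))  # 1627870
print(effective_read_limit(10_000_000, 12286, reduce_reads_for_coverage=2000,
                           max_reads=1_000_000))                                # 1000000
```

With max_reads at infinity only the coverage-based cap binds; a smaller max_reads overrides it.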

Nirmal2310 commented 2 years ago

Okay, thank you so much for your response. I will try this approach and get back to you. Thank you so much once again.

9326xiaoxiao commented 1 year ago

Apologies for borrowing this thread. I keep getting "No valid assembly graph found" and would like to know where the error occurs. @Kinggerm, this is the command I used:

get_organelle_from_reads.py -1 /home/mxx/anaconda3/envs/get/BR1_FDMS210380612-1a_1.clean.fq.gz -2 /home/mxx/anaconda3/envs/get/BR1_FDMS210380612-1a_2.clean.fq.gz -o test5 -R 32 -t 64 -F animal_mt --reduce-reads-for-coverage inf --max-reads inf

Log files: get_org.log.txt, spades.log. Please take a look when convenient for you.

JianjunJin commented 1 year ago

@9326xiaoxiao Your issue is different. Please check #198

9326xiaoxiao commented 1 year ago

Hello, teacher. Thank you for your quick reply. I have sent the test5 file again. Could you explain the environment issues in more detail? Thank you!

At 2022-09-29 10:30:00, "JianjunJin" wrote:

@9326xiaoxiao Your issue is different. Please check #198


get_org.log.txt:

GetOrganelle v1.7.6.1

get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
PLATFORM: Linux antias-Precision-7920-Tower 5.4.0-124-generic #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.6.1; numpy 1.23.3; sympy 1.10.1; scipy 1.9.1
DEPENDENCIES: Bowtie2 2.4.5; SPAdes 3.13.0; Blast 2.5.0
GETORG_PATH=/home/mxx/.GetOrganelle
SEED DB: animal_mt 0.0.0
LABEL DB: animal_mt 0.0.1
WORKING DIR: /home/mxx
/home/mxx/anaconda3/envs/get/bin/get_organelle_from_reads.py -1 /home/mxx/anaconda3/envs/get/BR1_FDMS210380612-1a_1.clean.fq.gz -2 /home/mxx/anaconda3/envs/get/BR1_FDMS210380612-1a_2.clean.fq.gz -o test5 -R 32 -t 64 -F animal_mt --reduce-reads-for-coverage inf --max-reads inf

2022-09-28 23:29:11,754 - INFO: Pre-reading fastq ...
2022-09-28 23:29:11,755 - INFO: Unzipping reads file: /home/mxx/anaconda3/envs/get/BR1_FDMS210380612-1a_1.clean.fq.gz (2179665805 bytes)
2022-09-28 23:30:09,808 - INFO: Unzipping reads file: /home/mxx/anaconda3/envs/get/BR1_FDMS210380612-1a_2.clean.fq.gz (2223488277 bytes)
2022-09-28 23:31:08,054 - INFO: Counting read qualities ...
2022-09-28 23:31:08,199 - INFO: Identified quality encoding format = Sanger
2022-09-28 23:31:08,199 - INFO: Phred offset = 33
2022-09-28 23:31:08,200 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-09-28 23:31:08,234 - INFO: Mean error rate = 0.0026
2022-09-28 23:31:08,235 - INFO: Counting read lengths ...
2022-09-28 23:32:03,884 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-09-28 23:32:03,884 - INFO: Reads used = 32254588+32254588
2022-09-28 23:32:03,884 - INFO: Pre-reading fastq finished.

2022-09-28 23:32:03,884 - INFO: Making seed reads ...
2022-09-28 23:32:03,885 - INFO: Seed bowtie2 index existed!
2022-09-28 23:32:03,885 - INFO: Mapping reads to seed bowtie2 index ...
2022-09-28 23:48:20,109 - INFO: Mapping finished.
2022-09-28 23:48:20,110 - INFO: Seed reads made: test5/seed/animal_mt.initial.fq (3252919 bytes)
2022-09-28 23:48:20,112 - INFO: Making seed reads finished.

2022-09-28 23:48:20,112 - INFO: Checking seed reads and parameters ...
2022-09-28 23:48:20,112 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2022-09-28 23:48:20,113 - INFO: If the result graph is not a circular organelle genome,
2022-09-28 23:48:20,113 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2022-09-28 23:48:22,334 - INFO: Pre-assembling mapped reads ...
2022-09-28 23:48:22,835 - INFO: Retrying with more reads ..
2022-09-29 00:02:12,944 - WARNING: Pre-assembling failed. The estimations for animal_mt-hitting base-coverage and word size may be misleading.
2022-09-29 00:02:18,285 - INFO: Estimated animal_mt-hitting base-coverage = 305.68
2022-09-29 00:02:18,554 - INFO: Estimated word size(s): 119
2022-09-29 00:02:18,554 - INFO: Setting '-w 119'
2022-09-29 00:02:18,554 - INFO: Setting '--max-extending-len inf'
2022-09-29 00:02:18,691 - INFO: Checking seed reads and parameters finished.

2022-09-29 00:02:18,691 - INFO: Making read index ...
2022-09-29 00:08:03,175 - INFO: 53810080 candidates in all 64509176 reads
2022-09-29 00:08:03,175 - INFO: Pre-grouping reads ...
2022-09-29 00:08:03,175 - INFO: Setting '--pre-w 119'
2022-09-29 00:08:08,081 - INFO: 200000/7054047 used/duplicated
2022-09-29 00:08:16,987 - INFO: 6035 groups made.
2022-09-29 00:08:22,883 - INFO: Making read index finished.

2022-09-29 00:08:22,883 - INFO: Extending ...
2022-09-29 00:08:22,883 - INFO: Adding initial words ...
2022-09-29 00:08:23,013 - INFO: AW 69574
2022-09-29 00:11:10,210 - INFO: Round 1: 53810080/53810080 AI 55903 AW 313224
2022-09-29 00:14:01,165 - INFO: Round 2: 53810080/53810080 AI 74812 AW 427258
2022-09-29 00:16:53,256 - INFO: Round 3: 53810080/53810080 AI 83991 AW 485830
2022-09-29 00:19:46,768 - INFO: Round 4: 53810080/53810080 AI 94188 AW 526156
2022-09-29 00:22:40,683 - INFO: Round 5: 53810080/53810080 AI 94421 AW 530994
2022-09-29 00:25:34,244 - INFO: Round 6: 53810080/53810080 AI 94462 AW 531634
2022-09-29 00:28:27,846 - INFO: Round 7: 53810080/53810080 AI 94489 AW 532002
2022-09-29 00:31:22,043 - INFO: Round 8: 53810080/53810080 AI 94516 AW 532390
2022-09-29 00:34:15,941 - INFO: Round 9: 53810080/53810080 AI 94542 AW 532744
2022-09-29 00:37:09,484 - INFO: Round 10: 53810080/53810080 AI 94559 AW 532878
2022-09-29 00:40:03,430 - INFO: Round 11: 53810080/53810080 AI 94568 AW 532964
2022-09-29 00:42:57,934 - INFO: Round 12: 53810080/53810080 AI 94583 AW 533166
2022-09-29 00:45:51,501 - INFO: Round 13: 53810080/53810080 AI 94589 AW 533242
2022-09-29 00:48:45,230 - INFO: Round 14: 53810080/53810080 AI 94594 AW 533322
2022-09-29 00:51:38,950 - INFO: Round 15: 53810080/53810080 AI 94602 AW 533454
2022-09-29 00:54:32,591 - INFO: Round 16: 53810080/53810080 AI 94622 AW 533656
2022-09-29 00:57:26,444 - INFO: Round 17: 53810080/53810080 AI 94643 AW 533866
2022-09-29 01:00:20,326 - INFO: Round 18: 53810080/53810080 AI 94653 AW 533926
2022-09-29 01:03:14,246 - INFO: Round 19: 53810080/53810080 AI 94661 AW 534056
2022-09-29 01:06:08,152 - INFO: Round 20: 53810080/53810080 AI 94685 AW 534364
2022-09-29 01:09:01,830 - INFO: Round 21: 53810080/53810080 AI 94713 AW 534544
2022-09-29 01:11:55,556 - INFO: Round 22: 53810080/53810080 AI 94722 AW 534658
2022-09-29 01:14:50,300 - INFO: Round 23: 53810080/53810080 AI 94739 AW 534898
2022-09-29 01:17:44,133 - INFO: Round 24: 53810080/53810080 AI 94746 AW 534986
2022-09-29 01:20:37,931 - INFO: Round 25: 53810080/53810080 AI 94747 AW 535014
2022-09-29 01:23:31,789 - INFO: Round 26: 53810080/53810080 AI 94749 AW 535052
2022-09-29 01:26:25,502 - INFO: Round 27: 53810080/53810080 AI 94752 AW 535102
2022-09-29 01:29:19,592 - INFO: Round 28: 53810080/53810080 AI 94762 AW 535220
2022-09-29 01:32:13,464 - INFO: Round 29: 53810080/53810080 AI 94766 AW 535228
2022-09-29 01:35:07,326 - INFO: Round 30: 53810080/53810080 AI 94773 AW 535346
2022-09-29 01:38:01,282 - INFO: Round 31: 53810080/53810080 AI 94779 AW 535378
2022-09-29 01:40:55,142 - INFO: Round 32: 53810080/53810080 AI 94779 AW 535378
2022-09-29 01:40:55,143 - INFO: No more reads found and terminated ...
2022-09-29 01:41:54,458 - INFO: Extending finished.

2022-09-29 01:41:56,501 - INFO: Separating extended fastq file ...
2022-09-29 01:41:56,936 - INFO: Setting '-k 21,55,85,115'
2022-09-29 01:41:56,936 - INFO: Assembling using SPAdes ...
2022-09-29 01:41:56,941 - INFO: spades.py -t 64 --phred-offset 33 -1 test5/extended_1_paired.fq -2 test5/extended_2_paired.fq --s1 test5/extended_1_unpaired.fq --s2 test5/extended_2_unpaired.fq -k 21,55,85,115 -o test5/extended_spades
2022-09-29 01:41:57,143 - WARNING: Assembling exited halfway.

2022-09-29 01:41:57,222 - ERROR: No valid assembly graph found!

Total cost 7966.74 s
Thank you!

spades.log:

Command line: /home/mxx/anaconda3/envs/get/bin/spades.py -t 64 --phred-offset 33 -1 /home/mxx/test5/extended_1_paired.fq -2 /home/mxx/test5/extended_2_paired.fq --s1 /home/mxx/test5/extended_1_unpaired.fq --s2 /home/mxx/test5/extended_2_unpaired.fq -k 21,55,85,115 -o /home/mxx/test5/extended_spades

System information:
SPAdes version: 3.13.0
Python version: 3.10.6
OS: Linux-5.4.0-124-generic-x86_64-with-glibc2.31

Output dir: /home/mxx/test5/extended_spades
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology or --meta flag if processing metagenomic dataset)
Reads:
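One quick check on the get_org.log.txt above: SPAdes is launched at 01:41:56,941 and GetOrganelle reports "Assembling exited halfway." at 01:41:57,143. A minimal Python sketch that computes that gap from the two timestamps, assuming the `YYYY-MM-DD HH:MM:SS,mmm` layout seen in these logs; `seconds_between` is a hypothetical helper, not part of GetOrganelle.

```python
from datetime import datetime

# Timestamp layout used by the GetOrganelle log lines quoted in this thread,
# e.g. "2022-09-29 01:41:56,941" (milliseconds after the comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    t_start = datetime.strptime(start, LOG_TIME_FORMAT)
    t_end = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t_end - t_start).total_seconds()


# spades.py launched -> "Assembling exited halfway."
gap = seconds_between("2022-09-29 01:41:56,941", "2022-09-29 01:41:57,143")
print(f"SPAdes ran for {gap:.3f} s")  # SPAdes ran for 0.202 s
```

A gap of about 0.2 s means SPAdes most likely stopped before any read correction or assembly could run, which points to a problem launching SPAdes in this environment rather than to the reads or the -w/-R choices. That fits the maintainer's pointer to #198, but the spades.log excerpt above is cut off at "Reads:", so the exact cause is not visible here.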

luozhisen commented 5 months ago

Hi, I have the same problem. Below are my command and the error output. I tried all the methods above, but none of them seem to work.

(luozhisen) starv2-PowerEdge-R7525:MuSW004A $ get_organelle_from_reads.py -1 MuSW004A_1_clean.fq.gz -2 MuSW004A_2_clean.fq.gz -R 10 -F animal_mt -t 4 -o animal_mt_out --reduce-reads-for-coverage inf --max-reads inf

GetOrganelle v1.7.7.0

get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
PLATFORM: Linux starv2-PowerEdge-R7525 6.5.0-21-generic #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.3; sympy 1.12; scipy 1.10.1
DEPENDENCIES: Bowtie2 2.4.1; SPAdes 3.13.1; Blast 2.14.1
GETORG_PATH=/home/data/t040503/.GetOrganelle
SEED DB: animal_mt 0.0.1
LABEL DB: animal_mt 0.0.1
WORKING DIR: /home/data/t040503/lzs/MuSW004A
/home/data/t040503/miniconda3/envs/luozhisen/bin/get_organelle_from_reads.py -1 MuSW004A_1_clean.fq.gz -2 MuSW004A_2_clean.fq.gz -R 10 -F animal_mt -t 4 -o animal_mt_out --reduce-reads-for-coverage inf --max-reads inf

2024-04-04 21:11:10,945 - INFO: Pre-reading fastq ...
2024-04-04 21:11:10,946 - INFO: Unzipping reads file: MuSW004A_1_clean.fq.gz (6782072184 bytes)
2024-04-04 21:13:50,907 - INFO: Unzipping reads file: MuSW004A_2_clean.fq.gz (6807446106 bytes)
2024-04-04 21:16:30,538 - INFO: Counting read qualities ...
2024-04-04 21:16:30,673 - INFO: Identified quality encoding format = Sanger
2024-04-04 21:16:30,673 - INFO: Phred offset = 33
2024-04-04 21:16:30,674 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2024-04-04 21:16:30,745 - INFO: Mean error rate = 0.0011
2024-04-04 21:16:30,746 - INFO: Counting read lengths ...
2024-04-04 21:18:32,035 - INFO: Mean = 148.8 bp, maximum = 150 bp.
2024-04-04 21:18:32,036 - INFO: Reads used = 61902571+61902571
2024-04-04 21:18:32,036 - INFO: Pre-reading fastq finished.

2024-04-04 21:18:32,036 - INFO: Making seed reads ...
2024-04-04 21:18:32,036 - INFO: Seed bowtie2 index existed!
2024-04-04 21:18:32,036 - INFO: Mapping reads to seed bowtie2 index ...
2024-04-04 21:31:03,557 - INFO: Mapping finished.
2024-04-04 21:31:03,557 - INFO: Seed reads made: animal_mt_out/seed/animal_mt.initial.fq (4590030 bytes)
2024-04-04 21:31:03,560 - INFO: Making seed reads finished.

2024-04-04 21:31:03,560 - INFO: Checking seed reads and parameters ...
2024-04-04 21:31:03,560 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2024-04-04 21:31:03,560 - INFO: If the result graph is not a circular organelle genome,
2024-04-04 21:31:03,560 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2024-04-04 21:31:05,941 - INFO: Pre-assembling mapped reads ...
2024-04-04 21:31:06,573 - INFO: Retrying with more reads ..
2024-04-04 22:03:43,341 - WARNING: Pre-assembling failed. The estimations for animal_mt-hitting base-coverage and word size may be misleading.
2024-04-04 22:03:51,172 - INFO: Estimated animal_mt-hitting base-coverage = 455.92
2024-04-04 22:03:51,423 - INFO: Estimated word size(s): 119
2024-04-04 22:03:51,424 - INFO: Setting '-w 119'
2024-04-04 22:03:51,424 - INFO: Setting '--max-extending-len inf'
2024-04-04 22:03:51,522 - INFO: Checking seed reads and parameters finished.

2024-04-04 22:03:51,523 - INFO: Making read index ...
2024-04-04 22:16:39,226 - INFO: 120506731 candidates in all 123805142 reads
2024-04-04 22:16:39,226 - INFO: Pre-grouping reads ...
2024-04-04 22:16:39,226 - INFO: Setting '--pre-w 119'
2024-04-04 22:16:46,842 - INFO: 200000/1006462 used/duplicated
2024-04-04 22:17:05,412 - INFO: 3060 groups made.
2024-04-04 22:17:15,652 - INFO: Making read index finished.

2024-04-04 22:17:15,655 - INFO: Extending ...
2024-04-04 22:17:15,655 - INFO: Adding initial words ...
2024-04-04 22:17:15,915 - INFO: AW 117098
2024-04-04 22:24:50,128 - INFO: Round 1: 120506731/120506731 AI 263086 AW 760158
2024-04-04 22:33:02,138 - INFO: Round 2: 120506731/120506731 AI 269873 AW 880108
2024-04-04 22:41:36,774 - INFO: Round 3: 120506731/120506731 AI 370492 AW 1514412
2024-04-04 22:51:28,060 - INFO: Round 4: 120506731/120506731 AI 580176 AW 2483826
2024-04-04 23:00:40,465 - INFO: Round 5: 120506731/120506731 AI 640710 AW 2941538
2024-04-04 23:09:39,029 - INFO: Round 6: 120506731/120506731 AI 720053 AW 3420160
2024-04-04 23:18:55,119 - INFO: Round 7: 120506731/120506731 AI 744047 AW 3633410
2024-04-04 23:28:09,184 - INFO: Round 8: 120506731/120506731 AI 755260 AW 3730806
2024-04-04 23:37:21,176 - INFO: Round 9: 120506731/120506731 AI 760490 AW 3786126
2024-04-04 23:47:05,811 - INFO: Round 10: 120506731/120506731 AI 765893 AW 3840872
2024-04-04 23:47:05,811 - INFO: Hit the round limit 10 and terminated ...
2024-04-04 23:49:26,862 - INFO: Extending finished.

2024-04-04 23:49:41,651 - INFO: Separating extended fastq file ...
2024-04-04 23:49:45,306 - INFO: Setting '-k 21,55,85,115'
2024-04-04 23:49:45,307 - INFO: Assembling using SPAdes ...
2024-04-04 23:49:45,366 - INFO: spades.py -t 4 --phred-offset 33 -1 animal_mt_out/extended_1_paired.fq -2 animal_mt_out/extended_2_paired.fq --s1 animal_mt_out/extended_1_unpaired.fq --s2 animal_mt_out/extended_2_unpaired.fq -k 21,55,85,115 -o animal_mt_out/extended_spades
2024-04-04 23:49:46,358 - WARNING: Assembling exited halfway.

2024-04-04 23:49:46,888 - ERROR: No valid assembly graph found!
2024-04-04 23:49:46,889 - WARNING: This might due to a damaged dependency, to unreasonable seed/parameter choices, or to a bug.
2024-04-04 23:49:46,889 - INFO: Please first search similar issues at https://github.com/Kinggerm/GetOrganelle/issues, then leave your message following the same issue, or open an issue at https://github.com/Kinggerm/GetOrganelle/issues if it is new, Please always attach the get_org.log.txt file.

Total cost 9517.53 s
Thank you!

Can you see why?
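The two GetOrganelle logs in this thread end their extension step differently: the run with -R 32 stops with "No more reads found and terminated", while the run with -R 10 stops with "Hit the round limit 10 and terminated". A small Python sketch for comparing how much the AW value on each Round line was still changing at the end. The regex and the function name `aw_growth` are illustrative assumptions tailored to the lines quoted here, and the thread does not define what AW counts.

```python
import re
from typing import List, Tuple

# Matches "Round N: x/y AI a AW b" as it appears in the logs quoted above.
# Tailored to these excerpts; not an official GetOrganelle log format spec.
ROUND_RE = re.compile(r"Round (\d+): \d+/\d+ AI (\d+) AW (\d+)")


def aw_growth(log_text: str) -> List[Tuple[int, float]]:
    """For each round after the first, return (round number, relative change
    in the AW value compared with the previous round)."""
    rounds = [(int(n), int(aw)) for n, _ai, aw in ROUND_RE.findall(log_text)]
    growth = []
    for (_, prev_aw), (n, aw) in zip(rounds, rounds[1:]):
        growth.append((n, (aw - prev_aw) / prev_aw))
    return growth


# Last two rounds of each log in this thread.
first_log = """Round 31: 53810080/53810080 AI 94779 AW 535378
Round 32: 53810080/53810080 AI 94779 AW 535378"""
second_log = """Round 9: 120506731/120506731 AI 760490 AW 3786126
Round 10: 120506731/120506731 AI 765893 AW 3840872"""

for name, text in [("-R 32 run", first_log), ("-R 10 run", second_log)]:
    n, g = aw_growth(text)[-1]
    print(f"{name}: round {n}, AW growth {g:.1%}")
# -R 32 run: round 32, AW growth 0.0%
# -R 10 run: round 10, AW growth 1.4%
```

In the first log AW is unchanged between rounds 31 and 32, while in the second it still grows by about 1.4% in round 10, so that extension had apparently not leveled off when the round limit was reached. Whether this is related to the failure is not established in the thread; in both logs SPAdes also reports "Assembling exited halfway." within about a second of being launched.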