abishara / athena_meta

read cloud assembler
MIT License

Error: assembly failed to produce contig.fa #22

Closed nkalsi22 closed 5 years ago

nkalsi22 commented 5 years ago

Hi, I get the following error while running Athena-meta:

============================== check_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== subassemble_reads ==============================
4157 chunks to run. Starting...
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.0 --
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_70878_length_2122_cov_16.687470
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.260 --
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.520 --
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - --starting logging SubassembleReadsStep.bin.780 --
2019-02-20 20:03:12 - performing local assembly for 17 seeds
2019-02-20 20:03:12 - targeting 100x short-read subassembly coverage
2019-02-20 20:03:12 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_9167_length_13103_cov_180.557863
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_97383_length_1373_cov_22.972686
2019-02-20 20:03:12 - assembling barcoded reads for seed NODE_68287_length_2218_cov_13.134535
2019-02-20 20:03:31 - seed NODE_68287_length_2218_cov_13.134535 contig does not have high enough coverage
2019-02-20 20:03:31 - - 64 bcodes, 6.46753832281x
2019-02-20 20:03:32 - assembling barcoded reads for seed NODE_76498_length_1932_cov_162.898775
2019-02-20 20:03:34 - seed NODE_70878_length_2122_cov_16.687470 contig does not have high enough coverage
2019-02-20 20:03:34 - - 95 bcodes, 9.93873704053x
2019-02-20 20:03:34 - assembling barcoded reads for seed NODE_108331_length_1156_cov_16.250681
2019-02-20 20:03:41 - determing local assemblies
2019-02-20 20:03:42 - determing local assemblies
2019-02-20 20:03:45 - 2 initial link candidates to check
2019-02-20 20:03:45 - 4 initial link candidates to check
2019-02-20 20:03:45 - - 0 pass reciprocal filtering
2019-02-20 20:03:46 - - 2 pass reciprocal filtering
2019-02-20 20:03:47 - root-ctg:NODE_97383_length_1373_cov_22.972686;numreads:184;checks:4;trunc-checks:False;asms:0;trunc-asms:False
2019-02-20 20:03:47 - - found 1 candidates
2019-02-20 20:03:48 - performing local assemblies
2019-02-20 20:03:48 - assembling with neighbor None
2019-02-20 20:03:48 - - 85 orig barcodes
2019-02-20 20:03:48 - - 85 downsampled barcodes
2019-02-20 20:03:48 - - 33.6372906045x estimated local coverage
2019-02-20 20:03:48 - - 2 min_support required
2019-02-20 20:03:48 - root-ctg:NODE_9167_length_13103_cov_180.557863;numreads:3241;checks:2;trunc-checks:False;asms:2;trunc-asms:False
2019-02-20 20:03:48 - - found 3 candidates
2019-02-20 20:03:49 - performing local assemblies
2019-02-20 20:03:49 - assembling with neighbor NODE_32750_length_4694_cov_218.604656
2019-02-20 20:03:49 - - 339 orig barcodes
2019-02-20 20:03:49 - - 339 downsampled barcodes
2019-02-20 20:03:49 - - 42.4174783734x estimated local coverage
2019-02-20 20:03:49 - - 2 min_support required
2019-02-20 20:04:02 - determing local assemblies
2019-02-20 20:04:06 - 3 initial link candidates to check
2019-02-20 20:04:06 - - 3 pass reciprocal filtering
2019-02-20 20:04:08 - root-ctg:NODE_76498_length_1932_cov_162.898775;numreads:1633;checks:3;trunc-checks:False;asms:3;trunc-asms:False
2019-02-20 20:04:08 - - found 4 candidates
2019-02-20 20:04:10 - performing local assemblies
2019-02-20 20:04:10 - assembling with neighbor NODE_36659_length_4228_cov_133.458184
2019-02-20 20:04:10 - - 229 orig barcodes
2019-02-20 20:04:10 - - 229 downsampled barcodes
2019-02-20 20:04:10 - - 51.5380911436x estimated local coverage
2019-02-20 20:04:10 - - 2 min_support required
2019-02-20 20:04:10 - determing local assemblies
2019-02-20 20:04:14 - 3 initial link candidates to check
2019-02-20 20:04:15 - - 0 pass reciprocal filtering
2019-02-20 20:04:16 - root-ctg:NODE_108331_length_1156_cov_16.250681;numreads:117;checks:3;trunc-checks:False;asms:0;trunc-asms:False
2019-02-20 20:04:16 - - found 1 candidates
2019-02-20 20:04:18 - performing local assemblies
2019-02-20 20:04:18 - assembling with neighbor None
2019-02-20 20:04:18 - - 62 orig barcodes
2019-02-20 20:04:18 - - 62 downsampled barcodes
2019-02-20 20:04:18 - - 25.4039792388x estimated local coverage
2019-02-20 20:04:18 - - 2 min_support required
number of threads 2
2019-02-20 20:04:19 - assembly failed to produce contig.fa
2019-02-20 20:04:19 - ========== Exception ==========
2019-02-20 20:04:19 - Traceback (most recent call last):
2019-02-20 20:04:19 -   File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/pipeline.py", line 50, in _run_chunk
2019-02-20 20:04:19 -     chunk.run()
2019-02-20 20:04:19 -   File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-02-20 20:04:19 -     self.do_local_assembly(ctg, asmdir)
2019-02-20 20:04:19 -   File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 145, in do_local_assembly
2019-02-20 20:04:19 -     local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2019-02-20 20:04:19 -   File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2019-02-20 20:04:19 -     contig_path = self._do_idba_assembly(local_asm)
2019-02-20 20:04:19 -   File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 173, in _do_idba_assembly
2019-02-20 20:04:19 -     raise Exception()
2019-02-20 20:04:19 - Exception
2019-02-20 20:04:19 -
2019-02-20 20:04:19 -
2019-02-20 20:04:19 - --starting logging SubassembleReadsStep.bin.1040 --
2019-02-20 20:04:19 - performing local assembly for 17 seeds
2019-02-20 20:04:19 - targeting 100x short-read subassembly coverage
2019-02-20 20:04:19 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
Traceback (most recent call last):
  File "/gpfs0/home/apps/athena/1.1/venv/bin/athena-meta", line 11, in <module>
    load_entry_point('athena==1.2', 'console_scripts', 'athena-meta')()
  File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/main.py", line 203, in main
2019-02-20 20:04:19 - assembling barcoded reads for seed NODE_60560_length_2545_cov_171.220482
    run(options)
  File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/main.py", line 42, in run
    runner.run_stage(stage, stage_name)
  File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
    cluster.map(_run_chunk, to_run)
  File "/gpfs0/home/apps/athena/1.1/venv/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
    return pool.map_async(fn, args).get(9999999)
  File "/apps/miniconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception
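The traceback ends in `_do_idba_assembly` raising a bare `Exception()` right after "assembly failed to produce contig.fa", so the log itself does not say why the `idba_subasm` step left no output. For anyone debugging the same symptom, here is a minimal Python 3.7+ sketch of a more informative check around an external assembler call. It is not Athena's code: `run_and_check` and `SubassemblyError` are hypothetical names, and the exact command Athena passes to `idba_subasm` is not shown in this thread.

```python
import os
import subprocess


class SubassemblyError(Exception):
    """Raised when an external assembler leaves no usable output behind."""


def run_and_check(cmd, workdir, expected="contig.fa", tail_lines=20):
    """Run `cmd` in `workdir` and return the path of `expected`.

    Hypothetical helper: instead of a bare Exception(), the error names the
    command, its exit status, and the last lines of its stderr.
    """
    proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    out_path = os.path.join(workdir, expected)
    if proc.returncode == 0 and os.path.isfile(out_path):
        return out_path
    # Keep only the end of stderr so the log stays readable.
    tail = "\n".join(proc.stderr.splitlines()[-tail_lines:])
    state = "exists" if os.path.isfile(out_path) else "is missing"
    raise SubassemblyError(
        "{} exited with status {}; {} {}\n--- stderr tail ---\n{}".format(
            " ".join(cmd), proc.returncode, expected, state, tail))
```

Because a logged traceback includes the exception message, the command, exit status and stderr tail would appear in the per-bin log right under "========== Exception ==========".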

abishara commented 5 years ago

Hi,

I think this has to do with the idba_subasm prereq not being set up correctly. Can you try running

athena-meta --check_prereqs

to see if this is indeed the issue? If so, can you install this version of idba:

https://github.com/abishara/idba/releases/tag/1.1.3a1

and make sure the binaries, including idba_subasm, are on your PATH? I think checking for prereqs should be a required stage in its own right, and I will make an update to do this so that a more informative error message is always printed.
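As a rough illustration of making the prereq check a required stage (a sketch, not Athena's implementation), a Python 3 check for the four tools that `athena-meta --check_prereqs` lists later in this thread could look like this. `check_prereqs` and `require_prereqs` are hypothetical names.

```python
import shutil

# The four tools reported by `athena-meta --check_prereqs` in this thread.
PREREQS = ("flye", "bwa", "samtools", "idba_subasm")


def check_prereqs(tools=PREREQS, path=None):
    """Map each tool to the executable PATH would run, or None if absent.

    shutil.which() mirrors the shell's `which` lookup; pass `path` to test
    a PATH other than the current process's.
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


def require_prereqs(tools=PREREQS, path=None):
    """Stop with a readable message if any prerequisite is missing."""
    found = check_prereqs(tools, path)
    missing = [tool for tool, where in found.items() if where is None]
    if missing:
        raise SystemExit("missing prerequisites on PATH: " + ", ".join(missing))
    return found


if __name__ == "__main__":
    for tool, where in check_prereqs().items():
        print("{:<12} {}".format(tool, where or "[missing]"))
```

Printing the resolved path, not just "[ok]", also shows which copy of each tool a run would actually use.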

Thanks!

nkalsi22 commented 5 years ago

Hi,

I ran athena-meta --check_prereqs, and the idba_subasm prereq seems to be set up correctly.

$ athena-meta --check_prereqs

flye [ok]

bwa [ok]

samtools [ok]

idba_subasm [ok]


I also ran:

$ time athena-meta --test

and it ran successfully. I have attached the log for reference.

Best, Namrata

CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents. Towards a sustainable earth: Print only when necessary. Thank you.

$ time athena-meta --test
running tiny test assembly
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.01 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.13-r1126
[main] CMD: bwa index /tmp/athena-testD1bMKx/seeds.fa
[main] Real time: 0.229 sec; CPU: 0.026 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 7128 sequences (838821 bp)...
[M::process] 0 single-end sequences; 7128 paired-end sequences
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (19, 1326, 12, 13)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (135, 240, 342)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 756)
[M::mem_pestat] mean and std.dev: (242.79, 119.85)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 963)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (337, 397, 478)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (55, 760)
[M::mem_pestat] mean and std.dev: (410.19, 118.16)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 901)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (26, 106, 153)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 407)
[M::mem_pestat] mean and std.dev: (78.20, 50.00)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 534)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (132, 220, 311)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 669)
[M::mem_pestat] mean and std.dev: (235.46, 124.90)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 848)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 7128 reads in 0.493 CPU sec, 0.498 real sec
[main] Version: 0.7.13-r1126
[main] CMD: bwa mem -p -C /tmp/athena-testD1bMKx/seeds.fa /tmp/athena-testD1bMKx/reads.fq
[main] Real time: 0.559 sec; CPU: 0.536 sec
============================== check_reads ==============================
1 chunks to run. Starting...
2019-03-04 04:10:52 - --starting logging CheckReadsStep_tmp --
2019-03-04 04:10:52 - index fastq /tmp/athena-testD1bMKx/reads.fq
2019-03-04 04:10:52 - fqinfo$7128,568,568
writing index for fqs

============================== subassemble_reads ==============================
3 chunks to run. Starting...
2019-03-04 04:10:53 - --starting logging SubassembleReadsStep.bin.0 --
2019-03-04 04:10:53 - performing local assembly for 1 seeds
2019-03-04 04:10:53 - targeting 100x short-read subassembly coverage
2019-03-04 04:10:53 - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-03-04 04:10:53 - assembling barcoded reads for seed NODE_41_length_6882_cov_15.3474
2019-03-04 04:10:53 - determing local assemblies
2019-03-04 04:10:53 - 1 initial link candidates to check
2019-03-04 04:10:53 - - 1 pass reciprocal filtering
2019-03-04 04:10:53 - root-ctg:NODE_41_length_6882_cov_15.3474;numreads:1646;checks:1;trunc-checks:False;asms:1;trunc-asms:False
2019-03-04 04:10:53 - - found 2 candidates
2019-03-04 04:10:53 - performing local assemblies
2019-03-04 04:10:53 - - skipping filtered contig NODE_43_length_4973_cov_15.8953
2019-03-04 04:10:53 - assembling with neighbor None
2019-03-04 04:10:53 - - 397 orig barcodes
2019-03-04 04:10:53 - - 397 downsampled barcodes
2019-03-04 04:10:53 - - 31.8102295844x estimated local coverage
2019-03-04 04:10:53 - - 2 min_support required
number of threads 2
reads 5642
long reads 10
seed contigs 1
extra reads 0
read_length 112
kmer 20
kmers 65646 65610
merge bubble 14
contigs: 133 n50: 1497 max: 13754 mean: 439 total length: 58429 n80: 397
aligned 4355 reads
confirmed bases: 37759 correct reads: 3118 bases: 468
distance mean 415.981 sd 114.566
seed contigs 54 local contigs 266
kmer 40
kmers 58244 58215
merge bubble 0
contigs: 64 n50: 4681 max: 16024 mean: 907 total length: 58107 n80: 647
aligned 4546 reads
confirmed bases: 39789 correct reads: 3257 bases: 34
distance mean 417.782 sd 113.718
seed contigs 41 local contigs 128
kmer 60
kmers 55958 55924
merge bubble 0
contigs: 39 n50: 10559 max: 18060 mean: 1467 total length: 57221 n80: 872
aligned 4627 reads
confirmed bases: 40723 correct reads: 3317 bases: 6
distance mean 419.264 sd 114.212
seed contigs 36 local contigs 78
kmer 80
kmers 54709 54682
merge bubble 0
contigs: 35 n50: 10559 max: 23657 mean: 1625 total length: 56876 n80: 896
aligned 4625 reads
confirmed bases: 40645 correct reads: 3326 bases: 0
distance mean 420.463 sd 114.673
seed contigs 33 local contigs 70
kmer 100
kmers 53733 53704
merge bubble 0
contigs: 33 n50: 10559 max: 23677 mean: 1719 total length: 56736 n80: 1026
aligning seed contigs

============================== assemble_olc ==============================
1 chunks to run. Starting...
2019-03-04 04:11:19 - --starting logging AssembleOLCStep --
2019-03-04 04:11:19 - merge input contigs
cmd bwa mem -t 1 /tmp/athena-testD1bMKx/seeds.fa /tmp/athena-testD1bMKx/results/olc/pre-flye-input-contigs.fa | samtools view -bS - | samtools sort -o /tmp/athena-testD1bMKx/results/olc/align-inputs.bam -
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 3 sequences (50640 bp)...
[M::mem_process_seqs] Processed 3 reads in 0.214 CPU sec, 0.213 real sec
[main] Version: 0.7.13-r1126
[main] CMD: bwa mem -t 1 /tmp/athena-testD1bMKx/seeds.fa /tmp/athena-testD1bMKx/results/olc/pre-flye-input-contigs.fa
[main] Real time: 0.215 sec; CPU: 0.217 sec
cmd samtools index /tmp/athena-testD1bMKx/results/olc/align-inputs.bam
2019-03-04 04:11:19 - filter short subassembled contigs and merge with seeds
[fai_load] build FASTA index.
orig ctgs 3
filtered ctgs 3
launching Flye OLC assembly
cmd flye --subassemblies /tmp/athena-testD1bMKx/results/olc/flye-input-contigs.fa --out-dir /tmp/athena-testD1bMKx/results/olc/flye-asm-1 --genome-size 11855 --threads 1 --min-overlap 1000
[2019-03-04 04:11:20] INFO: Running Flye 2.3.7-release
[2019-03-04 04:11:20] INFO: Configuring run
[2019-03-04 04:11:20] INFO: Input genome size: 11855
[2019-03-04 04:11:20] INFO: Estimated coverage: 9
[2019-03-04 04:11:20] INFO: Reads N50/N90: 6882 / 4973
[2019-03-04 04:11:20] INFO: Selected k-mer size: 31
[2019-03-04 04:11:20] INFO: Assembling reads
[2019-03-04 04:11:20] INFO: Reading sequences
[2019-03-04 04:11:20] INFO: Generating solid k-mer index
[2019-03-04 04:12:28] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-03-04 04:12:28] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-03-04 04:12:28] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-03-04 04:12:28] INFO: Extending reads
[2019-03-04 04:12:29] INFO: Overlap-based coverage: 6
[2019-03-04 04:12:29] INFO: Median overlap divergence: 0.000486797
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-03-04 04:12:29] INFO: Added 3 singleton reads
[2019-03-04 04:12:29] INFO: Assembled 3 draft contigs
[2019-03-04 04:12:29] INFO: Generating contig sequences
[2019-03-04 04:12:29] INFO: Performing repeat analysis
[2019-03-04 04:12:29] INFO: Reading sequences
[2019-03-04 04:12:29] INFO: Building repeat graph
10% 30% 50% 60% 80% 100%
[2019-03-04 04:13:17] INFO: Median overlap divergence: 0.00683326
[2019-03-04 04:13:17] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-03-04 04:14:04] INFO: Aligned read sequence: 109231 / 109915 (0.993777)
[2019-03-04 04:14:04] INFO: Median overlap divergence: 0.00029688
[2019-03-04 04:14:04] INFO: Mean edge coverage: 1
[2019-03-04 04:14:04] INFO: Resolving repeats
[2019-03-04 04:14:04] INFO: Generating contigs
[2019-03-04 04:14:04] INFO: Generated 1 contigs
[2019-03-04 04:14:04] INFO: Polishing genome (1/1)
[2019-03-04 04:14:04] INFO: Running minimap2
[2019-03-04 04:14:04] INFO: Separating alignment into bubbles
[2019-03-04 04:14:05] INFO: Alignment error rate: 8.60135249046e-05
[2019-03-04 04:14:05] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-03-04 04:14:07] INFO: Assembly statistics:

Total length:   34378
Contigs:    1
Scaffolds:  1
Scaffolds N50:  34378
Largest scf:    34378
Mean coverage:  3

[2019-03-04 04:14:07] INFO: Final assembly: /tmp/athena-testD1bMKx/results/olc/flye-asm-1/scaffolds.fasta
2019-03-04 04:14:07 - done
2019-03-04 04:14:07 - -> finished running step; time elapsed: 0:02:47.971463
2019-03-04 04:14:07 - --stopping logging--
--> assemble_olc completed.

Athena contigs: /tmp/athena-testD1bMKx/results/olc/athena.asm.fa
[fai_load] build FASTA index.
--> test completed successfully.

real 3m21.189s
user 3m35.916s
sys 0m7.079s

abishara commented 5 years ago

Hi Namrata,

Hmm, that's interesting. With the same installation, do you mind rerunning the original failing command with --threads 1 (single-threaded) from a fresh build directory (remove logs/, results/, and working/) and then copying the full output trace? Can you also copy the output of 'which idba_subasm'? This will help me better understand the issue.
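A minimal Python sketch of the two preparation steps requested here, assuming it runs from the build directory that holds logs/, results/ and working/, and that the original command passes the thread count as a separate `--threads N` argument. `clean_build_dir` and `single_threaded` are hypothetical helpers; note that `shutil.rmtree` deletes without asking, so point it at the right directory.

```python
import os
import shutil

# Directories named in the thread as the state to clear for a fresh run.
BUILD_DIRS = ("logs", "results", "working")


def clean_build_dir(root):
    """Delete logs/, results/ and working/ under `root`; return the names removed."""
    removed = []
    for name in BUILD_DIRS:
        path = os.path.join(root, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return removed


def single_threaded(cmd):
    """Return a copy of `cmd` with `--threads 1`, replacing any existing value.

    Assumes the space-separated form (`--threads 8`), not `--threads=8`.
    """
    out = list(cmd)
    if "--threads" in out:
        i = out.index("--threads")
        out[i + 1:i + 2] = ["1"]
    else:
        out += ["--threads", "1"]
    return out
```

For example, `single_threaded(["athena-meta", "--threads", "8"])` returns `["athena-meta", "--threads", "1"]`; any other arguments in the list are left untouched.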

Thanks! alex

nkalsi22 commented 5 years ago

Hi, here’s the zip file for the logs/, results/, and working/ directories. I have also attached the script and the config.json file.

$ which idba_subasm
/apps/idba_ud/athena/bin/idba_subasm
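Since the thread keeps pointing at the environment, it can help to confirm that only one idba_subasm is reachable: `which` prints only the first match on PATH, and a queue job may run with a different PATH than the login shell. A small hypothetical Python helper (`all_on_path` is an illustrative name, not part of Athena) that lists every match in lookup order:

```python
import os


def all_on_path(name, path=None):
    """Return every executable file called `name` on PATH, in lookup order.

    `which` stops at the first hit; any later hits are shadowed copies that a
    job started with a different PATH could pick up instead.
    """
    search = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in search.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in hits):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    print("\n".join(all_on_path("idba_subasm")) or "idba_subasm not found on PATH")
```

Running it both interactively and inside the submission script would show whether the two environments resolve to the same binary.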

Namrata

nkalsi22 commented 5 years ago

Hi,

I apologise, I think the attachment didn’t go through. You can find the files at: https://www.dropbox.com/sh/ipm0txb1nfgfjh7/AADFCUjEfIaBh3U8pPv7iBPaa?dl=0

Namrata

nkalsi22 commented 5 years ago

Hi, any update regarding the error?

Namrata


abishara commented 5 years ago

Hi Namrata,

This is a bit difficult for me to debug, since from the logs you've shared it still looks like something is wrong with the environment setup.

Using the same submission script run_athena.sh.txt that you shared with me, can you modify line 17 to run both:

athena-meta --test

and

athena-meta --check_prereq

and share with me the stdout of your queue submission system (logs/athena.oXXXX).
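If it is easier to collect everything in one file than to edit the queue script, the same two checks can be run from a small Python 3 wrapper that appends each command's output and exit status to a log. This is a hypothetical helper (`run_logged` is an illustrative name), not part of Athena or of the submission script discussed here.

```python
import subprocess


def run_logged(commands, log_path):
    """Run each command in order, appending its stdout and stderr to `log_path`.

    Returns (command, exit status) pairs so a failing step is easy to spot.
    A command that cannot be found is recorded with status 127, the shell's
    convention for "command not found", instead of aborting the remaining steps.
    """
    results = []
    with open(log_path, "a") as log:
        for cmd in commands:
            log.write("$ " + " ".join(cmd) + "\n")
            log.flush()  # keep this header ahead of the child's output in the file
            try:
                status = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode
            except FileNotFoundError:
                status = 127
                log.write("command not found: {}\n".format(cmd[0]))
            log.write("[exit status {}]\n".format(status))
            log.flush()
            results.append((cmd, status))
    return results
```

For example, `run_logged([["athena-meta", "--test"], ["athena-meta", "--check_prereq"]], "athena-checks.log")` would produce a single log to attach, with an exit status after each command.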

Also, just to rule out your set of reads (I don't think they are the cause), can you try the example dataset from the Athena README:

https://storage.googleapis.com/gbsc-gcp-lab-bhatt-public/readclouds-l-gasseri-example.tar.gz

with the same environment setup and make sure it runs through?
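On a cluster node without a browser, the example tarball can also be downloaded and unpacked from Python. A minimal Python 3 sketch, assuming only the URL quoted above; `fetch_example` and `extract_tarball` are hypothetical helpers, and the archive's contents and the command to run on them should be taken from the README.

```python
import os
import tarfile
import urllib.request

# Example dataset URL from the Athena README, as quoted in this thread.
EXAMPLE_URL = ("https://storage.googleapis.com/gbsc-gcp-lab-bhatt-public/"
               "readclouds-l-gasseri-example.tar.gz")


def extract_tarball(tar_path, dest):
    """Extract a .tar.gz into `dest` and return the sorted top-level entries."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return sorted({name.split("/")[0] for name in names})


def fetch_example(dest, url=EXAMPLE_URL):
    """Download the example tarball into `dest` (unless present) and unpack it."""
    os.makedirs(dest, exist_ok=True)
    tar_path = os.path.join(dest, os.path.basename(url))
    if not os.path.exists(tar_path):
        urllib.request.urlretrieve(url, tar_path)
    return extract_tarball(tar_path, dest)
```

Skipping the download when the tarball already exists makes re-runs cheap, but an interrupted download leaves a partial file behind, so delete it before retrying.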

Thanks, alex

nkalsi22 commented 5 years ago

Hi,

Sorry for the delay.

I have attached two log files.

athena.o450850: runs athena-meta --test and then athena-meta --check_prereq.

athena_example.o450851: runs Athena on the example dataset you specified (https://storage.googleapis.com/gbsc-gcp-lab-bhatt-public/readclouds-l-gasseri-example.tar.gz) and also athena-meta --check_prereq.

Best,
Ms Namrata KALSI
Research Associate, Singapore Centre for Environmental Life Sciences Engineering

50 Nanyang Avenue, SBS-B3n-27, Singapore 639798
T 65- F 65-6XXX-XXXX
nkalsi@ntu.edu.sg
www.ntu.edu.sg


Checking Prerequisites...
flye [ok]
bwa [ok]
samtools [ok]
idba_subasm [ok]
Running Athena on example data
============================== check_reads ==============================
1 chunks to run. Starting...
2019-03-27 10:19:22 - --starting logging CheckReadsStep_chromium --
2019-03-27 10:19:22 - index fastq /scratch/namrata/chromium/readclouds-l-gasseri-example/reads.fq
2019-03-27 10:19:22 - get seed contigs from input assembly
2019-03-27 10:19:22 - 239 total inputs seeds covering 1857551 bases
2019-03-27 10:19:22 - 56 input seed contigs >= 400bp and >= 10.0x coverage covering 1834024 bases
2019-03-27 10:19:22 - created 57 bins from seeds
2019-03-27 10:19:23 - done
2019-03-27 10:19:23 - -> finished running step; time elapsed: 0:00:00.235299
2019-03-27 10:19:23 - --stopping logging--
--> check_reads completed.

============================== subassemble_reads ==============================
--> 0 chunks need to be run. Skipping...

============================== assemble_olc ==============================
--> 0 chunks need to be run. Skipping...

Athena contigs: /scratch/namrata/chromium/readclouds-l-gasseri-example/results/olc/athena.asm.fa
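In this example-dataset log, subassemble_reads and assemble_olc both report "0 chunks need to be run. Skipping...", so only check_reads actually executed in this job. When comparing several attached logs, a small parser can summarize how many chunks each stage ran. This is a hypothetical Python sketch (`stage_summary` is an illustrative name) that assumes only the banner and chunk-count formats visible in these logs.

```python
import re

# Stage banners in these logs use long runs of "=" (30 here). Requiring at least
# 20 skips the shorter "========== Exception ==========" marker Athena also logs.
BANNER = re.compile(r"={20,}\s+(\w+)\s+={20,}")
# Matches both "N chunks to run. Starting..." and "--> N chunks need to be run. Skipping...".
CHUNKS = re.compile(r"(\d+) chunks (?:to run|need to be run)")


def stage_summary(log_text):
    """Map each stage name to the chunk count reported after its banner (None if absent)."""
    banners = list(BANNER.finditer(log_text))
    summary = {}
    for i, m in enumerate(banners):
        # Only look between this banner and the next one.
        stop = banners[i + 1].start() if i + 1 < len(banners) else len(log_text)
        found = CHUNKS.search(log_text, m.end(), stop)
        summary[m.group(1)] = int(found.group(1)) if found else None
    return summary
```

A stage reporting 0 chunks was skipped, likely because outputs from an earlier run were already present; clearing logs/, results/ and working/ as suggested earlier in the thread should make it run again.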

running tiny test assembly
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.13-r1126
[main] CMD: bwa index /tmp/450850.1.smurf.q/athena-testT4PL37/seeds.fa
[main] Real time: 0.240 sec; CPU: 0.027 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 7128 sequences (838821 bp)...
[M::process] 0 single-end sequences; 7128 paired-end sequences
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (19, 1326, 12, 13)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (135, 240, 342)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 756)
[M::mem_pestat] mean and std.dev: (242.79, 119.85)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 963)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (337, 397, 478)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (55, 760)
[M::mem_pestat] mean and std.dev: (410.19, 118.16)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 901)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (26, 106, 153)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 407)
[M::mem_pestat] mean and std.dev: (78.20, 50.00)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 534)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (132, 220, 311)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 669)
[M::mem_pestat] mean and std.dev: (235.46, 124.90)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 848)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 7128 reads in 0.555 CPU sec, 0.553 real sec
[main] Version: 0.7.13-r1126
[main] CMD: bwa mem -p -C /tmp/450850.1.smurf.q/athena-testT4PL37/seeds.fa /tmp/450850.1.smurf.q/athena-testT4PL37/reads.fq
[main] Real time: 0.609 sec; CPU: 0.604 sec
============================== check_reads ==============================
1 chunks to run. Starting...
2019-03-25 14:46:37 - --starting logging CheckReadsStep_450850.1.smurf.q --
2019-03-25 14:46:37 - index fastq /tmp/450850.1.smurf.q/athena-testT4PL37/reads.fq
2019-03-25 14:46:37 - fqinfo$7128,568,568
2019-03-25 14:46:37 - get seed contigs from input assembly
2019-03-25 14:46:37 - computing seed coverages (required pass thru *bam)
[fai_load] build FASTA index.
2019-03-25 14:46:37 - 2 total inputs seeds covering 11855 bases
2019-03-25 14:46:37 - 2 input seed contigs >= 400bp and >= 10.0x coverage covering 11855 bases
2019-03-25 14:46:37 - created 3 bins from seeds
2019-03-25 14:46:37 - done
2019-03-25 14:46:37 - -> finished running step; time elapsed: 0:00:00.548925
2019-03-25 14:46:37 - --stopping logging--
--> check_reads completed.

============================== subassemble_reads ============================== 3 chunks to run. Starting... 2019-03-25 14:46:37 - --starting logging SubassembleReadsStep.bin.0 -- 2019-03-25 14:46:37 - performing local assembly for 1 seeds 2019-03-25 14:46:37 - targeting 100x short-read subassembly coverage 2019-03-25 14:46:37 - using barcodes mapped within 10000bp from seed end-points for seed subassembly 2019-03-25 14:46:37 - assembling barcoded reads for seed NODE_41_length_6882_cov_15.3474 2019-03-25 14:46:37 - determing local assemblies 2019-03-25 14:46:37 - 1 initial link candidates to check 2019-03-25 14:46:37 - - 1 pass reciprocal filtering 2019-03-25 14:46:38 - root-ctg:NODE_41_length_6882_cov_15.3474;numreads:1646;checks:1;trunc-checks:False;asms:1;trunc-asms:False 2019-03-25 14:46:38 - - found 2 candidates 2019-03-25 14:46:38 - performing local assemblies 2019-03-25 14:46:38 - - skipping filtered contig NODE_43_length_4973_cov_15.8953 2019-03-25 14:46:38 - assembling with neighbor None 2019-03-25 14:46:38 - - 397 orig barcodes 2019-03-25 14:46:38 - - 397 downsampled barcodes 2019-03-25 14:46:38 - - 31.8102295844x estimated local coverage 2019-03-25 14:46:38 - - 2 min_support required number of threads 2 reads 5642 long reads 10 seed contigs 1 extra reads 0 read_length 112 kmer 20 kmers 65646 65610 merge bubble 14 contigs: 133 n50: 1497 max: 13754 mean: 439 total length: 58429 n80: 397 aligned 4355 reads confirmed bases: 37758 correct reads: 3118 bases: 468 distance mean 415.981 sd 114.566 seed contigs 54 local contigs 266 kmer 40 kmers 58244 58215 merge bubble 0 contigs: 64 n50: 4681 max: 16024 mean: 907 total length: 58107 n80: 647 aligned 4546 reads confirmed bases: 39789 correct reads: 3257 bases: 34 distance mean 417.782 sd 113.718 seed contigs 41 local contigs 128 kmer 60 kmers 55958 55924 merge bubble 0 contigs: 39 n50: 10559 max: 18060 mean: 1467 total length: 57221 n80: 872 aligned 4627 reads confirmed bases: 40723 correct reads: 3318 bases: 6 
distance mean 419.264 sd 114.212 seed contigs 36 local contigs 78 kmer 80 kmers 54709 54682 merge bubble 0 contigs: 35 n50: 10559 max: 23657 mean: 1625 total length: 56876 n80: 896 aligned 4625 reads confirmed bases: 40645 correct reads: 3326 bases: 0 distance mean 420.463 sd 114.673 seed contigs 33 local contigs 70 kmer 100 kmers 53733 53704 merge bubble 0 contigs: 33 n50: 10559 max: 23677 mean: 1719 total length: 56736 n80: 1026 aligning seed contigs

============================== assemble_olc ============================== 1 chunks to run. Starting... 2019-03-25 14:46:58 - --starting logging AssembleOLCStep -- 2019-03-25 14:46:58 - merge input contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 3 sequences (50640 bp)... [M::mem_process_seqs] Processed 3 reads in 0.160 CPU sec, 0.159 real sec [main] Version: 0.7.13-r1126 [main] CMD: bwa mem -t 1 /tmp/450850.1.smurf.q/athena-testT4PL37/seeds.fa /tmp/450850.1.smurf.q/athena-testT4PL37/results/olc/pre-flye-input-contigs.fa [main] Real time: 0.161 sec; CPU: 0.162 sec 2019-03-25 14:46:58 - filter short subassembled contigs and merge with seeds [fai_load] build FASTA index. [2019-03-25 14:46:59] INFO: Running Flye 2.3.7-release [2019-03-25 14:46:59] INFO: Configuring run [2019-03-25 14:46:59] INFO: Input genome size: 11855 [2019-03-25 14:46:59] INFO: Estimated coverage: 9 [2019-03-25 14:46:59] INFO: Reads N50/N90: 6882 / 4973 [2019-03-25 14:46:59] INFO: Selected k-mer size: 31 [2019-03-25 14:46:59] INFO: Assembling reads [2019-03-25 14:46:59] INFO: Reading sequences [2019-03-25 14:46:59] INFO: Generating solid k-mer index [2019-03-25 14:47:44] INFO: Counting kmers (1/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2019-03-25 14:47:44] INFO: Counting kmers (2/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2019-03-25 14:47:44] INFO: Filling index table 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2019-03-25 14:47:44] INFO: Extending reads [2019-03-25 14:47:44] INFO: Overlap-based coverage: 6 [2019-03-25 14:47:44] INFO: Median overlap divergence: 0.000486797 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2019-03-25 14:47:45] INFO: Added 3 singleton reads [2019-03-25 14:47:45] INFO: Assembled 3 draft contigs [2019-03-25 14:47:45] INFO: Generating contig sequences [2019-03-25 14:47:45] INFO: Performing repeat analysis [2019-03-25 14:47:45] INFO: Reading sequences [2019-03-25 14:47:45] INFO: Building repeat graph 10% 30% 50% 60% 80% 100% [2019-03-25 
14:48:40] INFO: Median overlap divergence: 0.00683326 [2019-03-25 14:48:40] INFO: Aligning reads to the graph 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2019-03-25 14:49:32] INFO: Aligned read sequence: 107911 / 109915 (0.981768) [2019-03-25 14:49:32] INFO: Median overlap divergence: 0.000222846 [2019-03-25 14:49:32] INFO: Mean edge coverage: 1 [2019-03-25 14:49:32] INFO: Resolving repeats [2019-03-25 14:49:32] INFO: Generating contigs [2019-03-25 14:49:32] INFO: Generated 1 contigs [2019-03-25 14:49:32] INFO: Polishing genome (1/1) [2019-03-25 14:49:32] INFO: Running minimap2 [2019-03-25 14:49:33] INFO: Separating alignment into bubbles [2019-03-25 14:49:34] INFO: Alignment error rate: 3.01691888109e-06 [2019-03-25 14:49:34] INFO: Correcting bubbles 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2019-03-25 14:49:37] INFO: Assembly statistics:

Total length:   34378
Contigs:    1
Scaffolds:  1
Scaffolds N50:  34378
Largest scf:    34378
Mean coverage:  3

[2019-03-25 14:49:37] INFO: Final assembly: /tmp/450850.1.smurf.q/athena-testT4PL37/results/olc/flye-asm-1/scaffolds.fasta 2019-03-25 14:49:37 - done 2019-03-25 14:49:37 - -> finished running step; time elapsed: 0:02:39.304353 2019-03-25 14:49:37 - --stopping logging-- --> assemble_olc completed.

Athena contigs: /tmp/450850.1.smurf.q/athena-testT4PL37/results/olc/athena.asm.fa [fai_load] build FASTA index. --> test completed successfully.

writing index for fqs

nkalsi22 commented 5 years ago

Hi, any success with troubleshooting?

Best,

Ms Namrata KALSI, Research Associate, Singapore Centre for Environmental Life Sciences Engineering

50 Nanyang Avenue, SBS-B3n-27, Singapore 639798 T 65- F 65-6XXX-XXXX nkalsi@ntu.edu.sg www.ntu.edu.sg

From: Namrata Kalsi nkalsi@ntu.edu.sg Date: Wednesday, 27 March 2019 at 5:35 PM To: abishara/athena_meta reply@reply.github.com Subject: Re: [abishara/athena_meta] Error: assembly failed to produce contig.fa (#22)

Hi,

Sorry for the delay.

I have attached two log files.

athena.o450850 runs athena-meta --test and then athena-meta --check_prereq. athena_example.o450851 runs Athena on the example dataset you specified (https://storage.googleapis.com/gbsc-gcp-lab-bhatt-public/readclouds-l-gasseri-example.tar.gz) and also runs athena-meta --check_prereq.

Best, Namrata

From: abishara notifications@github.com Reply-To: abishara/athena_meta reply@reply.github.com Date: Tuesday, 19 March 2019 at 10:02 AM To: abishara/athena_meta athena_meta@noreply.github.com Cc: Namrata Kalsi nkalsi@ntu.edu.sg, Author author@noreply.github.com Subject: Re: [abishara/athena_meta] Error: assembly failed to produce contig.fa (#22)

Hi Namrata,

This is a bit difficult for me to debug, since, judging from the logs you've shared, there still appears to be something wrong with the environment setup.

Using the same submission script run_athena.sh.txt that you shared with me, can you modify line 17 to run both:

athena-meta --test

and

athena-meta --check_prereq

and share with me the stdout of your queue submission system (logs/athena.oXXXX).
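A minimal Python sketch of such a wrapper, assuming `athena-meta` is on the `PATH` of the job's environment: it runs both checks in turn and writes their combined stdout/stderr to one log file (the name `athena-diagnostics.log` and the function name are hypothetical), so the output can be shared even if the queue system's own log is hard to locate. A command that cannot be started at all is recorded as exit code 127 instead of crashing the script, which by itself points at an environment problem.

```python
import subprocess


def run_diagnostics(commands, log_path):
    """Run each command in turn and write its combined stdout/stderr to log_path.

    Returns (command, exit_code) pairs; a command that cannot be started
    (e.g. not on PATH) is reported with exit code 127, like a shell would.
    """
    results = []
    with open(log_path, "w") as log:
        for cmd in commands:
            log.write("$ " + " ".join(cmd) + "\n")
            try:
                proc = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,  # interleave stderr into stdout
                    universal_newlines=True,
                )
                log.write(proc.stdout)
                code = proc.returncode
            except OSError as exc:
                log.write("could not start command: %s\n" % exc)
                code = 127
            results.append((cmd, code))
    return results


if __name__ == "__main__":
    # The two checks requested above; the log file name is hypothetical.
    checks = [["athena-meta", "--test"], ["athena-meta", "--check_prereq"]]
    for cmd, code in run_diagnostics(checks, "athena-diagnostics.log"):
        print(" ".join(cmd), "->", "ok" if code == 0 else "exit code %d" % code)
```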

Also, just to make sure it isn't your set of reads (I don't feel this is the case) can you go ahead and try the example dataset in the README of Athena:

https://storage.googleapis.com/gbsc-gcp-lab-bhatt-public/readclouds-l-gasseri-example.tar.gz

with the same environment setup and make sure it runs through?

Thanks, alex

CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents. Towards a sustainable earth: Print only when necessary. Thank you.

abishara commented 5 years ago

Hi Namrata,

The only conclusion I can draw from this is that there is something peculiar about the set of reads you are applying it to. Feel free to share a small subset of the dataset that produces the error, along with the error message (you can email me directly), and I can have a look at what exactly is going on. Without this, I don't think I'll be able to determine the issue.

Best, alex

abishara commented 5 years ago

I'm still happy to look at your input dataset to figure out what's going on.

From your latest reply it seems the environment is not the issue, but just in case it is, you can install Athena with conda now (see #13). This should mitigate difficulties in setup.

Thanks, alex

nkalsi22 commented 5 years ago

Hi Alex,

We are trying to figure out what is causing the error. You can also have a look at the file at: https://www.dropbox.com/s/rwf7qb7yvxz8bzy/test1.barcoded.fastq?dl=0

Namrata

nkalsi22 commented 5 years ago

Hi, if the dataset is the problem, how can we check for that?

Namrata
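One rough way to sanity-check the barcoded reads themselves is sketched below in Python. It assumes (not confirmed in this thread) that the input is 4-line interleaved FASTQ with mates adjacent, as `bwa mem -p` expects, and that each header carries its barcode as a `BX:Z:` comment, which `bwa mem -C` (used in the test log above) would copy into the BAM. The function names are hypothetical; the sketch only counts pairs whose names or barcodes disagree or are missing, and says nothing about coverage.

```python
from itertools import islice


def _name_and_barcode(header):
    """Split a FASTQ header into (read name without /1 or /2, BX:Z barcode or None)."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    barcode = next((f[5:] for f in fields[1:] if f.startswith("BX:Z:")), None)
    return name, barcode


def check_interleaved_barcoded(lines):
    """Count read pairs and basic pairing/barcode problems in interleaved FASTQ."""
    stats = {"pairs": 0, "name_mismatch": 0, "no_barcode": 0, "barcode_mismatch": 0}
    it = iter(lines)
    while True:
        pair = list(islice(it, 8))  # two 4-line records: read 1 then read 2
        if len(pair) < 8:
            break  # end of input; a trailing partial pair is ignored
        if not (pair[0].startswith("@") and pair[4].startswith("@")):
            raise ValueError("not 4-line interleaved FASTQ at pair %d" % stats["pairs"])
        stats["pairs"] += 1
        (n1, b1), (n2, b2) = _name_and_barcode(pair[0]), _name_and_barcode(pair[4])
        if n1 != n2:
            stats["name_mismatch"] += 1
        if b1 is None or b2 is None:
            stats["no_barcode"] += 1
        elif b1 != b2:
            stats["barcode_mismatch"] += 1
    return stats


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            print(check_interleaved_barcoded(fh))
```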

abishara commented 5 years ago

Hi Namrata,

I took a look at your sample dataset. I first assembled it using metaspades to obtain seeds, then aligned the reads to these seeds using bwa mem, and then ran the latest Athena within the Docker container specified on the GitHub page. The following is from the beginning of the log:

============================== check_reads ==============================
1 chunks to run. Starting...                                              
2019-04-08 15:36:43 - INFO - index fastq test1.barcoded.fastq             
2019-04-08 15:36:43 - INFO - get seed contigs from input assembly
2019-04-08 15:36:43 - INFO -   11722 total inputs seeds covering 5521519 bases
2019-04-08 15:36:43 - INFO -   0 input seed contigs >= 400bp and >= 10.0x coverage covering 0 bases
2019-04-08 15:36:43 - INFO - created 1 bins from seeds                                                                  
2019-04-08 15:36:43 - INFO - done            
--> check_reads completed. 
....

"0 input seed contigs >= 400bp and..." indicates there is not enough depth of coverage in this dataset to assemble anything further with Athena. However, this is different from the error you first sent me.

Each time you modify the input dataset, you need to run rm -rf results/ logs/ working/; otherwise, state from the previous run will be left over. If you can send me another sample dataset that produces an unclear error message (please also send the seeds FASTA file from metaspades and the BAM from bwa, if it is larger), I can take a look to see what the issue is.
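About the "0 input seed contigs >= 400bp and >= 10.0x coverage" line in the log above: to get a quick sense of whether a metaSPAdes assembly will leave any seeds after that filter, the length and coverage embedded in metaSPAdes contig names can be compared against the same cut-offs. This Python sketch is only a rough proxy: according to its log, Athena computes seed coverage from the BAM, whereas the `cov_` value in a SPAdes header is k-mer coverage, which is typically lower than per-base read depth. The function name and command-line usage are illustrative.

```python
import re
import sys

# metaSPAdes contig names look like NODE_41_length_6882_cov_15.3474
SPADES_NAME = re.compile(r"NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def summarize_seeds(fasta_lines, min_len=400, min_cov=10.0):
    """Return (seeds, seeds passing both cut-offs, bases in passing seeds).

    Uses the length and k-mer coverage encoded in metaSPAdes headers; headers
    in any other format are skipped.
    """
    total = passing = passing_bases = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        fields = line[1:].split()
        m = SPADES_NAME.match(fields[0]) if fields else None
        if m is None:
            continue
        length, cov = int(m.group(1)), float(m.group(2))
        total += 1
        if length >= min_len and cov >= min_cov:
            passing += 1
            passing_bases += length
    return total, passing, passing_bases


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:  # e.g. the metaSPAdes contigs FASTA
        print("%d seeds, %d pass (%d bases)" % summarize_seeds(fh))
```

If almost no seeds pass even by this lenient measure, deeper sequencing or a different sample is more likely to help than any change to the Athena run itself.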

Best, alex

nkalsi22 commented 5 years ago

Hi Alex,

I will check with my lab head whether we can share the data. Meanwhile, do you have another test, dummy, or published dataset that we can try running Athena on? This would help us better understand the issue.

Namrata

abishara commented 5 years ago

Hi Namrata,

Sure, although please only share a smaller-scale example that reproduces an unintuitive error message. Please remember to remove state (rm -rf working/ logs/ results/) if you modify the input dataset.
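For workflows that launch Athena from a script, the same clean-up can be done in Python. This is a small sketch (the function name is hypothetical) that removes `results/`, `logs/` and `working/` under the run directory, like the `rm -rf` above, and reports which ones were present.

```python
import shutil
from pathlib import Path

# Directories the thread says hold state from a previous Athena run.
STATE_DIRS = ("results", "logs", "working")


def clear_athena_state(run_dir="."):
    """Rough equivalent of `rm -rf results/ logs/ working/` inside run_dir.

    Returns the names of the entries that were removed.
    """
    removed = []
    for name in STATE_DIRS:
        path = Path(run_dir) / name
        if path.is_symlink() or path.is_file():
            path.unlink()  # rm -rf also removes plain files/links with these names
        elif path.is_dir():
            shutil.rmtree(str(path))
        else:
            continue  # nothing to remove
        removed.append(name)
    return removed
```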

All the data from our publication should be accessible to you:

https://www.nature.com/articles/nbt.4266

though I feel the dataset posted on the github page should be sufficient to work out any issues.

Best, alex

nkalsi22 commented 5 years ago

Hi Alex,

I ran Athena on the same dataset that I sent you. This dataset is just the first 10,000 or so reads from our original dataset that is causing the issue. I ran Athena on it with the BAM and FASTA files created from the original file (all reads). Although it eventually gave me the same error (“assembly failed to produce contig.fa”), this time it ran for much longer and there were a few error messages from idba_ud.

https://www.dropbox.com/sh/vcqztks8aqtv03e/AAA12kCb0m_d74c1zmwDq8E0a?dl=0

You can find the output here. I deleted the results, logs, and working folders before running this.
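For a self-contained reproducible example, it may help if the FASTQ, the seeds and the BAM all come from the same reads: cut the subset first, then regenerate the seeds (metaspades) and the BAM (bwa mem) from that subset, as was done on the maintainer's side above. Below is a Python sketch of the first step, assuming 4-line interleaved FASTQ with mates adjacent; the function and script names are hypothetical, and nothing here implies that combining a subset FASTQ with full-dataset seeds is what caused the error.

```python
import sys
from itertools import islice


def head_fastq_pairs(in_lines, out, n_pairs):
    """Copy the first n_pairs read pairs from interleaved FASTQ to out.

    Assumes 4-line records with mates adjacent (read 1 then read 2).
    Returns the number of pairs written.
    """
    it = iter(in_lines)
    written = 0
    while written < n_pairs:
        pair = list(islice(it, 8))
        if len(pair) < 8:
            break  # input exhausted; a trailing partial pair is dropped
        if not (pair[0].startswith("@") and pair[4].startswith("@")):
            raise ValueError("not 4-line interleaved FASTQ at pair %d" % written)
        out.writelines(pair)
        written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: head_fastq_pairs.py in.fastq out.fastq N  (hypothetical script name)
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        print(head_fastq_pairs(src, dst, int(sys.argv[3])), "pairs written")
```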

Namrata

abishara commented 5 years ago

Hi Namrata,

Thanks for this. It looks like your input reads really are hitting some unusual case that breaks the subassembly step.

Can you share the short input FASTQ, the BAM, and the seed FASTA that you used to produce this error? Then I can reproduce the error on my side, figure out what is going on, and fix it.

Thanks for your patience.

alex

nkalsi22 commented 5 years ago

Hi,

You can find these files at: https://www.dropbox.com/sh/0ffuof79fzilqu7/AACqEW-bWccvaogFSeKII1mYa?dl=0

Namrata

From: abishara notifications@github.com Reply-To: abishara/athena_meta reply@reply.github.com Date: Friday, 12 April 2019 at 7:20 AM To: abishara/athena_meta athena_meta@noreply.github.com Cc: Namrata Kalsi nkalsi@ntu.edu.sg, Author author@noreply.github.com Subject: Re: [abishara/athena_meta] Error: assembly failed to produce contig.fa (#22)

Hi Namrata,

Thanks for this. It looks like there really is some interesting case your input reads are hitting that is making the subassembly step break.

Can you share with me the short input fastq, bam, and seed *fasta that you used to produce this error? If you share these with me, then I can reproduce the error on my side and figure out what is going on so that I can fix it.

Thanks for your patience.

alex

On Apr 10, 2019, at 9:01 PM, nkalsi22 notifications@github.com wrote:

Hi Alex,

I ran Athena with the same dataset that I sent you. This dataset is just the first 10,000 or so reads from our original dataset that is causing the issue. I ran Athena on this dataset with the bam and fasta files created with the original file (all reads). Although eventually it gave me the same error (“assembly failed to produce contig.fa”), this time it ran for much longer and there were a few error messages from ibda_ud.

You can find the output here: https://www.dropbox.com/sh/vcqztks8aqtv03e/AAA12kCb0m_d74c1zmwDq8E0a?dl=0

I deleted the results, logs, and working folders before running this.

Namrata

On Monday, 8 April 2019 at 11:45 PM, abishara wrote:

Hi Namrata,

I took a look at your sample dataset. I first assembled it using metaspades to obtain seeds, then aligned reads to these seeds using bwa mem, and then ran the latest athena within the docker container specified on the github page. The log begins with the following:

============================== check_reads ==============================
1 chunks to run. Starting...
2019-04-08 15:36:43 - INFO - index fastq test1.barcoded.fastq
2019-04-08 15:36:43 - INFO - get seed contigs from input assembly
2019-04-08 15:36:43 - INFO - 11722 total inputs seeds covering 5521519 bases
2019-04-08 15:36:43 - INFO - 0 input seed contigs >= 400bp and >= 10.0x coverage covering 0 bases
2019-04-08 15:36:43 - INFO - created 1 bins from seeds
2019-04-08 15:36:43 - INFO - done
--> check_reads completed.
....

"0 input seeds >= 400bp and..." indicates there is not enough depth of coverage in this dataset to further assemble anything using Athena. However, this is different than the error you first copied me.

Each time you modify the input dataset, you need to rm -rf results/ logs/ working/; otherwise, state left over from the previous run will remain. If you can pass me another sample dataset that produces an unclear error message (please also pass me the seeds fasta file from metaspades and the bam from bwa, if it is larger), I can take a look to see what the issue is.
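The cleanup step can be scripted so it is not forgotten between runs. A minimal Python 3 sketch, assuming Athena is launched from the directory that contains results/, logs/ and working/ (the function name clear_athena_state is hypothetical). It does the same as rm -rf on those three directories and leaves everything else, such as the input FASTQ, BAM and seed FASTA, in place:

```python
import shutil
from pathlib import Path

# State directories named in this thread that a previous run leaves behind.
STATE_DIRS = ("results", "logs", "working")


def clear_athena_state(run_dir="."):
    """Delete results/, logs/ and working/ under run_dir (like rm -rf).

    Returns the names of the directories that were actually removed.
    """
    removed = []
    for name in STATE_DIRS:
        path = Path(run_dir) / name
        if path.is_dir():
            shutil.rmtree(path)  # recursive delete, as rm -rf would do
            removed.append(name)
    return removed
```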

Best,

alex


abishara commented 5 years ago

Hi Namrata,

Okay, I took a look at this and I figured out the issue with Athena. I'm sorry it took so long to get to the bottom of it.

It turns out you are the first person to put long ~250bp MiSeq reads through it, and I had set the max read length for idba in the current release (it must be specified at build time) to <170bp. I have created a new, separate release of idba_subasm, 1.1.3a2. If you download this version (https://github.com/abishara/idba/archive/1.1.3a2.tar.gz) and install it so that the idba_subasm in your environment points to this release, it should work.
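To check whether a dataset would hit the old read-length limit before choosing an idba_subasm build, one can scan the FASTQ for its longest read. A minimal Python 3 sketch, assuming an uncompressed FASTQ with standard four-line records; the 170bp constant is the limit described above for the earlier build, and the function names are hypothetical:

```python
# Read-length limit described in this thread for the earlier idba_subasm build.
OLD_IDBA_MAX_READ_LEN = 170


def max_read_length(fastq_lines):
    """Return the longest sequence length in a FASTQ with 4-line records."""
    longest = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the 2nd line of each record
            longest = max(longest, len(line.rstrip("\r\n")))
    return longest


def needs_long_read_build(fastq_lines, limit=OLD_IDBA_MAX_READ_LEN):
    """True if any read is at or above the old build's (<170bp) limit."""
    return max_read_length(fastq_lines) >= limit
```

Because the functions accept any iterable of lines, an open file handle on the barcoded FASTQ can be passed directly.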

I tested this updated version on the small read subset you sent me and the subassembly step passed without error. It failed in overlap assembly, but only because the depth during subassembly was too low. If you use the full dataset (or a larger sample of it), I think it should just work.

Please let me know how it goes. If that takes care of it, then I will soon update the conda environment and docker.

Thanks! alex

P.S. Can you reply to these issue messages directly on the GitHub page? I think replying from your email client unnecessarily duplicates our previous exchanges.

abishara commented 5 years ago

Hi Namrata,

Just checking if this fix was able to resolve your issues. Let me know if you are still facing problems.

Thanks, alex

nkalsi22 commented 5 years ago

Yes, it is currently running. It has been running for the last 4 days. I’ll let you know whether it completes successfully or crashes.

Namrata


abishara commented 5 years ago

Sounds great, thanks for letting me know. What's the total sequencing depth of the sample in Gbp? The only thing I can suggest to speed it up would be to scale up --threads to use the full resources available on the node you are using. Hope it works out.
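Total sequencing depth in Gbp is the number of sequenced bases divided by 10^9. A minimal Python 3 sketch that computes it from a FASTQ, assuming four-line records and treating a .gz suffix as gzip-compressed (total_gbp and open_fastq are hypothetical helpers, not part of Athena):

```python
import gzip


def open_fastq(path):
    """Open a FASTQ as text, transparently handling a .gz suffix."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def total_gbp(fastq_lines):
    """Sum the sequence lengths of 4-line FASTQ records, in gigabases."""
    total_bases = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the 2nd line of each record
            total_bases += len(line.rstrip("\r\n"))
    return total_bases / 1e9
```

For example, opening the barcoded FASTQ given to Athena with open_fastq inside a with-statement and passing the handle to total_gbp gives the figure asked about above.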

Best, alex

abishara commented 5 years ago

Hi Namrata,

Did everything work out for your original dataset?

alex

nkalsi22 commented 5 years ago

Hi Alex,

Yes, for one of the samples, Athena ran without any issues. We will now be doing some more tests on the results to check the quality of the assembly.

Thank you for your help.

Namrata
nkalsi22 commented 5 years ago

Hi Alex,

I’m having issues with Athena again, and I was asked to check with you. It’s a memory problem. Earlier, I ran Athena on a dataset with 16 CPU cores and 16 threads. Now, with a similar dataset, when I specify 16 cores and threads (or even when I reduce it to 8), I get:

Mon Jul 15 12:43:52 +08 2019
/cm/shared/apps/bio/idba/idba-1.1.3a2/bin/idba_subasm
============================== check_reads ==============================
1 chunks to run. Starting...
2019-07-15 12:43:53 - INFO - index fastq /gpfs0/scratch/namrata/chromium/barcoded_fastqs_2/sludge/outs/sludge.barcoded.fastq
2019-07-15 12:43:56 - INFO - get seed contigs from input assembly
2019-07-15 12:43:56 - INFO - computing seed coverages (required pass thru *bam)
2019-07-15 13:06:48 - INFO - 7814572 total inputs seeds covering 4445168955 bases
2019-07-15 13:06:48 - INFO - 106982 input seed contigs >= 400bp and >= 10.0x coverage covering 489828514 bases
2019-07-15 13:06:49 - INFO - created 4115 bins from seeds
2019-07-15 13:06:49 - INFO - done
--> check_reads completed.

============================== subassemble_reads ==============================
4115 chunks to run. Starting...
2019-07-15 13:26:58 - INFO - finished subassembly bin.774
2019-07-15 13:32:33 - INFO - finished subassembly bin.645
2019-07-15 13:35:09 - INFO - finished subassembly bin.516
2019-07-15 13:37:01 - INFO - finished subassembly bin.0
2019-07-15 13:37:32 - INFO - finished subassembly bin.258
2019-07-15 13:43:43 - INFO - finished subassembly bin.387
2019-07-15 13:47:56 - INFO - finished subassembly bin.129
2019-07-15 13:50:08 - INFO - finished subassembly bin.903
2019-07-15 14:05:29 - INFO - finished subassembly bin.517
2019-07-15 14:07:37 - INFO - finished subassembly bin.775
2019-07-15 14:15:28 - INFO - finished subassembly bin.1
2019-07-15 14:19:30 - INFO - finished subassembly bin.259
2019-07-15 14:20:19 - INFO - finished subassembly bin.646
2019-07-15 14:24:59 - INFO - finished subassembly bin.388
2019-07-15 14:25:17 - INFO - finished subassembly bin.904
2019-07-15 14:32:43 - INFO - finished subassembly bin.130
2019-07-15 14:42:24 - INFO - finished subassembly bin.518
2019-07-15 14:46:02 - INFO - finished subassembly bin.2
2019-07-15 14:46:58 - INFO - finished subassembly bin.776
2019-07-15 14:58:23 - INFO - finished subassembly bin.389
2019-07-15 15:00:25 - INFO - finished subassembly bin.260
2019-07-15 15:01:27 - INFO - finished subassembly bin.905
2019-07-15 15:03:03 - INFO - finished subassembly bin.131
2019-07-15 15:06:14 - ERROR - ========== Exception ==========
2019-07-15 15:06:14 - ERROR - Traceback (most recent call last):
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-07-15 15:06:14 - ERROR -     chunk.run()
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-07-15 15:06:14 - ERROR -     self.do_local_assembly(ctg, asmdir)
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 147, in do_local_assembly
2019-07-15 15:06:14 - ERROR -     local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2019-07-15 15:06:14 - ERROR -     contig_path = self._do_idba_assembly(local_asm)
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 160, in _do_idba_assembly
2019-07-15 15:06:14 - ERROR -     stderr=subprocess.PIPE,
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/subprocess.py", line 394, in __init__
2019-07-15 15:06:14 - ERROR -     errread, errwrite)
2019-07-15 15:06:14 - ERROR -   File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/subprocess.py", line 938, in _execute_child
2019-07-15 15:06:14 - ERROR -     self.pid = os.fork()
2019-07-15 15:06:14 - ERROR - OSError: [Errno 12] Cannot allocate memory
2019-07-15 15:06:14 - ERROR -
2019-07-15 15:06:14 - ERROR - [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/cm/shared/apps/bio/athena_meta/1.3/bin/athena-meta", line 10, in <module>
    sys.exit(main())
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/main.py", line 211, in main
    run(options)
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/main.py", line 42, in run
    runner.run_stage(stage, stage_name)
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
    cluster.map(_run_chunk, to_run)
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
    return pool.map_async(fn, args).get(9999999)
  File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
OSError: [Errno 12] Cannot allocate memory

real 142m28.610s
user 975m48.326s
sys 64m35.607s

 Resource Usage on 2019-07-15 15:06:21.305847:
 JobId: 4703.pbs01
 Project: airbiome
 Submission Host: ln-0001.scelse.sg
 Exit Status: 1
 NCPUs Requested: 8
 NCPUs Used: 8
 Memory Requested: None
 Memory Used: 228235744kb
 Vmem Used: 238764328kb
 CPU Time Used: 17:20:25
 Walltime requested: 120:00:00
 Walltime Used: 02:22:29
 Start Time: Mon Jul 15 12:43:52 2019
 End Time: Mon Jul 15 15:06:21 2019
 Execution Nodes Used: (ca-0056:ncpus=8)
 ======================================================================================

I specified 8 cores and 8 threads for this.

I actually want to use more cores and threads to run this dataset so it completes faster.

Using 64 cores and threads gave me a different memory error.

[namrata@ln-0001 logs]$ cat 4712.pbs01.OU
Mon Jul 15 19:14:52 +08 2019
/cm/shared/apps/bio/idba/idba-1.1.3a2/bin/idba_subasm
============================== check_reads ==============================
1 chunks to run. Starting...
2019-07-15 19:14:53 - INFO - index fastq /gpfs0/scratch/namrata/chromium/barcoded_fastqs_2/sludge/outs/sludge.barcoded.fastq
2019-07-15 19:14:55 - INFO - get seed contigs from input assembly
2019-07-15 19:14:55 - INFO - computing seed coverages (required pass thru *bam)
2019-07-15 19:37:27 - INFO - 7814572 total inputs seeds covering 4445168955 bases
2019-07-15 19:37:27 - INFO - 106982 input seed contigs >= 400bp and >= 10.0x coverage covering 489828514 bases
2019-07-15 19:37:28 - INFO - created 4115 bins from seeds
2019-07-15 19:37:28 - INFO - done
--> check_reads completed.

============================== subassemble_reads ==============================
4115 chunks to run. Starting...
2019-07-15 19:41:53 - ERROR - ========== Exception ==========
2019-07-15 19:41:53 - ERROR - Traceback (most recent call last):
2019-07-15 19:41:53 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-07-15 19:41:53 - ERROR -     chunk.run()
2019-07-15 19:41:53 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-07-15 19:41:53 - ERROR -     self.do_local_assembly(ctg, asmdir)
2019-07-15 19:41:53 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 99, in do_local_assembly
2019-07-15 19:41:53 - ERROR -     ctg_size_map = util.get_fasta_sizes(self.options.ctgfasta_path)
2019-07-15 19:41:53 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/mlib/util.py", line 112, in get_fasta_sizes
2019-07-15 19:41:53 - ERROR -     ctg_size_map[ctg] = size
2019-07-15 19:41:53 - ERROR - MemoryError
2019-07-15 19:41:53 - ERROR -
2019-07-15 19:41:53 - ERROR -
2019-07-15 19:41:54 - ERROR - ========== Exception ==========
2019-07-15 19:41:54 - ERROR - Traceback (most recent call last):
2019-07-15 19:41:54 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-07-15 19:41:54 - ERROR -     chunk.run()
2019-07-15 19:41:54 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-07-15 19:41:54 - ERROR -     self.do_local_assembly(ctg, asmdir)
2019-07-15 19:41:54 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 99, in do_local_assembly
2019-07-15 19:41:54 - ERROR -     ctg_size_map = util.get_fasta_sizes(self.options.ctgfasta_path)
2019-07-15 19:41:54 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/mlib/util.py", line 112, in get_fasta_sizes
2019-07-15 19:41:54 - ERROR -     ctg_size_map[ctg] = size
2019-07-15 19:41:54 - ERROR - MemoryError
2019-07-15 19:41:54 - ERROR -
Traceback (most recent call last):
  File "/cm/shared/apps/bio/athena_meta/1.3/bin/athena-meta", line 10, in <module>
    sys.exit(main())
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/main.py", line 211, in main
    run(options)
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/main.py", line 42, in run
    runner.run_stage(stage, stage_name)
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 33, in run_stage
    cluster.map(_run_chunk, to_run)
  File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/cluster.py", line 43, in map
    return pool.map_async(fn, args).get(9999999)
  File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
MemoryError

real 27m7.455s
user 108m53.573s
sys 47m16.896s

 Resource Usage on 2019-07-15 19:41:59.680556:
 JobId: 4712.pbs01
 Project: airbiome
 Submission Host: ln-0001.scelse.sg
 Exit Status: 1
 NCPUs Requested: 64
 NCPUs Used: 64
 Memory Requested: None
 Memory Used: 254370140kb
 Vmem Used: 443676688kb
 CPU Time Used: 02:36:11
 Walltime requested: 120:00:00
 Walltime Used: 00:27:08
 Start Time: Mon Jul 15 19:14:51 2019
 End Time: Mon Jul 15 19:41:59 2019
 Execution Nodes Used: (ca-0061:ncpus=64)
 ======================================================================================

Is there a reason that it keeps failing?

Namrata
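To put the PBS memory figures above in more familiar units, the kb values can be converted to GiB. A minimal Python 3 sketch, assuming PBS reports kb as 1024-byte units (pbs_kb_to_gib is a hypothetical helper):

```python
import re


def pbs_kb_to_gib(report_line):
    """Convert a PBS field such as 'Memory Used: 228235744kb' to GiB.

    Assumes 1 kb = 1024 bytes, so GiB = kb / 1024**2.
    """
    match = re.search(r"(\d+)kb", report_line)
    if match is None:
        raise ValueError("no '<number>kb' value in: %r" % report_line)
    return int(match.group(1)) / 1024**2
```

Under that assumption, the 8-CPU job used roughly 218 GiB of memory (228 GiB virtual), and the 64-CPU job roughly 243 GiB (423 GiB virtual).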


abishara commented 5 years ago

Hi Namrata,

Can you please create a separate issue? This is very unlikely to have the same underlying cause as the first one.

Also, my initial suggestion is to grep the logs directory to find the step that failed. Then we can determine the set of inputs it was run on and look at that in isolation to see what is going wrong. If you share the log file with the failing 'memory error', I can explain how to locate the inputs. My hunch is that one of the subassembly steps is very large for some reason, but it's hard for me to assess without actually looking at it.
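Searching the logs directory for the failing step amounts to something like grep -rl ERROR logs/. An equivalent Python 3 sketch, assuming per-step log files under logs/ (such as SubassembleReadsStep.bin.1) and matching on strings that appear in the errors quoted in this thread; find_failed_steps is a hypothetical helper:

```python
from pathlib import Path

# Markers taken from the failing runs quoted in this thread.
ERROR_MARKERS = ("- ERROR -", "MemoryError", "Cannot allocate memory")


def find_failed_steps(logs_dir="logs"):
    """Return paths (relative to logs_dir) of log files containing an error marker."""
    root = Path(logs_dir)
    failed = []
    for path in sorted(root.rglob("*")):  # walk subdirectories too
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")  # tolerate odd bytes in logs
        if any(marker in text for marker in ERROR_MARKERS):
            failed.append(str(path.relative_to(root)))
    return failed
```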

Thanks! alex

nkalsi22 commented 5 years ago

Hi Alex,

I created a separate issue for this 18 days ago.

Checking the logs directory, I found that "SubassembleReadsStep.bin.1" is the step giving the memory error.

$ cat SubassembleReadsStep.bin.1
2019-07-18 13:20:54 - DEBUG - --starting logging SubassembleReadsStep.bin.1 --
2019-07-18 13:20:54 - DEBUG - performing local assembly for 26 seeds
2019-07-18 13:20:54 - DEBUG - targeting 100x short-read subassembly coverage
2019-07-18 13:20:54 - DEBUG - using barcodes mapped within 10000bp from seed end-points for seed subassembly
2019-07-18 13:20:54 - DEBUG - assembling barcoded reads for seed NODE_3594604_length_400_cov_20.831884
2019-07-18 13:22:03 - DEBUG - determing local assemblies
2019-07-18 13:22:14 - DEBUG - 2 initial link candidates to check
2019-07-18 13:22:14 - DEBUG - - 2 pass reciprocal filtering
2019-07-18 13:22:19 - DEBUG - root-ctg:NODE_3594604_length_400_cov_20.831884;numreads:52;checks:2;trunc-checks:False;asms:2;trunc-asms:False
2019-07-18 13:22:19 - DEBUG - - found 3 candidates
2019-07-18 13:22:24 - DEBUG - performing local assemblies
2019-07-18 13:22:24 - DEBUG - assembling with neighbor NODE_266230_length_1759_cov_19.257629
2019-07-18 13:22:24 - DEBUG - - 19 orig barcodes
2019-07-18 13:22:24 - DEBUG - - 19 downsampled barcodes
2019-07-18 13:22:24 - DEBUG - - 8.18351187704x estimated local coverage
2019-07-18 13:22:24 - DEBUG - - 2 min_support required
2019-07-18 13:22:52 - ERROR - ========== Exception ==========
2019-07-18 13:22:52 - ERROR - Traceback (most recent call last):
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/pipeline.py", line 51, in _run_chunk
2019-07-18 13:22:52 - ERROR -     chunk.run()
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 80, in run
2019-07-18 13:22:52 - ERROR -     self.do_local_assembly(ctg, asmdir)
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/stages/subassemble_reads.py", line 147, in do_local_assembly
2019-07-18 13:22:52 - ERROR -     local_asm_results = asm.assemble(local_asms, filt_ctgs=seed_ctgs)
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 83, in assemble
2019-07-18 13:22:52 - ERROR -     contig_path = self._do_idba_assembly(local_asm)
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/bio/athena_meta/1.3/lib/python2.7/site-packages/athena/subassembly/barcode_assembler.py", line 160, in _do_idba_assembly
2019-07-18 13:22:52 - ERROR -     stderr=subprocess.PIPE,
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/subprocess.py", line 394, in __init__
2019-07-18 13:22:52 - ERROR -     errread, errwrite)
2019-07-18 13:22:52 - ERROR -   File "/cm/shared/apps/devel/python/Python-2.7.16/lib/python2.7/subprocess.py", line 938, in _execute_child
2019-07-18 13:22:52 - ERROR -     self.pid = os.fork()
2019-07-18 13:22:52 - ERROR - OSError: [Errno 12] Cannot allocate memory
2019-07-18 13:22:52 - ERROR -
2019-07-18 13:22:52 - ERROR - [Errno 12] Cannot allocate memory

Namrata
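Once the failing log is known, the seed and neighbor contig that were being subassembled when the error hit can be pulled out of it. A minimal Python 3 sketch, assuming the log wording shown above ("assembling barcoded reads for seed ...", "assembling with neighbor ..."). It only extracts contig names; where Athena keeps the corresponding reads on disk is not covered in this thread. last_context_before_error is a hypothetical helper:

```python
import re

SEED_RE = re.compile(r"assembling barcoded reads for seed (\S+)")
NEIGHBOR_RE = re.compile(r"assembling with neighbor (\S+)")


def last_context_before_error(log_text):
    """Return (seed, neighbor) most recently logged before the first ERROR.

    Works whether the log has one entry per line or was flattened
    into a single line. Returns (None, None) if no seed is found.
    """
    before = log_text.split("- ERROR -", 1)[0]  # text preceding the first error
    seeds = list(SEED_RE.finditer(before))
    if not seeds:
        return None, None
    last_seed = seeds[-1]
    # Only neighbors logged after the last seed belong to it.
    neighbors = NEIGHBOR_RE.findall(before, last_seed.end())
    return last_seed.group(1), (neighbors[-1] if neighbors else None)
```

Applied to the SubassembleReadsStep.bin.1 log above, this should point at seed NODE_3594604_length_400_cov_20.831884 and neighbor NODE_266230_length_1759_cov_19.257629.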

abishara commented 5 years ago

Sorry, I've just responded to it.

alex