HingeAssembler / HINGE

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"
http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

Issue with the dev branch #88

Closed: alimayy closed this issue 7 years ago

alimayy commented 7 years ago

Hi guys,

When I build HINGE from the dev branch and run my hinge wrapper script, I get 0 contigs from my validation Sequel data. When I build from master, I get 1 contig, which is the expected genome.

The commands I use are as follows:

fq2fa m54072_160926_234436.subreads_downsampled_90k.fastq.0.fastq m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta

hinge correct-head m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta fasta_map.txt

fasta2DB hinge_assembly m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta

DBsplit -x500 -s100 hinge_assembly

HPC.daligner -t5 -T32 hinge_assembly | csh -v

LAmerge hinge_assembly.las hinge_assembly*.las

DASqv -c100 hinge_assembly hinge_assembly.las

hinge filter --db hinge_assembly --las hinge_assembly.las -x hinge_assembly --config /hinge/utils/nominal.ini

hinge layout --db hinge_assembly --las hinge_assembly.las -x hinge_assembly --config /hinge/utils/nominal.ini -o hinge_assembly

hinge clip hinge_assembly.edges.hinges hinge_assembly.hinge.list hinge_assembly_run_id

hinge draft-path ./ hinge_assembly hinge_assemblyhinge_assembly_run_id.G2.graphml

hinge draft --db hinge_assembly --las hinge_assembly.las --prefix hinge_assembly --config /hinge/utils/nominal.ini --out hinge_assembly.draft

get_draft_path_norevcomp.py hinge_assembly.draft.fasta hinge_assembly.draft.norevcomp.fasta

hinge correct-head hinge_assembly.draft.norevcomp.fasta hinge_assembly.draft.norevcomp.pb.fasta draft_map.txt

fasta2DB draft hinge_assembly.draft.norevcomp.pb.fasta

HPC.daligner hinge_assembly draft | zsh -v

hinge consensus draft hinge_assembly draft.hinge_assembly.las hinge_assembly.consensus.fasta /hinge/utils/nominal.ini

I can't pinpoint exactly where or why it goes wrong with the dev build, but judging by the file sizes in the two output directories (dev and master), it looks like the failure is at the stage where the graph is built:

The output folder contents of master:

-rw-r--r-- 1 amay users 271M Oct 31 15:11 m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta
-rw-r--r-- 1 amay users 5.4M Oct 31 15:11 fasta_map.txt
-rw-r--r-- 1 amay users 274M Oct 31 15:11 m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta
-rw-r--r-- 1 amay users 242 Oct 31 15:11 smrt1_90k.db
-rw-r--r-- 1 amay users 509M Oct 31 15:17 smrt1_90k.las
-rw-r--r-- 1 amay users 0 Oct 31 15:17 smrt1_90k.homologous.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:17 smrt1_90k.filtered.fasta
-rw-r--r-- 1 amay users 0 Oct 31 15:18 debug.txt
-rw-r--r-- 1 amay users 584K Oct 31 15:18 smrt1_90k.repeat.txt
-rw-r--r-- 1 amay users 584K Oct 31 15:18 smrt1_90k.hinges.txt
-rw-r--r-- 1 amay users 1.2M Oct 31 15:18 smrt1_90k.mas
-rw-r--r-- 1 amay users 47M Oct 31 15:18 smrt1_90k.coverage.txt
-rw-r--r-- 1 amay users 292K Oct 31 15:18 edges.fwd.backup.txt
-rw-r--r-- 1 amay users 282K Oct 31 15:18 edges.bkw.backup.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:18 overlap_debug.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:18 smrt1_90k.hinge.list
-rw-r--r-- 1 amay users 0 Oct 31 15:18 hinge_debug.txt
-rw-r--r-- 1 amay users 583K Oct 31 15:18 smrt1_90k.killed.hinges
-rw-r--r-- 1 amay users 168K Oct 31 15:18 smrt1_90k.deadends.txt
-rw-r--r-- 1 amay users 219K Oct 31 15:18 edges.g_out.txt
-rw-r--r-- 1 amay users 201K Oct 31 15:18 smrt1_90k.edges.1
-rw-r--r-- 1 amay users 204K Oct 31 15:18 smrt1_90k.edges.2
-rw-r--r-- 1 amay users 291K Oct 31 15:18 smrt1_90k.edges.hinges
-rw-r--r-- 1 amay users 228K Oct 31 15:18 smrt1_90k.edges.hinges2
-rw-r--r-- 1 amay users 291K Oct 31 15:18 smrt1_90k.edges.greedy
-rw-r--r-- 1 amay users 360 Oct 31 15:18 smrt1_90k.edges.skipped
-rw-r--r-- 1 amay users 338 Oct 31 15:18 smrt1_90k.hgraph
-rw-r--r-- 1 amay users 83 Oct 31 15:19 smrt1_90k.debug
-rw-r--r-- 1 amay users 1.7M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G00.graphml
-rw-r--r-- 1 amay users 1.8M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G0.graphml
-rw-r--r-- 1 amay users 1.3M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G1.graphml
-rw-r--r-- 1 amay users 44 Oct 31 15:19 tandem.txt
-rw-r--r-- 1 amay users 1.3M Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G2.graphml
-rw-r--r-- 1 amay users 325K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.Gs.graphml
-rw-r--r-- 1 amay users 324K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G2s.graphml
-rw-r--r-- 1 amay users 370K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.Gc.graphml
-rw-r--r-- 1 amay users 370K Oct 31 15:19 smrt1_90ksmrt1_90k_run_id.G2c.graphml
-rw-r--r-- 1 amay users 60K Oct 31 15:19 smrt1_90k.edges.list
-rw-r--r-- 1 amay users 36K Oct 31 15:19 smrt1_90k_draft.graphml
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.draft.deadends.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.max
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.garbage.txt
-rw-r--r-- 1 amay users 0 Oct 31 15:19 smrt1_90k.contained.txt
drwxr-xr-x 2 amay users 4.0K Oct 31 15:19 log
-rw-r--r-- 1 amay users 4.9M Oct 31 15:20 smrt1_90k.draft.fasta
-rw-r--r-- 1 amay users 2.5M Oct 31 15:20 smrt1_90k.draft.norevcomp.fasta
-rw-r--r-- 1 amay users 29 Oct 31 15:20 draft_map.txt
-rw-r--r-- 1 amay users 2.5M Oct 31 15:20 smrt1_90k.draft.norevcomp.pb.fasta
-rw-r--r-- 1 amay users 68 Oct 31 15:20 draft.db
-rw-r--r-- 1 amay users 3.1M Oct 31 15:21 draft.smrt1_90k.las
-rw-r--r-- 1 amay users 2.5M Oct 31 15:21 smrt1_90k.consensus.fasta
-rw-r--r-- 1 amay users 541 Oct 31 15:24 smrt1_90k.consensus.fasta.stats

The output folder contents of dev:

-rw-r--r-- 1 amay users 271M Jan 5 10:22 m54072_160926_234436.subreads_downsampled_90k.fastq.0.fasta
-rw-r--r-- 1 amay users 5.4M Jan 5 10:22 fasta_map.txt
-rw-r--r-- 1 amay users 274M Jan 5 10:22 m54072_160926_234436.subreads_downsampled_90k.fastq.0_f.fasta
-rw-r--r-- 1 amay users 242 Jan 5 10:22 hinge_assembly.db
-rw-r--r-- 1 amay users 509M Jan 5 10:25 hinge_assembly.las
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.homologous.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.filtered.fasta
-rw-r--r-- 1 amay users 0 Jan 5 10:25 debug.txt
-rw-r--r-- 1 amay users 584K Jan 5 10:25 hinge_assembly.repeat.txt
-rw-r--r-- 1 amay users 584K Jan 5 10:25 hinge_assembly.hinges.txt
-rw-r--r-- 1 amay users 1.2M Jan 5 10:25 hinge_assembly.mas
-rw-r--r-- 1 amay users 47M Jan 5 10:25 hinge_assembly.coverage.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.deadends.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 edges.fwd.backup.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 edges.bkw.backup.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.hinges2
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.hinges
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.2
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.1
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.skipped
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.greedy
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.hgraph
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.debug
-rw-r--r-- 1 amay users 0 Jan 5 10:25 overlap_debug.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.hinge.list
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_debug.txt
-rw-r--r-- 1 amay users 583K Jan 5 10:25 hinge_assembly.killed.hinges
-rw-r--r-- 1 amay users 4 Jan 5 10:25 edges.g_out.txt
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G00.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G0.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G1.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G2.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.Gs.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G2s.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.Gc.graphml
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assemblyhinge_assembly_run_id.G2c.graphml
-rw-r--r-- 1 amay users 0 Jan 5 10:25 hinge_assembly.edges.list
-rw-r--r-- 1 amay users 308 Jan 5 10:25 hinge_assembly_draft.graphml
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.garbage.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.draft.deadends.txt
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.contained.txt
drwxr-xr-x 2 amay users 4.0K Jan 5 10:26 log
-rw-r--r-- 1 amay users 2 Jan 5 10:26 hinge_assembly.draft.fasta
-rw-r--r-- 1 amay users 0 Jan 5 10:26 hinge_assembly.draft.norevcomp.fasta
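The by-eye comparison of file sizes above can also be done mechanically. The following is only a rough sketch, not part of HINGE: the function names `size_table` and `collapsed_outputs` are made up for illustration, and it assumes that the two runs differ only in their file-name prefix (here `smrt1_90k` for master and `hinge_assembly` for dev). It reports files that have content in the working run but are empty or missing in the failing one.

```python
import os


def size_table(directory, prefix):
    """Map each regular file's name (run prefix stripped) to its size in bytes."""
    table = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            name = entry.name
            # Strip the run prefix so that e.g. smrt1_90k.edges.1 and
            # hinge_assembly.edges.1 end up under the same key.
            if name.startswith(prefix):
                name = name[len(prefix):]
            table[name] = entry.stat().st_size
    return table


def collapsed_outputs(good_dir, good_prefix, bad_dir, bad_prefix):
    """Names that are non-empty in the good run but empty or missing in the bad run."""
    good = size_table(good_dir, good_prefix)
    bad = size_table(bad_dir, bad_prefix)
    return sorted(
        name for name, size in good.items()
        if size > 0 and bad.get(name, 0) == 0
    )
```

Called as `collapsed_outputs("master_out", "smrt1_90k", "dev_out", "hinge_assembly")` (directory names are placeholders), it would list entries such as `.edges.1` and `edges.fwd.backup.txt`. Files that shrink without becoming empty, like the 308-byte graphml files, are not caught by this simple zero-size test.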

govinda-kamath commented 7 years ago

Hi @alimayy,

Can you confirm how long the contig in the master branch is?

alimayy commented 7 years ago

Hi @govinda-kamath,

The master branch produces a single 2.5 MB contig, which is the right size.

I'm attaching the stdout and stderr of both the master and dev runs. Note that I didn't use -u (unbuffered stdout and stderr) when running my wrapper, so the order of the stdout lines isn't reliable.

hinge_master_and_dev_out_err.tar.gz
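On the ordering problem: one way to keep a step's stdout and stderr together in a log is to send both into the same pipe. This is a minimal sketch that assumes the wrapper is a Python script launching each step as a subprocess (the thread does not show the wrapper itself); `run_logged` is a hypothetical helper name.

```python
import subprocess


def run_logged(argv):
    """Run one step and return (exit code, combined stdout+stderr text)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # route stderr into the same pipe as stdout
        text=True,
        check=False,  # leave it to the caller to decide what a failure means
    )
    return result.returncode, result.stdout
```

The interleaving is only as accurate as the child's own flushing: a program that block-buffers its stdout when writing to a pipe can still emit lines late, which is the issue the -u flag addresses for Python scripts.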

govinda-kamath commented 7 years ago

Hi @alimayy

It looks like the problem is that you ran hinge layout without first running hinge maximal; skipping that step fails in the dev branch. You can find an example here, where you can enable or disable the --mlas option depending on whether you have a single las file or have split the las file into many.
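In terms of the command list from the original post, the suggestion amounts to inserting one step between hinge filter and hinge layout. The sketch below builds that part of the pipeline as argument lists for a wrapper script. It is an assumption, not confirmed anywhere in this thread, that hinge maximal takes the same --db, --las, -x and --config arguments as hinge filter; check the linked example before relying on it. `graph_steps` is a hypothetical helper name, and --mlas is left out because the commands above merge everything into a single las file.

```python
def graph_steps(prefix, config, run_maximal=True):
    """Return the graph-building commands as argv lists, in execution order."""
    common = ["--db", prefix, "--las", prefix + ".las",
              "-x", prefix, "--config", config]
    steps = [["hinge", "filter"] + common]
    if run_maximal:
        # ASSUMPTION: `hinge maximal` accepts the same arguments as `hinge filter`.
        steps.append(["hinge", "maximal"] + common)
    steps.append(["hinge", "layout"] + common + ["-o", prefix])
    return steps
```

For example, `graph_steps("hinge_assembly", "/hinge/utils/nominal.ini")` yields the filter, maximal and layout commands in that order; passing `run_maximal=False` reproduces the two-step sequence from the original post.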

alimayy commented 7 years ago

Many thanks @govinda-kamath, adding the hinge maximal step did the trick. I'm closing this.