baoe / AlignGraph

Algorithm for secondary de novo genome assembly guided by closely related references

Extended Output #31

Open alex-b-chase opened 7 years ago

alex-b-chase commented 7 years ago

Hi,

Thanks for developing this. I have a question about the output files: after reading the README, I am still unsure what output to expect.

I am using PE150 MiSeq reads and performed a SPAdes assembly (its output is *.scaffolds.fasta). The resulting genomes are highly fragmented but share >99% ANI with a reference genome that I previously sequenced to completion with PacBio (complete genome; 3.77 Mb). However, the AlignGraph output files confuse me.

One example:

Sequence_ID           Total_Contigs  Genome_length  Largest_Contig  n50     GC_Percent
Desert-2-3.extended   16             3335782        1108581         343422  70
Desert-2-3.remain     19             707214         201094          182646  71
Desert-2-3.scaffolds  73             3740981        508547          119510  71

From this, it appears that I would need to concatenate the extended.fasta and remaining.fasta files to get the desired genome. Is that correct? Any clarification would be great.

Here is the command I am using:

AlignGraph --read1 $OUTDIR/${mate}_R1_001.fasta --read2 $OUTDIR/${mate}_R2_001.fasta \
--fastMap --contig $genome.scaffolds.fasta --genome $REFGENOME \
--distanceLow 550 --distanceHigh 1550 \
--extendedContig $genome.extended.fa --remainingContig $genome.remain.fa 

Thank you!

baoe commented 6 years ago

Hi,

Sorry for the late reply. Yes, the extended.fasta file contains the contigs extended by AlignGraph, and the remaining.fasta file contains the contigs that were not extended.

Bao

chirrie commented 5 years ago

So does this mean that, to get the final assembly, one has to combine the sequences in the remaining and extended contig files?
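
Taking the reply above at face value (extended contigs go to one file, non-extended contigs to the other), combining the two outputs could be done with plain concatenation. The following is only a minimal sketch, not something AlignGraph provides: the function name combine_contigs and the file names in the example call are hypothetical, and it assumes both files are ordinary FASTA that end with a newline.

```shell
#!/bin/sh
# combine_contigs EXTENDED REMAIN OUT
# Hypothetical helper (not part of AlignGraph): concatenates two FASTA
# files into OUT and reports the contig count and total bases.
# Assumes each input file ends with a newline.
combine_contigs() {
    cat "$1" "$2" > "$3"
    # One header line ('>') per contig.
    n=$(grep -c '^>' "$3")
    # Sum the lengths of all non-header lines.
    bp=$(awk '!/^>/ { total += length($0) } END { printf "%d\n", total }' "$3")
    echo "$n contigs, $bp bp"
}

# Example call with hypothetical file names:
# combine_contigs Desert-2-3.extended.fa Desert-2-3.remain.fa Desert-2-3.combined.fa
```

For the example in this issue, the figures in the table add up to 16 + 19 = 35 contigs and 3335782 + 707214 = 4042996 bp for the combined file.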