ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Should I put option "--careful"? #183

Open WeiWei1112 opened 6 years ago

WeiWei1112 commented 6 years ago

Hi, I want to use SPAdes to assemble a ~40 kb plasmid sequenced on an Illumina MiSeq (2 × 250 nt). When I use the default options:

    spades.py -1 ../BWA/B13_vector_unmapped.R1.fastq -2 ../BWA/B13_vector_unmapped.R2.fastq --phred-offset 33 -o B13

I get several small contigs, like those below:

>NODE_1_length_10836_cov_178.206088
>NODE_2_length_9772_cov_179.468326
>NODE_3_length_3195_cov_183.105606
>NODE_4_length_2559_cov_173.407072
>NODE_5_length_1809_cov_165.414388
>NODE_6_length_1761_cov_177.950428
>NODE_7_length_954_cov_175.383313
>NODE_8_length_789_cov_382.403323
>NODE_9_length_715_cov_332.863946
>NODE_10_length_666_cov_357.298701
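
One quick way to read a contig list like the one above is to parse the headers and compare the coverage values. The sketch below is illustrative only: the helper names and the 1.5× threshold are assumptions, not part of SPAdes. Contigs whose coverage sits well above the median may be collapsed repeats (for example, near-identical genes in a cluster), which is one possible reason an assembly breaks at those points.

```python
import re
from statistics import median

# Matches SPAdes-style contig headers such as ">NODE_1_length_10836_cov_178.206088".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")

HEADERS = [
    ">NODE_1_length_10836_cov_178.206088",
    ">NODE_2_length_9772_cov_179.468326",
    ">NODE_3_length_3195_cov_183.105606",
    ">NODE_4_length_2559_cov_173.407072",
    ">NODE_5_length_1809_cov_165.414388",
    ">NODE_6_length_1761_cov_177.950428",
    ">NODE_7_length_954_cov_175.383313",
    ">NODE_8_length_789_cov_382.403323",
    ">NODE_9_length_715_cov_332.863946",
    ">NODE_10_length_666_cov_357.298701",
]


def parse_header(line):
    """Return (node_id, length, coverage), or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def summarize(headers, ratio=1.5):
    """Summarize contigs; `ratio` is an arbitrary, hypothetical cutoff."""
    contigs = [c for c in map(parse_header, headers) if c is not None]
    median_cov = median(cov for _, _, cov in contigs)
    return {
        "total_length": sum(length for _, length, _ in contigs),
        "median_cov": median_cov,
        # Nodes whose coverage exceeds ratio * median: candidate repeats.
        "high_cov_nodes": [n for n, _, cov in contigs if cov > ratio * median_cov],
    }


summary = summarize(HEADERS)
print(summary)
```

For the ten headers listed here, the flagged nodes are the three shortest contigs, whose coverage is roughly twice that of the rest. That is only a hint; the assembly graph is the place to confirm it.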

But if I add the "--careful" option, I get:

>B13_GGCTTAAG-TCGTGACC_1_length_34210_cov_184.621189

The result is apparently much better with the "--careful" option. But my question is: do I throw away possible true variants if I just correct mismatches and indels? My sequence contains a gene cluster in which the genes might be similar to, but slightly different from, each other. Do you have any suggestions for the option settings? Thank you so much!

Wei Wei

asl commented 6 years ago

You may want to check the assembly graph to see what is around those contigs and why they were not assembled.
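
As a rough illustration of what checking the graph can look like without a viewer, the sketch below parses GFA 1 text (SPAdes writes an assembly_graph_with_scaffolds.gfa file to its output directory) and lists segments with three or more neighbours, which is where paths branch. The function names, the neighbour threshold, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def parse_gfa(lines):
    """Collect segment lengths and undirected neighbour sets from GFA 1 lines."""
    lengths = {}
    neighbors = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i:<length> tag.
                tags = [f for f in fields[3:] if f.startswith("LN:i:")]
                lengths[name] = int(tags[0][5:]) if tags else None
            else:
                lengths[name] = len(seq)
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            src, dst = fields[1], fields[3]
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    return lengths, neighbors


def branching_segments(neighbors, min_neighbors=3):
    """Segments linked to at least `min_neighbors` distinct segments."""
    return sorted(s for s, adj in neighbors.items() if len(adj) >= min_neighbors)


# Toy graph (not real data): segment "2" connects to three other segments.
TOY_GFA = [
    "H\tVN:Z:1.0",
    "S\t1\tACGTACGT",
    "S\t2\tACG",
    "S\t3\t*\tLN:i:5",
    "S\t4\tTTTT",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
]

lengths, neighbors = parse_gfa(TOY_GFA)
branches = branching_segments(neighbors)
print(lengths, branches)
```

To try it on a real run, pass an open file handle for the GFA file to `parse_gfa` instead of the toy list. A graph viewer such as Bandage shows the same structure visually.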

WeiWei1112 commented 5 years ago

Thanks for your reply! Do you know what the "--careful" option does?

asl commented 5 years ago

Sure. Per SPAdes manual (http://cab.spbu.ru/files/release3.13.0/manual.html):

--careful
    Tries to reduce the number of mismatches and short indels. Also runs MismatchCorrector – a post processing tool, which uses BWA tool (comes with SPAdes). This option is recommended only for assembly of small genomes. We strongly recommend not to use it for large and medium-size eukaryotic genomes.

ZhaoruiZhou commented 2 months ago

Hello, if I'm assembling MDA data, is it better to use the --careful option?