bcgsc / abyss

:microscope: Assemble large genomes using short reads
http://www.bcgsc.ca/platform/bioinfo/software/abyss

abyss-sealer does not change BUSCO score at all #361

Closed by ms-gx 3 years ago

ms-gx commented 3 years ago

Would you be surprised if abyss-sealer does not change the BUSCO score at all? Why so or why not?

I am using the actinopterygii_odb10 database, which contains 3640 BUSCOs in total.

I get exactly the same score BEFORE and AFTER abyss-sealer: C:84.0%[S:65.7%,D:18.3%],F:6.3%,M:9.7%,n:3640

Side-question 1: what do you think about the above BUSCO scores for an Illumina PE150bp-only assembly?

Side-question 2: If I get my best assembly for ABySS with k=104 which k-values would you use for sealer? Why?

lcoombe commented 3 years ago

> Would you be surprised if abyss-sealer does not change the BUSCO score at all? Why so or why not?

It really depends on the assembly. Usually, we see either marginal changes or no changes in the BUSCO scores. Presumably that is because either the filled gaps don't overlap the BUSCO genes, or the gaps that were present didn't interfere with BUSCO's ability to find the genes.

> Side-question 1: what do you think about the above BUSCO scores for an Illumina PE150bp-only assembly?

The BUSCOs look pretty good to me. The expected BUSCO completeness tends to vary depending on the species assembled (for example, we always see much lower BUSCO scores for various conifer assemblies), but I'd say 84% is pretty good. It can help to compare the stats with another assembly of the same species or a published assembly of a related species.
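For that kind of comparison, a minimal Python sketch along these lines can parse two BUSCO one-line summaries and report the differences. The regular expression assumes the summary format quoted in this thread, and the function name `parse_busco` is made up for illustration; it is not part of BUSCO.

```python
import re

# Pattern for a one-line BUSCO summary as quoted in this thread (assumed format).
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return the percentages (C, S, D, F, M) and the BUSCO count n as a dict."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    scores = {key: float(match[key]) for key in "CSDFM"}
    scores["n"] = int(match["n"])
    return scores


# The score reported in this thread, identical before and after abyss-sealer.
before = parse_busco("C:84.0%[S:65.7%,D:18.3%],F:6.3%,M:9.7%,n:3640")
after = parse_busco("C:84.0%[S:65.7%,D:18.3%],F:6.3%,M:9.7%,n:3640")

# Per-category change in percentage points (after minus before).
delta = {key: round(after[key] - before[key], 1) for key in "CSDFM"}
print(delta)

# Sanity checks: S + D should equal C, and C + F + M should sum to 100%.
assert abs(before["S"] + before["D"] - before["C"]) < 0.05
assert abs(before["C"] + before["F"] + before["M"] - 100.0) < 0.05
```

The same function can be pointed at the summary line of a published assembly of a related species to put the 84% in context.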

> Side-question 2: If I get my best assembly for ABySS with k=104 which k-values would you use for sealer? Why?

I tend to use a k-sweep based on the read length, regardless of the k-mer size used for the ABySS assembly. For example, you could try k-values from 75 to 145 with a step size of 10 for your 2x150bp dataset.
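As a rough illustration of that sweep, the Python sketch below generates the k-values and assembles an `abyss-sealer` command line with one `-k` option per value. The file names, the Bloom filter size (`-b40G`), the `-P 10` setting, and the output prefix are placeholder assumptions, not recommendations from this thread; check `abyss-sealer --help` for the options in your installed version.

```python
import shlex


def k_sweep(k_min=75, k_max=145, step=10):
    """Return the k-values from k_min to k_max (inclusive) in steps of `step`."""
    return list(range(k_min, k_max + 1, step))


def sealer_command(scaffolds, reads, ks, bloom_size="40G", max_paths=10,
                   out_prefix="sealed"):
    """Build an abyss-sealer argument list; all default values are placeholders."""
    args = ["abyss-sealer", f"-b{bloom_size}"]
    args += [f"-k{k}" for k in ks]           # one -k option per k-mer size
    args += ["-P", str(max_paths)]           # limit on alternate paths merged per gap
    args += ["-o", out_prefix, "-S", scaffolds]
    args += list(reads)
    return args


ks = k_sweep()
# Hypothetical file names, for illustration only.
cmd = sealer_command("assembly-scaffolds.fa",
                     ["reads_1.fq.gz", "reads_2.fq.gz"], ks)
print(shlex.join(cmd))
```

Printing the command rather than running it makes it easy to inspect the sweep before committing to a long job.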

ms-gx commented 3 years ago

Thanks much for your great answers!

ms-gx commented 3 years ago

Ah, related follow-up question: if abyss-sealer is able to fill the gaps, why is ABySS not able to bridge this gap in the first place? Shouldn't the unitigs be fused to contigs at those positions?

lcoombe commented 3 years ago

There are a number of reasons why Sealer can fill a gap that ABySS introduced. For one, we run Sealer with several k-mer sizes, which differ from the single k-mer size used for the ABySS assembly. Also, Sealer can find multiple paths between the left and right flanks of a gap and builds a consensus of those paths (-P). This bidirectional graph search approach is not part of the abyss-pe assembly pipeline. Finally, most gaps are introduced at the scaffolding stage of ABySS, where many of the joined contigs are estimated not to overlap (hence the gap).
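To make the idea of searching for a path between gap flanks concrete, here is a toy Python sketch. It is not Sealer's implementation: a plain Python set and a one-directional depth-first search stand in for Sealer's actual data structures and its bidirectional search, and the sequences, the function names, and the `max_len` cap are invented for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_paths(left_flank, right_flank, read_kmers, k, max_len=50, max_paths=2):
    """Depth-first search for k-mer paths from the end of the left flank
    to the start of the right flank; returns the spelled-out sequences."""
    start, goal = left_flank[-k:], right_flank[:k]
    paths = []
    stack = [(start, start)]  # (current k-mer, sequence spelled so far)
    while stack and len(paths) < max_paths:
        kmer, seq = stack.pop()
        if kmer == goal:
            paths.append(seq)
            continue
        if len(seq) >= max_len:  # give up on overly long (or cyclic) walks
            continue
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt in read_kmers:
                stack.append((nxt, seq + base))
    return paths


# Toy data: two overlapping "reads" cover a region that the scaffold left as a gap.
genome = "ATGCGTACCTGAAGTCCATG"
reads = [genome[0:12], genome[6:20]]
k = 4
read_kmers = set().union(*(kmers(read, k) for read in reads))

left_flank, right_flank = genome[:6], genome[14:]
paths = find_paths(left_flank, right_flank, read_kmers, k)

# Strip the flank k-mers at both ends to get the candidate gap fill.
fills = [path[k:-k] for path in paths]
print(fills)
```

In this toy case the search finds a single path, and the recovered fill matches the hidden stretch `genome[6:14]`. With real data, a different k can open or close such a path, which is one way to picture why a sweep over k-values fills gaps that a single-k assembly left open.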

ms-gx commented 3 years ago

Thanks much!