bcgsc / abyss

:microscope: Assemble large genomes using short reads
http://www.bcgsc.ca/platform/bioinfo/software/abyss
Other

abyss -4.fa, -5.fa, -7.fa files do not include all sequences mentioned in their graph counterparts. #291

Closed bjclavijo closed 5 years ago

bjclavijo commented 5 years ago

When running a trivial example (E. coli dataset), the GFA output mentions sequences that are not defined in the .fa files corresponding to that graph, for graphs 4, 5, and 7. I am not sure whether this is expected behaviour, as these are intermediate steps, but it defeats the point of having the output in GFA. This is consistent across the GFA, GFA2, and .dot outputs. It seems to be just a matter of what gets dumped to the .fa files.

This was tested on abyss-pe (ABySS) 2.1.5, on a Mac, installed via brew.

sjackman commented 5 years ago

This is indeed the intended behaviour. -3.fa and -4.fa together have the sequences for the -4.gfa file. It's done this way to reduce disk usage. The files 1, 3, 6, and 8 are complete, and we refer to them as pre-unitigs, unitigs, contigs, and scaffolds. In retrospect, though, the names should have been unitigs, bubble-popped, contigs, and scaffolds.
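To check this on your own run, here is a minimal Python sketch that lists the GFA segments with no record in a given set of FASTA files. It assumes the sequence ID is the first word of each FASTA header and that the segment name is the second tab-separated field of each `S` line, which holds for both GFA 1 and GFA 2. The function names and the `ecoli-*` file names are hypothetical, not part of ABySS.

```python
def fasta_ids(path):
    """Return the set of sequence IDs (first word of each '>' header)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gfa_segment_ids(path):
    """Return the set of segment names taken from the S lines of a GFA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "S" and len(fields) > 1:
                ids.add(fields[1])
    return ids


def missing_segments(gfa_path, fasta_paths):
    """Return GFA segment names that have no record in any of the FASTA files."""
    defined = set()
    for fasta_path in fasta_paths:
        defined |= fasta_ids(fasta_path)
    return gfa_segment_ids(gfa_path) - defined
```

For example, `missing_segments("ecoli-4.gfa", ["ecoli-4.fa"])` would list the segments that live only in the earlier file, while passing `["ecoli-3.fa", "ecoli-4.fa"]` should return an empty set if the two files together cover the graph as described above.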

bjclavijo commented 5 years ago

Is this clarified anywhere? I honestly looked for it and couldn't find it.

Best,

bj

On Mon, 29 Jul 2019, 21:53 Shaun Jackman, notifications@github.com wrote:

Closed #291 https://github.com/bcgsc/abyss/issues/291.


sjackman commented 5 years ago

No, I don't believe so. These numbered files are considered internal temporary files. The name-unitigs.fa, name-contigs.fa, and name-scaffolds.fa symlinks are intended to be exposed to the user.
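If you want to see which numbered file each user-facing name currently points to, a small Python sketch like the following resolves the symlinks. The helper name `resolve_outputs` and the assembly name `ecoli` are hypothetical. The code only reports what it finds on disk and assumes nothing about which numbered file is the target.

```python
import os


def resolve_outputs(directory, name):
    """Map each user-facing output (unitigs, contigs, scaffolds) to its real path.

    Entries that do not exist in the directory are skipped.
    """
    resolved = {}
    for stage in ("unitigs", "contigs", "scaffolds"):
        link = os.path.join(directory, f"{name}-{stage}.fa")
        # lexists is true even for a symlink whose target is missing.
        if os.path.lexists(link):
            resolved[stage] = os.path.realpath(link)
    return resolved
```

Calling `resolve_outputs(".", "ecoli")` in the assembly directory returns a dictionary keyed by `unitigs`, `contigs`, and `scaffolds`, so downstream scripts can depend on the stable names rather than on the internal numbering.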