bjclavijo closed this issue 5 years ago.
This is indeed the intended behaviour. The -3.fa and -4.fa files together contain the sequences for the -4.gfa graph; it's done this way to reduce disk space. Files 1, 3, 6, and 8 are complete, and we refer to them as pre-unitigs, unitigs, contigs, and scaffolds. In retrospect, though, they should have been called unitigs, bubble-popped, contigs, and scaffolds.
Is this clarified anywhere? I honestly looked for it and couldn't find it.
Best,
bj
On Mon, 29 Jul 2019, 21:53 Shaun Jackman, notifications@github.com wrote:
Closed #291 https://github.com/bcgsc/abyss/issues/291.
No, I don't believe so. These numbered files are considered internal temporary files. The name-unitigs.fa, name-contigs.fa, and name-scaffolds.fa symlinks are intended to be exposed to the user.
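To see which numbered file backs each of these symlinks in a finished run, one can resolve the links directly. The Python sketch below is illustrative only: resolve_outputs is a hypothetical helper, and its name argument stands for the prefix passed to abyss-pe with name=. The targets are read from the file system rather than hard-coded, since the exact mapping is an internal detail that may differ between ABySS versions.

```python
import os
import sys
from typing import Dict, Optional


def resolve_outputs(directory: str, name: str) -> Dict[str, Optional[str]]:
    """Map each user-facing FASTA symlink to the file name it resolves to.

    Hypothetical helper: returns None for a stage whose link is absent.
    """
    result: Dict[str, Optional[str]] = {}
    for stage in ("unitigs", "contigs", "scaffolds"):
        link = os.path.join(directory, f"{name}-{stage}.fa")
        if os.path.lexists(link):
            # realpath follows the whole chain of links to the final file.
            result[stage] = os.path.basename(os.path.realpath(link))
        else:
            result[stage] = None
    return result


if __name__ == "__main__":
    # Usage: python resolve_outputs.py [assembly_dir] [name]
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    name = sys.argv[2] if len(sys.argv) > 2 else "name"
    for stage, target in resolve_outputs(directory, name).items():
        print(f"{name}-{stage}.fa -> {target}")
```

Reading the links, rather than assuming a stage-to-number table, keeps the check correct even if the pipeline's internal numbering changes.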
When running a trivial example (the E. coli dataset), the GFA output for graphs 4, 5, and 7 references sequences that are not defined in the corresponding .fa files. I am not sure whether this is expected, since these are intermediate steps, but it defeats the point of having the output in GFA. The same happens with GFA, GFA2, and .dot output. It seems to be only a matter of which sequences get dumped to the .fa files.
This was tested on abyss-pe (ABySS) 2.1.5, on a Mac, installed via brew.
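The missing-sequence report, and the explanation that the -3.fa and -4.fa files together hold the sequences for the -4.gfa graph, can be checked with a short script. The following Python sketch lists segments declared in a GFA graph that have no sequence in any of the given FASTA files. It assumes (not confirmed in this thread) that each ABySS segment name is the second tab-separated field of an S line, which holds for both GFA1 and GFA2, and that the first token of each FASTA header is the matching sequence name. All function names are hypothetical.

```python
from typing import Iterable, Set


def gfa_segment_names(gfa_lines: Iterable[str]) -> Set[str]:
    """Return the names of all segments declared on S lines (GFA1 or GFA2)."""
    names: Set[str] = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 1:
            names.add(fields[1])
    return names


def fasta_names(fasta_lines: Iterable[str]) -> Set[str]:
    """Return the first whitespace-separated token of every FASTA header."""
    names: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def missing_segments(gfa_lines: Iterable[str], *fasta_sources: Iterable[str]) -> Set[str]:
    """Segments in the graph with no sequence in any of the FASTA sources."""
    available: Set[str] = set()
    for source in fasta_sources:
        available |= fasta_names(source)
    return gfa_segment_names(gfa_lines) - available


if __name__ == "__main__":
    # Toy data in the assumed layout: segment 2 is defined only in the "-3" FASTA.
    gfa = ["H\tVN:Z:1.0\n", "S\t1\t*\tLN:i:4\n", "S\t2\t*\tLN:i:4\n", "L\t1\t+\t2\t+\t0M\n"]
    fa3 = [">2 4 10\n", "ACGT\n"]
    fa4 = [">1 4 12\n", "TTGA\n"]
    print(sorted(missing_segments(gfa, fa4)))       # ['2']
    print(sorted(missing_segments(gfa, fa3, fa4)))  # []
```

Against real output, one would pass the open name-4.gfa file with name-4.fa alone, then again with name-3.fa added; if the explanation above holds, the second call should report no missing segments.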