PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

contig names usually end with "F" or "R". What about the others? #442

Open dgordon562 opened 8 years ago

dgordon562 commented 8 years ago

What is the significance of the contigs whose names do NOT end in F or R?

pb-jchin commented 8 years ago
The file ctg_paths encodes the graph for each contig after the unitigs are analyzed and put into
contigs. Each line has 7 columns. The first column is the contig ID. Contig IDs are just serial
numbers followed by R or F. Two contigs with the same serial number but different endings are "dual"
to each other. Namely, they are constructed from "dual" edges and are mostly reverse complements
of each other, except near the ends of the contigs. The second column is the type of contig. If a
unitig is circular (the beginning node and the ending node are the same), it is marked as
"ctg_circular". Everything else is marked "ctg_linear". In some cases, a contig marked as
"ctg_linear" can still be circular: its beginning node and ending node are the same, but it is not
a "simple" path. One can detect that by checking the beginning and ending nodes if necessary.

cf. https://github.com/PacificBiosciences/FALCON/wiki/Manual
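
The check described above (a "ctg_linear" contig whose beginning and ending nodes match) could be sketched roughly as follows. This is only an illustration, not FALCON code. The passage confirms only the first two columns. The sketch assumes that column 3 is the first unitig written as nodes joined by "~" (so its first token is the beginning node) and that column 4 is the ending node; please verify that layout against the manual linked above. The function names and the sample rows are made up.

```python
from typing import Iterable, List, NamedTuple


class ContigPath(NamedTuple):
    ctg_id: str      # column 1: serial number plus F or R
    ctg_type: str    # column 2: "ctg_linear" or "ctg_circular"
    begin_node: str  # ASSUMPTION: first "~"-separated token of column 3
    end_node: str    # ASSUMPTION: column 4


def parse_ctg_paths(lines: Iterable[str]) -> List[ContigPath]:
    """Parse whitespace-separated ctg_paths rows, skipping short or blank lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        ctg_id, ctg_type, first_unitig, end_node = fields[:4]
        begin_node = first_unitig.split("~")[0]
        records.append(ContigPath(ctg_id, ctg_type, begin_node, end_node))
    return records


def linear_but_circular(records: Iterable[ContigPath]) -> List[str]:
    """IDs of contigs marked ctg_linear whose begin and end nodes are the same."""
    return [
        r.ctg_id
        for r in records
        if r.ctg_type == "ctg_linear" and r.begin_node == r.end_node
    ]


# Made-up rows for illustration only; real node IDs will look different.
sample = [
    "000000F ctg_linear n1:E~n2:E~n3:B n9:E 1000 900 x",
    "000001F ctg_linear n4:B~n5:E~n6:E n4:B 500 400 x",
    "000002F ctg_circular n7:E~n8:E~n7:E n7:E 300 200 x",
]
records = parse_ctg_paths(sample)
flagged = linear_but_circular(records)
```

On a real assembly one would pass an open file handle for ctg_paths instead of the `sample` list.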

pb-cdunn commented 8 years ago

Does that answer the question? I understand F and R, but not their absence. Which file has contig names without F/R?

dgordon562 commented 8 years ago

Hi, Chris,

Sorry for the long delay. I'm guessing (although I admit it isn't clear) that contigs without F or R are considered circular by falcon. I see these in the names of contigs in 2-asm-falcon/p_ctg.fa. What do you think?

David
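
To see exactly which contig names lack the F/R ending, a small script along these lines might help. It is a rough sketch, not part of FALCON. It assumes the contig name is the first whitespace-delimited token of each FASTA header line, and the helper name and sample headers are hypothetical. It says nothing about what the missing suffix means; it only lists the names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_contig_suffixes(fasta_lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count header names ending in F, in R, or in anything else.

    Returns the counts and the list of names that end in neither F nor R.
    """
    counts: Counter = Counter()
    others: List[str] = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        if not tokens:
            continue  # header with no name
        name = tokens[0]  # ASSUMPTION: name is the first token of the header
        if name.endswith("F"):
            counts["F"] += 1
        elif name.endswith("R"):
            counts["R"] += 1
        else:
            counts["other"] += 1
            others.append(name)
    return counts, others


# Made-up headers for illustration only.
sample = [
    ">000000F extra fields",
    "ACGT",
    ">000001R",
    "ACGT",
    ">000002",
    "ACGT",
]
counts, others = tally_contig_suffixes(sample)
```

For a real run, open 2-asm-falcon/p_ctg.fa and pass the file handle in place of `sample`; the names in `others` can then be looked up in ctg_paths to compare their type column.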