Open dgordon562 opened 8 years ago
The file ctg_paths encodes the graph for each contig after the unitigs are analyzed and put into
contigs. Each line has 7 columns. The first column is the contig ID. Contig IDs are just serial
numbers followed by R or F. Two contigs with the same serial number but different endings are "dual" to
each other. Namely, they are constructed from "dual" edges and are mostly reverse
complements of each other, except near the ends of the contigs. The second column is the type of
contig. If a unitig is circular (the beginning node and the ending node are the same), then it will be
marked as "ctg_circular". Everything else will be "ctg_linear". In some cases, even if a contig is marked
as "ctg_linear", it can still be a circular contig if the beginning node and the ending node are the
same but it is not a "simple" path. One can detect that by checking the beginning and ending
nodes if necessary.
cf. https://github.com/PacificBiosciences/FALCON/wiki/Manual
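The begin/end-node check mentioned in the quoted passage could be sketched roughly as follows. The passage only describes columns 1 and 2, so this sketch assumes that column 3 holds the first unitig written as begin~via~end and that column 4 holds the ending node; please verify both against the manual before relying on it. The names CtgPath, parse_ctg_path_line and looks_circular are hypothetical, and the sample line is made up.

```python
from typing import NamedTuple


class CtgPath(NamedTuple):
    contig_id: str   # column 1, e.g. a serial number followed by F or R
    ctg_type: str    # column 2, "ctg_circular" or "ctg_linear"
    begin_node: str  # ASSUMPTION: first field of column 3 (begin~via~end)
    end_node: str    # ASSUMPTION: column 4


def parse_ctg_path_line(line: str) -> CtgPath:
    """Parse one whitespace-separated line of ctg_paths (7 columns)."""
    cols = line.split()
    if len(cols) != 7:
        raise ValueError(f"expected 7 columns, got {len(cols)}")
    begin_node = cols[2].split("~")[0]
    return CtgPath(cols[0], cols[1], begin_node, cols[3])


def looks_circular(rec: CtgPath) -> bool:
    """True if marked circular, or if the path starts and ends on the same node."""
    return rec.ctg_type == "ctg_circular" or rec.begin_node == rec.end_node


# Made-up example line: marked linear, but begins and ends on node n1:B.
sample = "000000F ctg_linear n1:B~n2:E~n3:B n1:B 1000 2000 n1:B~n2:E~n3:B|n3:B~n4:E~n1:B"
record = parse_ctg_path_line(sample)
print(record.contig_id, looks_circular(record))
```

A contig reported this way as circular despite the "ctg_linear" label would correspond to the non-"simple" path case described in the manual.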
Does that answer the question? I understand F and R, but not their absence. Which file has contig names without F/R?
Hi, Chris,
Sorry for the long delay. I'm guessing (although I admit it isn't clear) that contigs without F or R are considered circular by FALCON. I see such names among the contigs in 2-asm-falcon/p_ctg.fa. What do you think?
David
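To see how many contig names fall into each group, a quick tally of the FASTA headers might help. This is only a sketch: suffix_class and tally_fasta_names are hypothetical helper names, the header lines in the example are made up, and it assumes the contig name is the first whitespace-delimited token after ">".

```python
from collections import Counter
from typing import Iterable


def suffix_class(name: str) -> str:
    """Classify a contig name by its last character: 'F', 'R', or 'none'."""
    if name.endswith("F"):
        return "F"
    if name.endswith("R"):
        return "R"
    return "none"


def tally_fasta_names(lines: Iterable[str]) -> Counter:
    """Count FASTA header names by suffix class, ignoring sequence lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if fields:  # skip a bare ">" with no name
            counts[suffix_class(fields[0])] += 1
    return counts


# Made-up headers for illustration only.
example = [">000000F", "ACGT", ">000001R extra text", "TTGA", ">000002", "GGCC"]
print(tally_fasta_names(example))
```

For a real file, passing an open file handle works the same way, e.g. `with open("2-asm-falcon/p_ctg.fa") as fh: print(tally_fasta_names(fh))`.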
What is the significance of the contigs whose names do NOT end in F or R?