PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

error in tracking subreads and preads? #545

Open sjin09 opened 7 years ago

sjin09 commented 7 years ago

To whom it may concern,

I have successfully completed both FALCON and FALCON_UNZIP. Next, I wanted to apply Pilon to cns_p_ctg.fa and cns_h_ctg.fa using the preads, which have been further error-corrected with LoRDEC and HiSeq paired-end reads. I thought this would allow better haplotype-specific correction than correcting the contigs directly with HiSeq reads in Pilon.

I found 4-quiver/read_maps/read_to_contig_map, which records the relationship between subreads, preads, and contigs. In addition, in the 4-quiver directory, the contig alignments are partitioned into folders named after each contig. I was able to extract the subreads used for arrow from the BAM files in this directory.

In the case of contig 000034F, 69 subreads were aligned to the contig to perform arrow. However, in the read_to_contig_map, I was only able to find 5 subreads with the corresponding 5 preads.

I was wondering whether it is possible to find all of the relationships between subreads and preads, and what could explain the discrepancy between the read_to_contig_map and the actual number of subreads used for arrow.

I have attached below the read_to_contig_map for 000034F.

005138344 017569666 m141229_192421_sherri_c100759702550000001823166007221583_s1_p0/111512/4456_42907 000034F
000515508 001790263 m141226_203727_42242_c100752732550000001823158607081592_s1_p0/27188/0_29645 000034F
002837059 009753314 m150102_223743_42142_c100759962550000001823166007221560_s1_p0/49191/0_21146 000034F
004977481 017013970 m150421_103407_42242_c100778262550000001823160408051565_s1_p0/53325/0_25888 000034F
000403435 001391167 m150419_153457_42242_c100778282550000001823160408051547_s1_p0/113042/0_37037 000034F
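
For anyone wanting to reproduce the comparison, here is a minimal Python sketch that groups rows like the ones above by contig and lists the subread names that appear in a contig's BAM but not in the map. The column meanings (pread ID, raw read ID, subread name, contig) are an assumption read off the rows above, and the names `MapRow`, `parse_read_to_contig_map`, and `subreads_not_in_map` are hypothetical helpers, not part of FALCON.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class MapRow(NamedTuple):
    pread_id: str      # assumed meaning of column 1
    rawread_id: str    # assumed meaning of column 2
    subread_name: str  # movie/ZMW/start_end, as in the rows above
    contig: str


def parse_read_to_contig_map(lines: Iterable[str]) -> Dict[str, List[MapRow]]:
    """Group whitespace-separated 4-column rows by contig name."""
    by_contig = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        row = MapRow(*fields)
        by_contig[row.contig].append(row)
    return dict(by_contig)


def subreads_not_in_map(bam_names: Iterable[str], rows: Iterable[MapRow]) -> Set[str]:
    """Return subread names seen in the BAM but absent from the map rows."""
    mapped = {row.subread_name for row in rows}
    return set(bam_names) - mapped


# Two of the rows quoted in this comment, used as sample input.
EXAMPLE_ROWS = [
    "005138344 017569666 m141229_192421_sherri_c100759702550000001823166007221583_s1_p0/111512/4456_42907 000034F",
    "000515508 001790263 m141226_203727_42242_c100752732550000001823158607081592_s1_p0/27188/0_29645 000034F",
]

by_contig = parse_read_to_contig_map(EXAMPLE_ROWS)
print(len(by_contig["000034F"]))
```

In practice the input would be the lines of 4-quiver/read_maps/read_to_contig_map, and `bam_names` would be the read names extracted from the contig's BAM file by whatever tool was used to pull the 69 subreads.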

lijingjing1 commented 7 years ago

It is a good idea to use the more "accurate" reads for the next quiver/arrow step. It will improve compute efficiency.