PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

Why are the outputs of falcon_unzip shorter than FALCON's outputs? #86


myshu2017-03-14 commented 7 years ago

Hi,

I recently ran FALCON and FALCON_Unzip with the commands below, and there were no errors.

fc_run.py fc_run.cfg
fc_unzip.py fc_unzip.cfg
fc_quiver.py fc_unzip.cfg

The fc_unzip.cfg file includes the input_bam_fofn setting:

[Unzip]
input_fofn= input.fofn
input_bam_fofn= input_bam.fofn

The FALCON result (2-asm-falcon/p_ctg.fa) has a total length of 4089287 bp in 326 contigs, but the FALCON_Unzip result (3-unzip/all_p_ctg.fa) is only 2772359 bp in 59 contigs. The consensus sequences (4-quiver/cns_output/cns_p_ctg.fasta) are similar: 2779343 bp in 59 contigs.

My question is: why are the total length and the number of contigs from FALCON_Unzip smaller than those from FALCON? I was hoping to get two phased sequences of about 4 Mb. I also don't know why the assembled sequences include many short fragments (tens to hundreds of bp).
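In case it helps to reproduce the numbers above, here is a minimal sketch of how the contig count, total length, and number of short fragments could be tallied from a FASTA file. It is not part of FALCON or FALCON_Unzip; the function name `fasta_stats` and the `short_cutoff` threshold are hypothetical choices made for illustration.

```python
import io


def fasta_stats(handle, short_cutoff=1000):
    """Return (n_contigs, total_bp, n_short) for a FASTA text stream.

    short_cutoff is an assumed threshold (in bp) below which a contig
    is counted as a "short fragment"; adjust it to taste.
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    n_short = sum(1 for n in lengths if n < short_cutoff)
    return len(lengths), sum(lengths), n_short


if __name__ == "__main__":
    # Tiny in-memory example: ctg1 is 12 bp, ctg2 is 2 bp.
    demo = io.StringIO(">ctg1\nACGTACGT\nACGT\n>ctg2\nAC\n")
    print(fasta_stats(demo, short_cutoff=5))  # (2, 14, 1)
```

To compare the assemblies, the same function can be called on an opened file, for example `with open("2-asm-falcon/p_ctg.fa") as fh: print(fasta_stats(fh))`, and likewise for 3-unzip/all_p_ctg.fa and 4-quiver/cns_output/cns_p_ctg.fasta.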

Can you help me? Looking forward to your reply. Thank you very much!

Best wishes, Mingyue Shu