PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

input_bam.fofn contents #71

Closed: wyim-pgl closed this issue 6 years ago

wyim-pgl commented 7 years ago

Dear Jason,

Hello!

Can you briefly explain the contents of input_bam.fofn?

How can I generate it?

Thank you.

Won

pb-jchin commented 7 years ago

It is just a list of paths to the PacBio BAM files that contain the pulse-level signals.

wyim-pgl commented 7 years ago

Then how can I generate it? With BAX2BAM from the raw files, or from the FALCON output?

Thank you

pb-jchin commented 7 years ago

If you only have BAX files, you will need to generate the BAM files with BAX2BAM. The input_bam.fofn is just a text file with the paths to all the BAM files, one per line. Just do a UNIX find to get the list.
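For readers who would rather script this step, here is a minimal sketch of building the file list in Python. It is only an illustration, not part of FALCON_unzip: the function name `write_bam_fofn` is hypothetical, and the `*.subreads.bam` pattern assumes your BAM files follow the usual bax2bam naming, so adjust it to match your data.

```python
from pathlib import Path


def write_bam_fofn(bam_root, fofn_path="input_bam.fofn", pattern="*.subreads.bam"):
    """Write one absolute BAM path per line and return the list of paths.

    Hypothetical helper: the default pattern is an assumption about file naming.
    """
    # Recursively collect matching files; sort for a reproducible order.
    bam_paths = sorted(
        str(p.resolve()) for p in Path(bam_root).rglob(pattern) if p.is_file()
    )
    if not bam_paths:
        raise FileNotFoundError(f"no files matching {pattern!r} under {bam_root}")
    # A fofn ("file of file names") is plain text with one path per line.
    Path(fofn_path).write_text("\n".join(bam_paths) + "\n")
    return bam_paths
```

Calling `write_bam_fofn("/path/to/bam_dir")` (a placeholder path) does roughly what `find /path/to/bam_dir -name "*.subreads.bam" > input_bam.fofn` does from the shell.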

wyim-pgl commented 7 years ago

Or can I generate it from a fastq file?

Won

pb-jchin commented 7 years ago

No. Those fastq files do not have enough information.

wsuplantpathology commented 7 years ago

Hi @pb-jchin :

Can you please explain a bit more what you meant above by "the PacBio BAM files that contains the pulse level signals"?

I have two more questions: 1) While running fc_unzip.py, I found a sorted .bam file under the ../3-unzip../../blasr/ directory. Can we let FALCON-unzip use this BAM to perform consensus calling? If so, how?

2) For my own data, I also have 50X Illumina reads. Can I map these reads to the p_contigs.fasta generated by FALCON using something like BWA, then feed the resulting BAM file (probably after sorting and indexing) to FALCON_unzip for consensus calling? Thanks.

pb-jchin commented 7 years ago

@wsuplantpathology Please refer to http://pacbiofileformats.readthedocs.io/en/3.0/BAM.html for the BAM format. If you have older data, the information may be stored in HDF5 format. In that case, you will need to run bax2bam to convert it to (unaligned) BAM files with extra tags for the PacBio-specific information.
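As a rough illustration of the conversion step, the sketch below groups `.bax.h5` files by movie and builds one bax2bam argument list per movie without running anything. It rests on two assumptions to check against `bax2bam --help` for your installed version: that each movie is split into parts named `<movie>.<n>.bax.h5`, and that `-o` sets the output prefix. The function name `bax2bam_commands` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming: <movie>.1.bax.h5, <movie>.2.bax.h5, <movie>.3.bax.h5
BAX_NAME = re.compile(r"^(?P<movie>.+)\.\d+\.bax\.h5$")


def bax2bam_commands(bax_paths, out_dir="."):
    """Return one bax2bam argument list per movie; nothing is executed."""
    by_movie = defaultdict(list)
    for path in bax_paths:
        match = BAX_NAME.match(Path(path).name)
        if match is None:
            raise ValueError(f"not a .bax.h5 file name: {path}")
        by_movie[match.group("movie")].append(str(path))

    commands = []
    for movie in sorted(by_movie):
        # All parts of a movie go into a single call; -o is the output prefix.
        prefix = str(Path(out_dir) / movie)
        commands.append(["bax2bam", *sorted(by_movie[movie]), "-o", prefix])
    return commands
```

Each returned list can be inspected first and then passed to `subprocess.run(cmd, check=True)` once it looks right. The resulting BAM files are the ones to list in input_bam.fofn.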

FALCON-Unzip does have a utility for haplotype-specific consensus, as we describe in the publication. It is run with fc_quiver.py after the unzip step.

For 2, you can align Illumina reads to p_contigs.fa, but be aware that BWA does not "know" your genome is diploid. You will have to examine the alignments and probably apply some filtering for whatever downstream processing you want to do.

gconcepcion commented 7 years ago

Hi wsuplantpathology,

  1. No, the sorted BAM file to which you are referring (e.g. 3-unzip/0-phasing/000000F/blasr/000000F_sorted.bam) has reads from both haplotypes. If you called consensus on this particular BAM file, it would result in a haploid consensus sequence, effectively nullifying the unzipping you just performed. This BAM file is the sorted alignment file from which SNPs are identified and reads are subsequently segregated by phase. The easiest way to call consensus on the unzipped assembly is fc_quiver.py.

  2. FALCON unzip will not take an arbitrary BAM file for consensus calling; it has to be the original machine-generated raw BAM file. You can definitely use 50x Illumina reads for correction, but be aware of the caveat jchin noted above. You would also probably need to use Pilon or some other short-read-based polishing tool.
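For point 2, a minimal sketch of what a separate short-read polishing pass might look like, written as a Python helper that only builds shell command strings. This is not part of FALCON_unzip. It assumes paired-end reads, that `bwa`, `samtools` and a `pilon` launcher script are on PATH (otherwise substitute `java -jar pilon.jar`), and the function and file names are hypothetical. It does not do the alignment filtering mentioned above.

```python
import shlex


def short_read_polish_commands(contigs, reads_1, reads_2, prefix="illumina", threads=8):
    """Return shell commands for align, sort, index and polish; nothing is executed."""
    q = shlex.quote
    bam = f"{prefix}.sorted.bam"
    return [
        # Index the contigs, then align the paired reads and sort the output.
        f"bwa index {q(contigs)}",
        f"bwa mem -t {threads} {q(contigs)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # Pilon reads the sorted, indexed BAM as paired-end fragments.
        f"pilon --genome {q(contigs)} --frags {q(bam)}"
        f" --output {q(prefix + '_pilon')} --diploid",
    ]
```

Printing the returned strings lets you review them before running anything. Reads from regions where the haplotypes differ may still be placed ambiguously, so inspect the alignments before trusting the corrections.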

wsuplantpathology commented 7 years ago

Thanks @pb-jchin @gconcepcion for clarification. Probably I'll try bax2bam later.

wyim-pgl commented 7 years ago

@pb-jchin Do we need additional "pulse features (QVs)" such as DeletionQV, DeletionTag, InsertionQV, IPD, MergeQV, SubstitutionQV, PulseWidth, SubstitutionTag?

Thank you!

pb-jchin commented 7 years ago

The bax2bam defaults should load all necessary values.

wyim-pgl commented 7 years ago

Thank you.

jnarayan81 commented 6 years ago

I wanted to try assembling PacBio + ONT reads with Falcon + FalconUnzip, but I am worried about the input_bam.fofn file. Is there any way to create it?

gconcepcion commented 6 years ago

PacBio supports the use case of FALCON with PacBio reads.

For other situations, you're welcome to use the software - but you will be on your own as they are unsupported.

wsuplantpathology commented 5 years ago

Dear @gconcepcion

Recently, I ran into another issue. I was trying to polish my assembly after fc_unzip.py, but I do not have the PacBio .bam files. In this case, can I still run fc_quiver.py?

It seems fc_quiver.py needs a special configuration file, but I cannot find any examples of such a config_fn file. Could you please provide some details on this file? Thanks so much.

Sincerely, Chongjing

Here is the help from fc_quiver.py

$ fc_quiver.py -h
falcon-unzip 1.1.4
falcon-kit 1.2.4
pypeflow 2.1.1
usage: fc_quiver.py [-h] [--logging-config-fn LOGGING_CONFIG_FN] config_fn

Run stage 3-unzip and stage 4-polish, given the results of stage 2-asm-falcon.

positional arguments:
  config_fn             Configuration file. (This needs its own help section.
                        Note: smrt_bin is deprecated, but if supplied will be
                        appended to PATH.)

optional arguments:
  -h, --help            show this help message and exit
  --logging-config-fn LOGGING_CONFIG_FN
                        Optional standard Python logging config (default: None)