PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

How to run SMRT-analysis? #36

Closed ls2017 closed 8 years ago

ls2017 commented 8 years ago

I am new to PacBio data analysis. I have just finished a FALCON run and want to use FALCON_unzip to construct haplotype-specific contigs (a phased diploid genome assembly).

I have installed Falcon_unzip and SMRT-analysis on our HPC.

Under the folder FALCON_unzip-master/examples, there are 2 files: unzip.sh and fc_unzip.cfg

The contents of unzip.sh are as follows:

fc_unzip.py fc_unzip.cfg
fc_quiver.py fc_unzip.cfg

Because our HPC doesn't support SGE, I have adapted the cfg file following your comments on the earlier issue #21 (I can provide feedback on my run if needed):

[General]
job_type = local

[Unzip]
input_fofn= input.fofn
input_bam_fofn= input_bam.fofn

smrt_bin=/home/peter/bioinformatics/smrtlink/install/smrtlink-fromsrc_3.0.5.175021,175083-175021-174993-174993/bundles/smrttools/smrtcmds/bin/

jobqueue =
sge_phasing=
sge_quiver=
sge_track_reads=
sge_blasr_aln=
sge_hasm=
unzip_concurrent_jobs = 6
quiver_concurrent_jobs = 6
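A cfg like the one above can be sanity-checked before launching a long job. The following is a minimal sketch, not part of FALCON_unzip: the function name check_unzip_cfg is hypothetical, and it assumes the cfg is plain INI syntax that Python's configparser can read and that relative fofn paths are resolved against the directory holding the cfg file.

```python
import configparser
import os


def check_unzip_cfg(cfg_path):
    """Return a list of problems found in an fc_unzip-style cfg file.

    Hypothetical helper: it only checks that the two fofn files and the
    smrt_bin directory named in the [Unzip] section exist on disk.
    """
    parser = configparser.ConfigParser()
    with open(cfg_path) as handle:
        parser.read_file(handle)

    if not parser.has_section("Unzip"):
        return ["missing [Unzip] section"]

    unzip = parser["Unzip"]
    # Assumption: relative paths are relative to the cfg file's directory.
    base = os.path.dirname(os.path.abspath(cfg_path))
    problems = []

    for key in ("input_fofn", "input_bam_fofn"):
        value = unzip.get(key, "").strip()
        if not value:
            problems.append(f"{key} is not set")
        elif not os.path.isfile(os.path.join(base, value)):
            problems.append(f"{key} points to a missing file: {value}")

    smrt_bin = unzip.get("smrt_bin", "").strip()
    if not os.path.isdir(smrt_bin):
        problems.append(f"smrt_bin is not a directory: {smrt_bin!r}")

    return problems
```

Calling check_unzip_cfg("fc_unzip.cfg") returns an empty list when all three paths exist, so it can be run as a quick pre-flight check before unzip.sh.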

My questions are listed as below:

Question 1: Should I just run the command unzip.sh fc_unzip.cfg, and will it generate the phased diploid genome assembly at the end?

Question 2: How do I prepare the input.fofn and input_bam.fofn files listed in the cfg file? Does input.fofn contain the paths and names of all the *.bax.h5 files?

Question 3: Since FALCON_unzip integrates/calls quiver, is there still any need to run SMRT-analysis on its own?

Question 4: I am new to SMRT-analysis, too. Do I need to worry about running the following SMRT commands with settings.xml, as mentioned below? If so, where and how do I get settings.xml?

$ $SMRT_ROOT/current/etc/setup.sh
$ smrtpipe.py params=settings.xml xml:input.xml

Question 5: If I don't use FALCON_unzip, which SMRT-analysis commands do people normally use to generate the final assembly from FALCON output? Is it just quiver?

Looking forward to your reply

Many thanks!

ls2017 commented 8 years ago

Regarding my question about how to prepare input_bam.fofn: what I actually want to know is how to build the BAM files.

I found this page, which says that SMRT-analysis 3.0 is required: https://github.com/PacificBiosciences/PacBioFileFormats/wiki/BAM-recipes

When I try to download SMRT-analysis 3.0, the site only offers a preview: http://www.pacb.com/proceedings/pacbio-smrt-analysis-3-0-preview/

When I go to GitHub, there is only version 2.3.0 at https://github.com/PacificBiosciences/SMRT-Analysis

I find all of this very confusing.

Could you please explain which software versions I actually need in order to use FALCON output with FALCON_unzip? For example, FALCON v0.5 or above, SMRT-analysis 3.0, and the recently released FALCON_unzip?

Then, after FALCON, should I run unzip.sh and fc_unzip.cfg with the original input.fofn that lists all the FASTA files, plus the BAM files converted from the bax.h5 files by SMRT-analysis 3.0?
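A fofn ("file of file names") is a plain text file with one path per line. As a minimal sketch, assuming (as the question above suggests) that input.fofn lists the FASTA files and input_bam.fofn lists the BAM files, the lists could be generated as follows. The helper name write_fofn and the directory and file patterns in the usage note are hypothetical.

```python
import glob
import os


def write_fofn(pattern, fofn_path):
    """Write the absolute paths of files matching `pattern` to `fofn_path`.

    One path per line, sorted so the output is reproducible.
    Returns the list of paths that were written.
    """
    paths = sorted(os.path.abspath(p) for p in glob.glob(pattern))
    with open(fofn_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

For example, write_fofn("fasta/*.fasta", "input.fofn") and write_fofn("bam/*.bam", "input_bam.fofn") would produce the two lists, provided the files really live in those (assumed) directories.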

Could you please advise? I have been trying for a couple of weeks and have still got nowhere.

Many thanks!

This is the information from https://github.com/PacificBiosciences/PacBioFileFormats/wiki/BAM-recipes:

This page assumes you have the 3.0 SMRTanalysis stack installed. There is no other supported way to build BAM files according to our requirements. So, for example, if you do not have bax2bam available, there is no other supported way to build a compliant BAM file adhering to our requirements.

Building a basecalls BAM file (no alignments)

For each run, the PacBio RS gives you three "bax.h5" output files. Use bax2bam to get them into a one BAM file:

$ bax2bam m.1.bax.h5 m.2.bax.h5 m.3.bax.h5
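With many runs, the three-files-per-run pattern quoted above can be scripted. The sketch below only builds the bax2bam command lines and does not execute them. It assumes the files follow the quoted naming pattern (<movie>.1.bax.h5, <movie>.2.bax.h5, <movie>.3.bax.h5) and uses the same flag-free invocation as the recipe. The function name bax2bam_commands is hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming pattern, taken from the quoted recipe: <movie>.<1|2|3>.bax.h5
BAX_PATTERN = re.compile(r"^(?P<movie>.+)\.(?P<part>[123])\.bax\.h5$")


def bax2bam_commands(bax_paths):
    """Group bax.h5 files by movie name and build one bax2bam command each.

    Files that do not match the naming pattern are ignored. Files are
    grouped by base name only, so movies with identical names in
    different directories would end up in the same command.
    """
    groups = defaultdict(list)
    for path in bax_paths:
        match = BAX_PATTERN.match(os.path.basename(path))
        if match:
            groups[match.group("movie")].append(path)

    commands = []
    for movie in sorted(groups):
        # Argument list in the same form as the quoted recipe.
        commands.append(["bax2bam"] + sorted(groups[movie]))
    return commands
```

Each returned list can be passed to subprocess.run once bax2bam is on the PATH. The sketch does not check that every movie has all three parts, so incomplete groups should be reviewed before running anything.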

peterdfields commented 8 years ago

To build the necessary BAM files, you will need the bax2bam executable that comes with SMRTanalysis 3.0. For a time, at least, it could be downloaded from the FTP site described in the FAQ:

https://github.com/PacificBiosciences/FALCON_unzip/wiki/FAQ

I recently found that changing the requested file names to:

smrtlink_3.0.5.175021.run
smrtsuite_3.0.5.175021.run.md5

worked, but the file name issue is irrelevant now because the FTP address is no longer responsive.

ls2017 commented 8 years ago

@peterdfields Thank you very much for pointing to the SMRTanalysis 3.0! Will give it a try.

xuzhichao830 commented 7 years ago

@pb-cdunn How can we get the SMRT-analysis v3.0?

pb-jchin commented 7 years ago

@xuzhichao830 please contact the PacBio support team for information on obtaining the official software release.