PacificBiosciences / FALCON_unzip

Making diploid assembly common practice for genomic study
BSD 3-Clause Clear License

Failure: The Quiver algorithm requires a cmp.h5 file #12

Open afinit opened 8 years ago

afinit commented 8 years ago

I'm having issues with the Quiver step of FALCON_unzip. Everything seems to run fine up to fc_quiver.py. At that point it prepares a shell script for each contig group, and these scripts appear to do what they are supposed to until they reach Quiver, which fails with an error saying Quiver requires a cmp.h5 file. A shortened example of the command and the error is below. Running this command by itself from the command line gives the same error.

$SMRT_CMDS/variantCaller -x 5 -X 120 -q 20 -j 24 \
    -r $PB/4-quiver/000001F_003/000001F_003_ref.fa aln-000001F_003.bam \
    -o $PB/4-quiver/000001F_003/cns-000001F_003.fasta.gz \
    -o $PB/4-quiver/000001F_003/cns-000001F_003.fastq.gz

Failure: The Quiver algorithm requires a cmp.h5 file containing standard (non-CCS) reads.

Software versions:

SMRT-Analysis v3.0   => variantCaller v1.1.0
GenomicConsensus v2.0.0   => variantCaller v2.0.0
pbcore v1.2.7
ConsensusCore v1.0.1

Since I have two versions of variantCaller, I tried them both, but they both result in the same error. I assume this means that I have a dependency that needs to be further updated, but I can't figure out which one.

pb-jchin commented 8 years ago

Please double-check that the quiver command you are calling comes from the SMRT-Analysis v3.0 directory. The error message looks like it comes from an earlier version of Quiver. (cmp.h5 is obsolete; all Quiver consensus is now done with BAM files.)
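Since two variantCaller installs are present here (v1.1.0 bundled with SMRT-Analysis and v2.0.0 from a separate GenomicConsensus install), it helps to see every copy the shell could pick up, in search order. A minimal bash sketch; the function name list_on_path is hypothetical, and bash's built-in type -a variantCaller reports similar information (it also lists aliases and functions).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every executable file named $1 found on a
# colon-separated search path ($2, defaulting to $PATH), in search order.
# The first line printed is what a bare "variantCaller" call would run,
# unless an alias, function or hashed location takes precedence.
list_on_path() {
  local name=$1 search=${2:-$PATH} dir
  local IFS=:
  for dir in $search; do
    [ -n "$dir" ] || continue          # skip empty entries for simplicity
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}
```

For example, list_on_path variantCaller, then run each printed path with --version to see which version each copy reports.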

afinit commented 8 years ago

I ran it again and called it directly from one of the SMRT-Analysis directories as:

smrtsuite/install/smrtsuite-fromsrc_3.0.2.170012/bundles/smrttools/install/smrttools-fromsrc_3.0.2.170012/smrtcmds/bin/variantCaller

This still caused the same error.

This points to:

smrtsuite/install/smrtsuite-fromsrc_3.0.2.170012/bundles/smrttools/install/smrttools-fromsrc_3.0.2.170012/private/pacbio/pythonpkgs/GenomicConsensus/binwrap/variantCaller

I also opened the SMRT-Analysis binwrap/python and checked the version of GenomicConsensus bundled with it: v1.1.0.

afinit commented 8 years ago

Shouldn't the GenomicConsensus v2.0.0 variantCaller be correct as well though?

pb-jchin commented 8 years ago

I don't install SMRT Analysis and GenomicConsensus at the same time; I suspect there is some conflict. The way SMRT Analysis isolates its environment is subtle. When I run FALCON-unzip, my executable path ($PATH) typically does not include the SMRT Analysis or GenomicConsensus paths at all.
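One way to test this point is to call the SMRT wrapper by its full path from a stripped-down environment, so nothing inherited from a virtualenv or an extra PATH entry can interfere. A bash sketch; run_isolated and SMRT_ROOT are hypothetical names, and it assumes, as described above, that the smrtcmds/bin wrappers set up the rest of their own environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with an almost empty environment.
# env -i clears every inherited variable (PATH, PYTHONPATH, LD_LIBRARY_PATH,
# virtualenv settings); only a basic system PATH and HOME are passed on.
run_isolated() {
  env -i PATH=/usr/bin:/bin HOME="$HOME" "$@"
}

# Example (SMRT_ROOT is a placeholder for your own install directory):
#   run_isolated "$SMRT_ROOT/smrtcmds/bin/variantCaller" --version
```

If the wrapper behaves differently this way than from your normal shell, the difference lies somewhere in the inherited environment.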

pb-jchin commented 8 years ago

What does the quiver line inside your 4-quiver/*F/cns_*F.sh look like?

This is what mine looks like:

/mnt/secondary/builds/full/3.0.0/prod/current-build_smrtanalysis/smrtcmds/bin/variantCaller

Also, please post the output of echo $PATH.

afinit commented 8 years ago

This was pulled from the 4-quiver/*F/cns_*F.sh where $PB_RUN is the root of the PacBio run.

($HOME/venv/falcon/bin/smrtcmds/variantCaller -x 5 -X 120 -q 20 -j 24 \
    -r $PB_RUN/4-quiver/000001F_003/000001F_003_ref.fa aln-000001F_003.bam \
    -o $PB_RUN/4-quiver/000001F_003/cns-000001F_003.fasta.gz \
    -o $PB_RUN/4-quiver/000001F_003/cns-000001F_003.fastq.gz) \
    || echo quiver failed

I shortened it a bit and split it over multiple lines for readability. $HOME/venv/falcon/bin/smrtcmds/ contains links to the SMRT-Analysis smrtcmds/bin directory.
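Because $HOME/venv/falcon/bin/smrtcmds/ holds links rather than the tools themselves, it can be worth confirming where each link finally lands (following the link is how the binwrap path mentioned earlier was found). A portable bash sketch that follows a chain of symlinks with readlink; resolve_link is a hypothetical name. On GNU systems, readlink -f does the same job in one call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: follow a chain of symbolic links and print the final
# target. Relative link targets are resolved against the link's directory.
# The printed path is not canonicalised (it may still contain "..").
resolve_link() {
  local p=$1 target hops=0
  while [ -L "$p" ]; do
    hops=$((hops + 1))
    if [ "$hops" -gt 40 ]; then              # guard against link loops
      echo "too many levels of links: $1" >&2
      return 1
    fi
    target=$(readlink "$p")
    case $target in
      /*) p=$target ;;                       # absolute target
      *)  p=$(dirname "$p")/$target ;;       # relative to the link's directory
    esac
  done
  printf '%s\n' "$p"
}
```

For example, resolve_link "$HOME/venv/falcon/bin/smrtcmds/variantCaller".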

mhsieh commented 8 years ago

Try variantCaller --version or quiver --version to get the version; please avoid digging into binwrap as much as possible.

SA3 should recognize .bam files natively; currently I am not very clear on how this is set up.

If possible, please share aln-000001F_003.bam and 000001F_003_ref.fa with us, and I can probably help rule out some factors.

afinit commented 8 years ago

As described above, from the command line I'm running version 2.0.0 of variantCaller and from the smrtsuite cmds I'm running version 1.1.0:

$ variantCaller --version
2.0.0
$ quiver --version
2.0.0

Here are the files that are being used as input for variantCaller. These were both produced by other commands in fc_quiver.py:

000001F_003_ref.fa.zip aln-000001F_003.bam.zip

Here is my $PATH. I can try running variantCaller without the smrtcmds directory in my $PATH, but I don't see how that would fix things.

$ echo $PATH
$HOME/venv/falcon/bin:$HOME/bin:/usr/local/bin:/opt/local/bin:/opt/local/sbin:/usr/bin:/usr/local/sbin:/usr/sbin:/shares/bioinfo/bin:/shares/bioinfo/installs/trinity:/shares/condor/bin:/usr/lib64/mpich/bin:$HOME/venv/falcon/bin/smrtcmds

afinit commented 8 years ago

I will reinstall FALCON-unzip tomorrow in a new virtualenv and see if that fixes things. Perhaps I've added some things to the $PATH that I'm not aware of.

pb-jchin commented 8 years ago

In my case, variantCaller is not on the PATH, so I call it by its full path. The wrapper should take care of its own environment. I would suggest you simplify the $PATH variable to isolate the environment and see whether you can find any conflict. Also, check the generated aln-000001F_003.bam with samtools to see if it is good.
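Before running samtools over aln-000001F_003.bam, a very cheap first test is whether the file is a BAM at all. BAM files are BGZF-compressed, which gzip can read, and the decompressed stream starts with the magic bytes BAM\1. A bash sketch of that check (looks_like_bam is a hypothetical name); it says nothing about the records themselves, so samtools quickcheck and samtools view -H remain the real validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 looks like a BAM file, i.e. it
# decompresses with gzip (BGZF is gzip-compatible) and the decompressed
# data begins with the "BAM" magic string.
looks_like_bam() {
  local magic
  # Ignore gzip errors and SIGPIPE here; the comparison below decides.
  magic=$(gzip -dc "$1" 2>/dev/null | head -c 3) || true
  [ "$magic" = "BAM" ]
}
```

For example: looks_like_bam aln-000001F_003.bam && samtools quickcheck aln-000001F_003.bam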

mhsieh commented 8 years ago

The BAM file from @afinit doesn't seem to be compatible with SA3's variantCaller. I get the same error with it, while @pb-jchin's examples work.

~/ghtest$ ls -alG
total 1720
drwxr-xr-x  2 mhsieh    4096 Feb  3 23:06 .
drwxr-xr-x 95 mhsieh   20480 Feb  3 23:01 ..
-rw-r--r--  1 mhsieh   15018 Feb  3 18:55 000001F_003_ref.fa
-rw-r--r--  1 mhsieh      33 Feb  3 23:06 000001F_003_ref.fa.fai
-rw-r--r--  1 mhsieh 1707473 Feb  3 18:55 aln-000001F_003.bam
-rw-r--r--  1 mhsieh    1435 Feb  3 23:05 aln-000001F_003.bam.pbi
~/ghtest$ variantCaller -x 5 -X 120 -q 20 -j 2 -r 000001F_003_ref.fa -o test.fasta.gz -o test.fastq.gz aln-000001F_003.bam 
Failure: The Quiver algorithm requires a cmp.h5 file containing standard (non-CCS) reads.
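The reproduction directory above has both indexes next to the inputs: 000001F_003_ref.fa.fai for the reference and aln-000001F_003.bam.pbi for the alignments. A small pre-flight check that both exist and are non-empty before calling variantCaller, as a bash sketch; missing_indexes is a hypothetical name, we assume variantCaller expects both files, and the check covers presence only, not whether an index matches its file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a reference FASTA and an aligned BAM, report
# any missing or empty index (REF.fai from samtools faidx, BAM.pbi for the
# PacBio BAM index). Returns 1 if anything is missing, 0 otherwise.
missing_indexes() {
  local ref=$1 bam=$2 f missing=0
  for f in "$ref.fai" "$bam.pbi"; do
    if [ ! -s "$f" ]; then
      printf 'missing: %s\n' "$f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: missing_indexes 000001F_003_ref.fa aln-000001F_003.bam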

mhsieh commented 8 years ago

A possible conclusion is that @afinit's SA3 tarball should be okay. Now let's check how these BAM files were generated.

afinit commented 8 years ago

I am rerunning from a fresh install. In the meantime, would it be possible for me to have access to a test BAM and FASTA? I could compare the BAM header and entries to see if there are any glaring differences.
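For a header comparison, the field most likely to matter is the read type. In PacBio BAM files each @RG header line carries a DS field of key=value pairs such as READTYPE=SUBREAD, and our understanding (an assumption, not confirmed in this thread) is that the readType check in quiver.py/arrow.py only accepts files whose read groups are all SUBREAD. The bash/awk sketch below prints the read type of each read group from a SAM header on stdin and reports whether all are SUBREAD; rg_readtypes and all_subread are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Input: SAM header text on stdin, e.g. from
#   samtools view -H aln-000001F_003.bam
# rg_readtypes prints "<read group ID><TAB><READTYPE>" for each @RG line,
# or MISSING when no DS field has a READTYPE= key.
rg_readtypes() {
  awk -F'\t' '$1 == "@RG" {
    id = "?"; rt = "MISSING"
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) id = substr($i, 4)
      if ($i ~ /^DS:/ && match($i, /READTYPE=[^;]*/))
        rt = substr($i, RSTART + 9, RLENGTH - 9)   # 9 = length of READTYPE=
    }
    print id "\t" rt
  }'
}

# all_subread succeeds only if there is at least one @RG line and every
# read group reports READTYPE=SUBREAD (assumed to be what "standard" means).
all_subread() {
  rg_readtypes | awk -F'\t' '{ n++; if ($2 != "SUBREAD") bad++ }
                             END { exit (n == 0 || bad > 0) }'
}
```

Running samtools view -H on a working and a failing aln-*.bam and piping each through rg_readtypes gives a side-by-side view of this one field.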

pb-jchin commented 8 years ago

Hi @afinit, yes, that is a good idea. I do plan to build some test data and test runs, but that will have to wait until the AGBT meeting next week is over.

afinit commented 8 years ago

I got the same error again. I spent quite a bit of time digging through the code to see where the file is validated. I couldn't find the end of the trail, so I finally just commented out the readType check that raises the error (this line in quiver.py). After rerunning, the variantCaller line from the shell script runs fine.

I just noticed that I didn't have .bam files for all of my raw read files in input_bam.fofn. I noticed this because a couple of the shell scripts returned the error "Input CmpH5 file must be nonempty.", referring to the aln*.bam file used as input to variantCaller. This could explain the error, but I don't know why the scripts would run without the readType check. With the readType check in place, all of the shell scripts returned errors.
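A gap like this in input_bam.fofn can be caught before the pipeline starts. A bash sketch that reads a file-of-file-names and reports every entry that is missing or empty; check_fofn is a hypothetical name, and relative paths are resolved against the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report entries in a file-of-file-names (one path per
# line, e.g. input_bam.fofn) that are missing or empty. Blank lines are
# skipped. Returns 1 if any entry is bad, 0 otherwise.
check_fofn() {
  local fofn=$1 path bad=0
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    if [ ! -s "$path" ]; then
      printf 'missing or empty: %s\n' "$path"
      bad=1
    fi
  done < "$fofn"
  return "$bad"
}
```

For example: check_fofn input_bam.fofn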

pb-jchin commented 8 years ago

Hi @afinit, I am still wondering how that is triggered. While I did see it before, I can't reproduce it here. Anyway, @hayanlee also encountered something similar. I did an end-to-end check of quiver on my side last Friday with smrtanalysis 3.0.3, and it went through without problems. Here is my working environment for your reference.

My PATH variable is simple: the standard UNIX PATH, prepended with the FALCON executable path:

$ echo $PATH
/home/UNIXHOME/jchin/build/falcon_latest_build/FALCON-integrate/fc_env/bin:/mnt/software/p/parallel/bin:/home/UNIXHOME/jchin/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin 

I converted the *.bax.h5 files to *.bam by calling bax2bam with its full path, like this:

/mnt/secondary/builds/full/3.0.3/prod/smrtanalysis_3.0.3.172135/smrtcmds/bin/bax2bam /pbi/collections/242/2420309/0001/Analysis_Results/m150715_213954_42175_c100867392550000001823195203031660_s1_p0.1.bax.h5 -o m150715_213954_42175_c100867392550000001823195203031660_s1_p0.1 &
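An RS II movie is normally split across three bax.h5 parts (….1.bax.h5, ….2.bax.h5, ….3.bax.h5), and as far as we know bax2bam accepts all parts of one movie in a single call. A bash sketch that only prints one bax2bam command per movie, so the commands can be reviewed before running; plan_bax2bam is a hypothetical name. It assumes the arguments arrive grouped by movie (a sorted glob does this) and that paths contain no spaces, and it follows the -o <output prefix> form used above, naming each prefix after the movie.

```shell
#!/usr/bin/env bash
# Hypothetical helper (dry run): print one bax2bam command per movie.
# Arguments: bax.h5 paths, grouped by movie. The movie name is the path
# with its ".N.bax.h5" suffix removed; the -o prefix is its basename, so
# output lands in the current directory.
plan_bax2bam() {
  local f movie prev="" parts=""
  for f in "$@"; do
    movie=${f%.[0-9].bax.h5}
    if [ -n "$prev" ] && [ "$movie" != "$prev" ]; then
      printf 'bax2bam%s -o %s\n' "$parts" "${prev##*/}"
      parts=""
    fi
    parts="$parts $f"
    prev=$movie
  done
  if [ -n "$prev" ]; then
    printf 'bax2bam%s -o %s\n' "$parts" "${prev##*/}"
  fi
}
```

For example, plan_bax2bam /path/to/Analysis_Results/*.bax.h5 > convert.sh, then inspect convert.sh and run it with bash (using the full bax2bam path if it is not on your PATH).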

Here is one example of my cns_*.sh scripts inside the 4-quiver directory:

$ cat 4-quiver/000028F/cns_000028F.sh

export PATH=/home/UNIXHOME/jchin/build/falcon_latest_build/FALCON-integrate/fc_env/bin:/mnt/software/p/parallel/bin:/home/UNIXHOME/jchin/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:${PATH}
export PYTHONPATH=:${PYTHONPATH}
export LD_LIBRARY_PATH=:${LD_LIBRARY_PATH}
set -vex
trap 'touch /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/000028F_quiver_done.exit' EXIT
cd /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/./4-quiver/000028F
hostname
date
cd /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/./4-quiver/000028F
/mnt/secondary/builds/full/3.0.3/prod/smrtanalysis_3.0.3.172135/smrtcmds/bin/samtools faidx /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/000028F_ref.fa
/mnt/secondary/builds/full/3.0.3/prod/smrtanalysis_3.0.3.172135/smrtcmds/bin/samtools view -b -S /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/reads/000028F.sam > 000028F.bam
/mnt/secondary/builds/full/3.0.3/prod/smrtanalysis_3.0.3.172135/smrtcmds/bin/pbalign --tmpDir=/localdisk/scratch/ --nproc=24 --minAccuracy=0.75 --minLength=50            --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr            --algorithmOptions=-useQuality --maxHits=1 --hitPolicy=random --seed=1            000028F.bam /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/000028F_ref.fa aln-000028F.bam
#/mnt/secondary/builds/full/3.0.3/prod/smrtanalysis_3.0.3.172135/smrtcmds/bin/makePbi --referenceFasta /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/000028F_ref.fa aln-000028F.bam
(/mnt/secondary/builds/full/3.0.3/prod/smrtanalysis_3.0.3.172135/smrtcmds/bin/variantCaller -x 5 -X 120 -q 20 -j 24 -r /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/000028F_ref.fa aln-000028F.bam            -o /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/cns-000028F.fasta.gz -o /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/cns-000028F.fastq.gz) || echo quvier failed
date
touch /lustre/hpcprod/jchin/JGI_fungal_diploid_0.4+/4-quiver/000028F/000028F_quiver_done
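In the script above, variantCaller runs in a subshell followed by || echo quvier failed, so under set -e a failed variantCaller does not stop the script: the EXIT trap still touches the _quiver_done.exit file and the last line still touches _quiver_done. If you adapt such a script, one option is to keep that keep-going behaviour but leave a separate marker whenever the step fails. A bash sketch; run_or_mark and the marker file name are hypothetical, not part of FALCON_unzip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command; if it fails, write its exit status and
# command line to a marker file and warn on stderr, then carry on (the same
# keep-going behaviour as "|| echo quiver failed", but leaving evidence).
run_or_mark() {
  local marker=$1 rc=0
  shift
  "$@" || rc=$?          # the "||" keeps set -e from aborting the script
  if [ "$rc" -ne 0 ]; then
    printf 'exit %s: %s\n' "$rc" "$*" > "$marker"
    printf 'step failed (exit %s): %s\n' "$rc" "$1" >&2
  fi
  return 0
}

# Example in a cns_*.sh-style script (marker name is illustrative):
#   run_or_mark 000028F_quiver_failed variantCaller -r 000028F_ref.fa aln-000028F.bam -o cns-000028F.fasta.gz
```

A downstream step can then look for *_quiver_failed files instead of trusting the _done sentinel alone.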

BenjaminSchwessinger commented 7 years ago

I could reproduce the same issue when using BAM files converted from bax.h5 files generated by an RSII machine, with the latest FALCON_unzip version (latest commit 7ebc99c on Dec 22, 2016). This was using smrtlink_4.0.0.190159 and Arrow for correction.

I commented out lines 253-255 of the arrow script found in smrtlink_4.0.0.190159/install/smrtlink-fromsrc_4.0.0.190159+190159-190159-189856-189856-189856/bundles/smrttools/install/smrttools-fromsrc_4.0.0.190159/private/pacbio/pythonpkgs/GenomicConsensus/lib/python2.7/site-packages/GenomicConsensus/arrow/arrow.py:

#if alnFile.readType != "standard":
#    raise U.IncompatibleDataException(
#        "The Arrow algorithm requires a BAM file containing standard (non-CCS) reads." )

Worked just fine afterwards.
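The same workaround was applied by hand to quiver.py earlier in the thread. Because line numbers differ between releases, a pattern-based edit that keeps a backup is a little safer than editing by line number. A bash/awk sketch; comment_readtype_check is a hypothetical name, and it assumes the check looks like the three lines above (an if line followed by a two-line raise). It disables the check rather than fixing the input BAM; the .orig copy lets you restore the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the readType check in a GenomicConsensus
# script (quiver.py or arrow.py), keeping the untouched file as FILE.orig.
# Assumes an uncommented "if ... readType != "standard":" line followed by
# a two-line raise, as in the lines shown above.
comment_readtype_check() {
  local file=$1
  if [ -e "$file.orig" ]; then
    echo "refusing: $file.orig already exists" >&2   # don't overwrite the backup
    return 1
  fi
  cp -p "$file" "$file.orig" || return 1
  awk '
    /^[ \t]*if .*readType != "standard"/ { left = 3 }  # the if line + 2 raise lines
    left > 0 { $0 = "#" $0; left-- }
    { print }
  ' "$file.orig" > "$file"
}
```

To undo it, move FILE.orig back over FILE.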

yingzhang121 commented 7 years ago

This issue seems to persist in the latest releases of SMRT Link and FALCON. I downloaded FALCON from https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries, and I tried both SMRT Link v3.1.1 and v4.0.0.

This is the command to generate the bam file:

/home/support/zhan2142/smrtlink400/smrtcmds/bin/samtools faidx /panfs/roc/scratch/zhan2142/falcon_test/4-quiver/quiver_scatter/000197F/000197F_ref.fa

and this is the failure:

/home/support/zhan2142/smrtlink400/smrtcmds/bin/variantCaller --algorithm=arrow -x 5 -X 120 -q 20 -j 24 \
    -r /panfs/roc/scratch/zhan2142/falcon_test/4-quiver/quiver_scatter/000197F/000197F_ref.fa aln-000197F.bam \
    -o /panfs/roc/scratch/zhan2142/falcon_test/4-quiver/000197F/cns-000197F.fasta.gz \
    -o /panfs/roc/scratch/zhan2142/falcon_test/4-quiver/000197F/cns-000197F.fastq.gz \
    || echo quvier failed

Failure: The Arrow algorithm requires a BAM file containing standard (non-CCS) reads.
quvier failed

However, when I previously used a test version, smrtlink_3.1.1.182868.zip (courtesy of Laura Nolden), I didn't have the issue. (Actually, I had another, unrelated issue with quiver and BAM, but fixed it manually.)

Anyway, I commented out the three lines mentioned above, and the issue was resolved.