EichlerLab / smrtsv2

Structural variant caller
MIT License

Align step fails #6

Closed RSherman15 closed 5 years ago

RSherman15 commented 5 years ago

This may follow from issue #5, in case I broke something by commenting out ref_ctab='reference/ref.fasta.ctab'. After I commented out that line, align began to run but failed shortly after with this error:

Job counts:
        count   jobs
        1       aln_align_batch
        1
/software/centos7/bin/bash: line 1: 114301 Aborted                 blasr align/batches/0.fofn reference/ref.fasta --unaligned align/bam/unaligned/0.fa --out ${ALIGN_BATCH_TEMP}/batch_out.bam --sam --sa reference/ref.fasta.sa --nproc 8 --clipping subread --bestn 2 --maxAnchorsPerPosition 100 --advanceExactMatches 10 --affineAlign --affineOpen 100 --affineExtend 0 --insertion 5 --deletion 5 --extend --maxExtendDropoff 50 >> align/bam/log/0.log 2>&1
[Tue Feb  5 23:51:13 2019]
Error in rule aln_align_batch:
    jobid: 0
    output: align/bam/0.bam, align/bam/0.bam.bai, align/bam/unaligned/0.fa.gz
    log: align/bam/log/0.log

RuleException:
CalledProcessError in line 80 of /home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/align.snakefile:
Command ' set -euo pipefail;  ALIGN_BATCH_TEMP=/tmp/aln_align_batch_0; mkdir -p ${ALIGN_BATCH_TEMP}; echo "Aligning batch 0..." >align/bam/log/0.log; blasr align/batches/0.fofn reference/ref.fasta --unaligned align/bam/unaligned/0.fa --out ${ALIGN_BATCH_TEMP}/batch_out.bam --sam --sa reference/ref.fasta.sa --nproc 8 --clipping subread --bestn 2 --maxAnchorsPerPosition 100 --advanceExactMatches 10 --affineAlign --affineOpen 100 --affineExtend 0 --insertion 5 --deletion 5 --extend --maxExtendDropoff 50 >>align/bam/log/0.log 2>&1; echo "Sorting..." >>align/bam/log/0.log; samtools sort -@ 1 -m 4G -O bam -T ${ALIGN_BATCH_TEMP}/0 -o align/bam/0.bam ${ALIGN_BATCH_TEMP}/batch_out.bam >>align/bam/log/0.log 2>&1; echo "Indexing..." >>align/bam/log/0.log 2>&1; samtools index align/bam/0.bam >>align/bam/log/0.log 2>&1; echo "Compressing unaligned reads..." >>align/bam/log/0.log 2>&1; gzip align/bam/unaligned/0.fa >>align/bam/log/0.log 2>&1; echo "Cleaning temp \"${ALIGN_BATCH_TEMP}\"..." >>align/bam/log/0.log; rm -rf ${ALIGN_BATCH_TEMP}echo "Done aligning batch 0" >>align/bam/log/0.log; ' returned non-zero exit status 134.
  File "/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/align.snakefile", line 80, in __rule_aln_align_batch
  File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work-zfs/mschatz1/rsherman/smrtSVRuns/.snakemake/log/2019-02-05T234829.822989.snakemake.log
Failed to align reads

paudano commented 5 years ago

BLASR often calls abort() instead of printing a useful error message before dying, which is why you see "114301 Aborted" in the output. The crash could have almost any cause.
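
The exit status in the Snakemake error points the same way. By shell convention, a status above 128 means the process was killed by a signal (status minus 128), so 134 corresponds to signal 6, which is SIGABRT on Linux. A minimal Python sketch of that decoding follows; the helper name `describe_exit_status` is made up for illustration and is not part of smrtsv2 or Snakemake.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper for illustration)."""
    if status > 128:
        # Shells report "killed by signal N" as 128 + N.
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    if status == 127:
        # Conventionally reported when a command cannot be found.
        return "command not found (shell convention)"
    if status == 0:
        return "success"
    return f"exited with error code {status}"


# 134 is the status reported for the blasr command above.
print(describe_exit_status(134))
```

This only identifies how the process died, not why; the underlying cause still has to come from the BLASR log.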

What kind of data are you aligning? Is this PacBio Sequel or RS II? Are they in PacBio subread BAM files, or something else?

RSherman15 commented 5 years ago

Ah, I just re-read the documentation; I didn't realize I needed the raw subread bams or bax.h5 files. I'm downloading the data in the proper format and hopefully this will solve the problem. Thanks.

RSherman15 commented 5 years ago

I am still unable to align, with the following error:

rule aln_align_batch:
    input: align/batches/15.fofn, reference/ref.fasta, reference/ref.fasta.fai, reference/ref.fasta.sa
    output: align/bam/15.bam, align/bam/15.bam.bai, align/bam/unaligned/15.fa.gz
    log: align/bam/log/15.log
    jobid: 7
    wildcards: batch_id=15

Job counts:
        count   jobs
        1       aln_align_batch
        1
[Sun Feb 10 15:48:23 2019]
Error in rule aln_align_batch:
    jobid: 0
    output: align/bam/15.bam, align/bam/15.bam.bai, align/bam/unaligned/15.fa.gz
    log: align/bam/log/15.log

RuleException:
CalledProcessError in line 80 of /home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/align.snakefile:
Command ' set -euo pipefail;  ALIGN_BATCH_TEMP=/tmp/aln_align_batch_15; mkdir -p ${ALIGN_BATCH_TEMP}; echo "Aligning batch 15..." >align/bam/log/15.log; blasr align/batches/15.fofn reference/ref.fasta --unaligned align/bam/unaligned/15.fa --out ${ALIGN_BATCH_TEMP}/batch_out.bam --sam --sa reference/ref.fasta.sa --nproc 8 --clipping subread --bestn 2 --maxAnchorsPerPosition 100 --advanceExactMatches 10 --affineAlign --affineOpen 100 --affineExtend 0 --insertion 5 --deletion 5 --extend --maxExtendDropoff 50 >>align/bam/log/15.log 2>&1; echo "Sorting..." >>align/bam/log/15.log; samtools sort -@ 1 -m 4G -O bam -T ${ALIGN_BATCH_TEMP}/15 -o align/bam/15.bam ${ALIGN_BATCH_TEMP}/batch_out.bam >>align/bam/log/15.log 2>&1; echo "Indexing..." >>align/bam/log/15.log 2>&1; samtools index align/bam/15.bam >>align/bam/log/15.log 2>&1; echo "Compressing unaligned reads..." >>align/bam/log/15.log 2>&1; gzip align/bam/unaligned/15.fa >>align/bam/log/15.log 2>&1; echo "Cleaning temp \"${ALIGN_BATCH_TEMP}\"..." >>align/bam/log/15.log; rm -rf ${ALIGN_BATCH_TEMP}echo "Done aligning batch 15" >>align/bam/log/15.log; ' returned non-zero exit status 127.
  File "/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/align.snakefile", line 80, in __rule_aln_align_batch
  File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work-zfs/mschatz1/rsherman/smrtSVRuns/.snakemake/log/2019-02-10T154607.854314.snakemake.log
Failed to align reads

I am using bax.h5 reads for HG002 Genome in a Bottle data (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_MtSinai_NIST/hdf5/sequence.index.HG002_PacBio_MtSinai_NIST_hdf5_10102018). Command options used were: smrtsv.py run --threads 8 --species human --sample HG002 [ref] [bax.h5 reads fofn]

The one file in align/bam/log contains the following error message:

Aligning batch 15...
[INFO] 2019-02-10T15:46:22 [blasr] started.
blasr: symbol lookup error: /home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/pacbio/bin/../lib/libblasr.so.5.3.1: undefined symbol: _ZNK2H510H5Location15getObjnameByIdxEy

RSherman15 commented 5 years ago

As per this issue, https://github.com/PacificBiosciences/pbbioconda/issues/11, downgrading hdf5 in the pacbio conda environment from 1.10.3 to 1.10.2 appears to have solved the issue (though it hasn't run to completion yet, so we'll see). I'm leaving this open, though, since it seems to be an issue with the dependency build.

paudano commented 5 years ago

Thanks. I made a change to the build, and I'll be testing this soon.

RSherman15 commented 5 years ago

Sounds good. It did not run to completion, though: I now get the same CalledProcessError, with the following error in align/bam/log/15.log. It seems to have run for quite a bit longer than it did before the previous error:

Aligning batch 15...
[INFO] 2019-02-14T16:42:40 [blasr] started.
terminate called after throwing an instance of 'std::runtime_error'
  what():  could not write record

paudano commented 5 years ago

I think I have seen that error if the file system fills up. Is there space left on the destination device?

RSherman15 commented 5 years ago

Yes, this is a very large server with more than ample space and RAM.

paudano commented 5 years ago

It may be using a temporary directory on a partition that does not have enough space. At the top of the log file (above "Aligning batch 15..."), it will say ALIGN_BATCH_TEMP=.... Check that location and make sure it's on a partition that has space.
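
The CalledProcessError above shows ALIGN_BATCH_TEMP=/tmp/aln_align_batch_15, so /tmp is the partition to inspect. One quick way to compare free space there against the working directory is sketched below in Python; the helper name `free_gib` is an illustrative assumption, and `df -h /tmp` in a shell reports the same information.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Return free space, in GiB, on the partition that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


# /tmp is where the failing command placed ALIGN_BATCH_TEMP;
# "." stands in for the run's output directory.
for path in ("/tmp", "."):
    if os.path.exists(path):
        print(f"{path}: {free_gib(path):.1f} GiB free")
```

If /tmp reports far less free space than the output directory, the temporary BAM written during alignment is a plausible cause of the "could not write record" failure.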

RSherman15 commented 5 years ago

Thanks. Just FYI, the log file doesn't actually contain this info; its first line is "Aligning batch 15". The error message (the CalledProcessError) does contain it, though. This may be the problem, so thank you. I've reset my TEMP_DIR environment variable and am re-running. It might be nice to add a user option to set the temporary directory, or at least to mention in the documentation that it will write to /tmp. I never considered that this might be writing massive files to /tmp, since I specify an output directory.

RSherman15 commented 5 years ago

Ah, I see now from issue #9 that this is an argument of smrtsv, just not of "smrtsv run", which is how I missed it before (I had only run smrtsv run -h to see the options).
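
This is common behavior for command-line tools built around subcommands: options defined on the top-level parser do not show up in a subcommand's help. The argparse sketch below illustrates the pattern only. It is not smrtsv's actual parser, and the program name `tool`, the `/tmp` default, and the `/scratch/tmp` path are assumptions made for the example.

```python
import argparse

# Top-level parser: holds the global option.
parser = argparse.ArgumentParser(prog="tool")
parser.add_argument("--tempdir", default="/tmp", help="directory for temporary files")

# Subcommand parser: holds only per-step options.
subparsers = parser.add_subparsers(dest="command")
run = subparsers.add_parser("run", help="run the pipeline")
run.add_argument("--threads", type=int, default=1)

# Global options go before the subcommand name.
args = parser.parse_args(["--tempdir", "/scratch/tmp", "run", "--threads", "8"])
print(args.tempdir, args.command, args.threads)

# The subcommand's own help lists --threads but not --tempdir.
print("--tempdir" in run.format_help())
```

With this layout, the top-level help is the place to look for global options, and the subcommand help only covers that step.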

paudano commented 5 years ago

--tempdir is useful for pushing IO-intensive processes to fast temporary storage and for keeping unnecessary IO off of distributed filesystems (these temp files are never shared among rules). I updated the documentation to better describe global and per-step command-line options, and I added some information on temporary directories.