MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License

Error running mapDamage in example bam_pipeline #30

Closed Rhinogradentia closed 4 years ago

Rhinogradentia commented 4 years ago

Hi, I'm trying to install the paleomix pipeline inside a Singularity container so that I can run it on an HPC. The example pipeline runs almost completely successfully, but two errors occur:

[root@singularity-builder]/vagrant/Paleomix/bam_pipeline# singularity run ../paleomix.img/ bam_pipeline run 000_makefile.yaml 
Reading makefiles ...
  - Validating prefixes ...
Building BAM pipeline .
Running BAM pipeline ...
  - Checking file dependencies ...
  - Checking for required executables ...
  - Checking version requirements ...
    - Checking version of 'Rscript' ...
    - Checking version of 'AdapterRemoval' ...
    - Checking version of 'GenomeAnalysisTK' ...
    - Checking version of 'Picard tools' ...
    - Checking version of 'R module: Rcpp' ...
    - Checking version of 'R module: RcppGSL' ...
    - Checking version of 'R module: gam' ...
    - Checking version of 'R module: ggplot2' ...
    - Checking version of 'R module: inline' ...
    - Checking version of 'bwa' ...
    - Checking version of 'mapDamage' ...
    - Checking version of 'samtools' ...
  - Determining states ...
  - Ready ...

08:04:18 Running 2 tasks using ~2 of max 2 threads; 146 done of 166 tasks in 0s; press 'h' for help.
  - <mapDamage (plots): 2 files in 'ExampleProject/rCRS/Synthetic_Sample_1' -> 'ExampleProject.rCRS.mapDamage/ACGATA'>
  - <mapDamage (plots): 2 files in 'ExampleProject/rCRS/Synthetic_Sample_1' -> 'ExampleProject.rCRS.mapDamage/TGCTCA'>

<mapDamage (plots): 2 files in 'ExampleProject/rCRS/Synthetic_Sample_1' -> 'ExampleProject.rCRS.mapDamage/TGCTCA'>
  Error ('NodeError') occurred running command:
    Error(s) running Node:
        Temporary directory: '/tmp/root/bam_pipeline/2b3eef21-e42a-4a9a-8ef6-5c294f91eca4'

    Parallel processes:
      Process 1:
        Command = java -server -Djava.io.tmpdir=/tmp/root/bam_pipeline -Djava.awt.headless=true \
                      -XX:+UseSerialGC -Xmx4g -jar /root/install/jar_root/picard.jar MergeSamFiles \
                      SO=coordinate COMPRESSION_LEVEL=0 OUTPUT=input.bam \
                      VALIDATION_STRINGENCY=LENIENT \
                      I=/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/TGCTCA.rmdup.collapsed.bam \
                      I=/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/TGCTCA.rmdup.normal.bam
        Status  = Exited with return-code 1
        STDOUT* = 'pipe_java_140259777747728.stdout'
        STDERR* = 'pipe_java_140259777747728.stderr'
        CWD     = '/tmp/root/bam_pipeline/2b3eef21-e42a-4a9a-8ef6-5c294f91eca4'

      Process 2:
        Command = mapDamage --no-stats --merge-reference-sequences -t \
                      'mapDamage plot for library '"'"'TGCTCA'"'"'' -i \
                      /tmp/root/bam_pipeline/2b3eef21-e42a-4a9a-8ef6-5c294f91eca4/input.bam -d \
                      /tmp/root/bam_pipeline/2b3eef21-e42a-4a9a-8ef6-5c294f91eca4 -r \
                      000_prefixes/rCRS.fasta --downsample 100000
        Status  = Exited with return-code 1
        STDOUT* = '/tmp/root/bam_pipeline/2b3eef21-e42a-4a9a-8ef6-5c294f91eca4/pipe_mapDamage.stdout'
        STDERR* = '/tmp/root/bam_pipeline/2b3eef21-e42a-4a9a-8ef6-5c294f91eca4/pipe_mapDamage.stderr'
        CWD     = '/vagrant/Paleomix/bam_pipeline'

08:04:29 Running 1 task using ~1 of max 2 threads; 14 failed, 146 done of 166 tasks in 12s; press 'h' for help.
  Log-file located at '/tmp/root/bam_pipeline/bam_pipeline.20200529_080403_00.log'
  - <mapDamage (plots): 2 files in 'ExampleProject/rCRS/Synthetic_Sample_1' -> 'ExampleProject.rCRS.mapDamage/ACGATA'>

<mapDamage (plots): 2 files in 'ExampleProject/rCRS/Synthetic_Sample_1' -> 'ExampleProject.rCRS.mapDamage/ACGATA'>
  Error ('NodeError') occurred running command:
    Error(s) running Node:
        Temporary directory: '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb'

    Parallel processes:
      Process 1:
        Command = java -server -Djava.io.tmpdir=/tmp/root/bam_pipeline -Djava.awt.headless=true \
                      -XX:+UseSerialGC -Xmx4g -jar /root/install/jar_root/picard.jar MergeSamFiles \
                      SO=coordinate COMPRESSION_LEVEL=0 OUTPUT=input.bam \
                      VALIDATION_STRINGENCY=LENIENT \
                      I=/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.normal.bam \
                      I=/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.collapsed.bam
        Status  = Exited with return-code 1
        STDOUT* = 'pipe_java_140259776994960.stdout'
        STDERR* = 'pipe_java_140259776994960.stderr'
        CWD     = '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb'

      Process 2:
        Command = mapDamage --no-stats --merge-reference-sequences -t \
                      'mapDamage plot for library '"'"'ACGATA'"'"'' -i \
                      /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/input.bam -d \
                      /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb -r \
                      000_prefixes/rCRS.fasta --downsample 100000
        Status  = Exited with return-code 1
        STDOUT* = '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/pipe_mapDamage.stdout'
        STDERR* = '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/pipe_mapDamage.stderr'
        CWD     = '/vagrant/Paleomix/bam_pipeline'

Done; but errors were detected ...

  Number of nodes:             166
  Number of done nodes:        146
  Number of runable nodes:     0
  Number of queued nodes:      0
  Number of outdated nodes:    0
  Number of failed nodes:      20
  Pipeline runtime:            12s

The content of the logfiles is:

 # /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/pipe_mapDamage.stderr

started with the command: /usr/bin/mapDamage --no-stats --merge-reference-sequences -t mapDamage plot for library 'ACGATA' -i /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/input.bam -d /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb -r 000_prefixes/rCRS.fasta --downsample 100000
[E::idx_find_and_load] Could not retrieve index file for '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/input.bam'
alignment file must be single-ended
alignment file must be single-ended

and

# /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/pipe_java_140259776994960.stderr

INFO    2020-05-29 08:04:21     MergeSamFiles

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MergeSamFiles -SO coordinate -COMPRESSION_LEVEL 0 -OUTPUT input.bam -VALIDATION_STRINGENCY LENIENT -I /vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.normal.bam -I /vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.collapsed.bam
**********

08:04:23.061 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/root/install/jar_root/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri May 29 08:04:23 GMT 2020] MergeSamFiles INPUT=[/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.normal.bam, /vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.collapsed.bam] OUTPUT=input.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT COMPRESSION_LEVEL=0    ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=false USE_THREADING=false VERBOSITY=INFO QUIET=false MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri May 29 08:04:23 GMT 2020] Executing as root@singularity-builder on Linux 3.10.0-1062.9.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1ubuntu1-b09; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.22.4
INFO    2020-05-29 08:04:23     MergeSamFiles   Input files are in same order as output so sorting to temp directory is not needed.
[Fri May 29 08:04:29 GMT 2020] picard.sam.MergeSamFiles done. Elapsed time: 0.11 minutes.
Runtime.totalMemory()=12455936
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
        at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:222)
        at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:188)
        at htsjdk.samtools.util.BinaryCodec.writeShort(BinaryCodec.java:266)
        at htsjdk.samtools.util.BlockCompressedOutputStream.writeGzipBlock(BlockCompressedOutputStream.java:445)
        at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:415)
        at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:305)
        at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
        at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:188)
        at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:234)
        at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:160)
        at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:144)
        at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:185)
        at picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:224)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
        at java.nio.channels.Channels.writeFully(Channels.java:101)
        at java.nio.channels.Channels.access$000(Channels.java:61)
        at java.nio.channels.Channels$1.write(Channels.java:174)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
        ... 15 more

# Runtime_log.txt

2020-05-29 08:04:29,032 INFO    main: Started with the command: /usr/bin/mapDamage --no-stats --merge-reference-sequences -t mapDamage plot for library 'ACGATA' -i /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/input.bam -d /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb -r 000_prefixes/rCRS.fasta --downsample 100000
2020-05-29 08:04:29,287 ERROR   main: alignment file must be single-ended

# pipe.errors

Command          = '/usr/local/bin/paleomix bam_pipeline run 000_makefile.yaml'
CWD              = '/vagrant/Paleomix/bam_pipeline'
PATH             = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin'
Node             = <mapDamage (plots): 2 files in 'ExampleProject/rCRS/Synthetic_Sample_1' -> 'ExampleProject.rCRS.mapDamage/ACGATA'>
Threads          = 1
Input files      = 000_prefixes/rCRS.fasta
                   ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.collapsed.bam
                   ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.normal.bam
Output files     = ExampleProject.rCRS.mapDamage/ACGATA/3pGtoA_freq.txt
                   ExampleProject.rCRS.mapDamage/ACGATA/5pCtoT_freq.txt
                   ExampleProject.rCRS.mapDamage/ACGATA/Fragmisincorporation_plot.pdf
                   ExampleProject.rCRS.mapDamage/ACGATA/Length_plot.pdf
                   ExampleProject.rCRS.mapDamage/ACGATA/Runtime_log.txt
                   ExampleProject.rCRS.mapDamage/ACGATA/dnacomp.txt
                   ExampleProject.rCRS.mapDamage/ACGATA/lgdistribution.txt
                   ExampleProject.rCRS.mapDamage/ACGATA/misincorporation.txt
Auxiliary files  = /root/install/jar_root/picard.jar
Executables      = java
                   mapDamage

Errors =
Parallel processes:
  Process 1:
    Command = java -server -Djava.io.tmpdir=/tmp/root/bam_pipeline -Djava.awt.headless=true \
                  -XX:+UseSerialGC -Xmx4g -jar /root/install/jar_root/picard.jar MergeSamFiles \
                  SO=coordinate COMPRESSION_LEVEL=0 OUTPUT=input.bam \
                  VALIDATION_STRINGENCY=LENIENT \
                  I=/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.normal.bam \
                  I=/vagrant/Paleomix/bam_pipeline/ExampleProject/rCRS/Synthetic_Sample_1/ACGATA.rmdup.collapsed.bam
    Status  = Exited with return-code 1
    STDOUT* = 'pipe_java_140259776994960.stdout'
    STDERR* = 'pipe_java_140259776994960.stderr'
    CWD     = '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb'

  Process 2:
    Command = mapDamage --no-stats --merge-reference-sequences -t \
                  'mapDamage plot for library '"'"'ACGATA'"'"'' -i \
                  /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/input.bam -d \
                  /tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb -r \
                  000_prefixes/rCRS.fasta --downsample 100000
    Status  = Exited with return-code 1
    STDOUT* = '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/pipe_mapDamage.stdout'
    STDERR* = '/tmp/root/bam_pipeline/5a5a5535-df57-4bc3-92d6-8fa2bd6bb0eb/pipe_mapDamage.stderr'
    CWD     = '/vagrant/Paleomix/bam_pipeline'

I'm using the following version:

[root@singularity-builder]/vagrant/Paleomix/bam_pipeline# singularity run ../paleomix.img/
PALEOMIX - pipelines and tools for NGS data analyses.
Version: 1.2.14

I also read in your release notes that a similar issue was fixed in a minor version (1.2.6):

mapDamage plots should not require indexed BAMs; this fixed missing file errors for some makefile configurations.

What could be wrong?

Thanks in advance,
Best, Nadine

MikkelSchubert commented 4 years ago

Hi Nadine,

The problem you are experiencing is caused by a change introduced in mapDamage 2.2.0, which makes it abort if a BAM file contains paired-end reads, since such reads are not fully supported by mapDamage.
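To see whether a given BAM would trigger the "alignment file must be single-ended" message, one can look at the FLAG field of its records: in the SAM format, bit 0x1 marks a read as paired. The following is a minimal sketch, not mapDamage's own check; the function name `count_paired_records` is hypothetical, and it assumes the records are supplied as SAM text lines (for example, the output of `samtools view -h`).

```python
PAIRED_FLAG = 0x1  # SAM FLAG bit 0x1: the read is part of a pair


def count_paired_records(sam_lines):
    """Return (paired, total) counts of alignment records in SAM text lines.

    Hypothetical helper for illustration; this is not how mapDamage itself
    decides whether to abort.
    """
    paired = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second column of a SAM record
        total += 1
        if flag & PAIRED_FLAG:
            paired += 1
    return paired, total


# Two made-up records: one paired (FLAG 99), one unpaired (FLAG 0)
example = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "read1\t99\tchrM\t1\t60\t4M\t=\t10\t13\tACGT\tIIII\n",
    "read2\t0\tchrM\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
paired, total = count_paired_records(example)
print(f"{paired} of {total} records are flagged as paired")
```

In the example pipeline, the merged `input.bam` combines the `.rmdup.collapsed.bam` and `.rmdup.normal.bam` files, so a non-zero paired count there would be consistent with the error in the log.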

I'm currently working on improving support for PE reads in mapDamage, and in the meantime I've published a small update to mapDamage that reverts this change (mapDamage v2.2.1). If you install that version of mapDamage, you should be able to run the examples.
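When building a container image, it can help to confirm that the installed mapDamage is not the one affected release. The sketch below only encodes what is stated in this thread (2.2.0 aborts on paired-end reads, 2.2.1 reverts that); the helper names are hypothetical, and the version string is assumed to be obtained separately, for example from the package metadata.

```python
import re

# Per this thread: only 2.2.0 is known to abort on paired-end reads
AFFECTED_VERSION = (2, 2, 0)


def parse_version(text):
    """Extract the first 'X.Y.Z' version number from a string as a tuple."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def aborts_on_paired_end(version_text):
    """Return True if the version matches the release described as affected.

    Hypothetical helper; it says nothing about releases not mentioned in
    this thread.
    """
    return parse_version(version_text) == AFFECTED_VERSION


for candidate in ("2.2.0", "mapDamage v2.2.1"):
    status = "affected" if aborts_on_paired_end(candidate) else "not known to be affected"
    print(f"{candidate}: {status}")
```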

Best regards, Mikkel

Rhinogradentia commented 4 years ago

Hi Mikkel,

thanks a lot!

Best, Nadine