AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

Using CRAM files with VarDictJava #364

Closed by nvnieuwk 2 years ago

nvnieuwk commented 2 years ago

Hello, I'm using CRAM files in my analysis pipeline and am trying to get VarDictJava to work with these. As mentioned in #249, I added -Dsamjdk.reference_fasta=/path/to/fasta as an option. This seems to work since htsjdk is able to find the reference file. After this, I get another error that I don't really know what to do about:

INFO    2022-05-25 09:16:58 Defaults    Found file for property samjdk.reference_fasta: /kyukon/scratch/gent/vo/000/gvo00082/vsc44804/nxf.bCirqKd7mW/hg38.fa 
INFO    2022-05-25 09:16:58 ReferenceSource Default reference file /kyukon/scratch/gent/vo/000/gvo00082/vsc44804/nxf.bCirqKd7mW/hg38.fa exists, so going to use that.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/bzip2/BZip2CompressorOutputStream
    at htsjdk.samtools.cram.structure.block.Block.getUncompressedContent(Block.java:203)
    at htsjdk.samtools.cram.build.CramIO.readSAMFileHeader(CramIO.java:316)
    at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:225)
    at htsjdk.samtools.cram.build.CramContainerIterator.<init>(CramContainerIterator.java:22)
    at htsjdk.samtools.CRAMIterator.<init>(CRAMIterator.java:86)
    at htsjdk.samtools.CRAMFileReader.initWithStreams(CRAMFileReader.java:226)
    at htsjdk.samtools.CRAMFileReader.<init>(CRAMFileReader.java:219)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:422)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
    at com.astrazeneca.vardict.VarDictLauncher.readChr(VarDictLauncher.java:187)
    at com.astrazeneca.vardict.VarDictLauncher.initResources(VarDictLauncher.java:80)
    at com.astrazeneca.vardict.VarDictLauncher.start(VarDictLauncher.java:49)
    at com.astrazeneca.vardict.Main.main(Main.java:15)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    ... 13 more

The command I used is:

export JAVA_OPTS='"-Xms9216m" "-Xmx36g" "-Dsamjdk.reference_fasta=hg38.fa"'
vardict-java \
     \
    -c 1 -S 2 -E 3 \
    -b NA24385D2_NVQ_034-ready.cram \
    -th 6 \
    -N NA24385D2_NVQ_034.00002 \
    -G hg38.fa \
    NA24385D2_NVQ_034.00002.bed \
    | teststrandbias.R \
    | var2vcf_valid.pl \
         \
        -N NA24385D2_NVQ_034.00002 \
    | gzip -c > NA24385D2_NVQ_034.00002.vcf.gz

Does anyone with more knowledge of this know what I can do about it?

PolinaBevad commented 2 years ago

Hi @nvnieuwk, sorry, we still do not officially support CRAM files, and we have tested them on only a very limited number of test sets. I do not recommend using CRAM with VarDict for now because the htsjdk library has changed a lot since 2019, sorry! Converting to BAM will be the easiest way here.

nvnieuwk commented 2 years ago

Hi, thanks for the response! I'll convert to BAM :)
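
For anyone landing here with the same error, the conversion suggested above can be sketched with samtools. This is only a minimal sketch and not something from the VarDictJava documentation: it assumes samtools is installed and on PATH and that hg38.fa is the reference the CRAM was written against. The helper names `run` and `cram_to_bam` and the `DRY_RUN` variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: convert a CRAM file to an indexed BAM before running VarDictJava.
# Assumptions: samtools is on PATH; the reference FASTA matches the CRAM.
# run, cram_to_bam and DRY_RUN are hypothetical names used for illustration.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: cram_to_bam <input.cram> <reference.fa>
# Writes <input>.bam next to the CRAM and indexes it.
cram_to_bam() {
    local cram="$1"
    local ref="$2"
    local bam="${cram%.cram}.bam"

    # -b: write BAM, -T: reference FASTA needed to decode the CRAM
    run samtools view -b -T "$ref" -o "$bam" "$cram"
    # Create the .bai index so regions from the BED file can be fetched
    run samtools index "$bam"
}

# Example (commented out so sourcing this file has no side effects):
# cram_to_bam NA24385D2_NVQ_034-ready.cram hg38.fa
```

Passing the resulting `.bam` to `-b` in place of the `.cram` should then let the rest of the original command stay as it is. Setting `DRY_RUN=1` prints the two samtools commands without running them, which is a quick way to check the file names first.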