AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

cram input #380

Open GSBaohuaGu opened 1 year ago

GSBaohuaGu commented 1 year ago

I tried the workaround you mentioned for CRAM input, but it still reports an error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/bzip2/BZip2CompressorOutputStream
	at htsjdk.samtools.cram.structure.block.Block.getUncompressedContent(Block.java:203)
	at htsjdk.samtools.cram.build.CramIO.readSAMFileHeader(CramIO.java:316)
	at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:225)
	at htsjdk.samtools.cram.build.CramContainerIterator.<init>(CramContainerIterator.java:22)
	at htsjdk.samtools.CRAMIterator.<init>(CRAMIterator.java:86)
	at htsjdk.samtools.CRAMFileReader.initWithStreams(CRAMFileReader.java:226)
	at htsjdk.samtools.CRAMFileReader.<init>(CRAMFileReader.java:219)
	at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:422)
	at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
	at com.astrazeneca.vardict.VarDictLauncher.readChr(VarDictLauncher.java:187)
	at com.astrazeneca.vardict.VarDictLauncher.initResources(VarDictLauncher.java:80)
	at com.astrazeneca.vardict.VarDictLauncher.start(VarDictLauncher.java:49)
	at com.astrazeneca.vardict.Main.main(Main.java:15)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	... 13 more

karlestira commented 1 year ago

You can download a newer htsjdk and build it with all of its dependencies bundled (for example, in IntelliJ IDEA, run the command ./gradlew shadowJar; the attached zip is htsjdk 3.0.5). Then launch the JVM with CLASSPATH=[htsjdk.jar]:[vardict.jar]. Note that the order matters: the JVM searches the classpath from left to right, so your new htsjdk is loaded instead of the copy bundled in vardict.jar, which fixes the missing-dependency problem.

java -classpath "[path of htsjdk.jar]:[path of vardict.jar]" com.astrazeneca.vardict.Main [other vardict params]
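A minimal sketch of that launch as a small POSIX-style shell snippet. HTSJDK_JAR, VARDICT_JAR, the file name run_vardict.sh and the three helper functions are hypothetical names, and the default paths are placeholders for your rebuilt htsjdk jar and the VarDict jar. It also includes an optional check, using unzip, that the rebuilt jar really contains the class the stack trace reports as missing.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper names): run VarDict with a rebuilt
# htsjdk jar placed BEFORE the VarDict jar on the classpath.
# Placeholder paths; set these variables before sourcing to override them.
HTSJDK_JAR="${HTSJDK_JAR:-/path/to/htsjdk.jar}"
VARDICT_JAR="${VARDICT_JAR:-/path/to/vardict.jar}"

# Join the given jar paths with ':' in the order given. The JVM searches the
# classpath left to right, so a class found in an earlier jar shadows any
# class with the same name in a later jar.
build_classpath() {
    local IFS=':'
    printf '%s\n' "$*"
}

# Optional check: succeed only if the jar contains the class reported
# missing in the stack trace (requires unzip; a jar is a zip archive).
has_bzip2_class() {
    unzip -l "$1" |
        grep 'org/apache/commons/compress/compressors/bzip2/BZip2CompressorOutputStream.class' >/dev/null
}

# Launch VarDict, forwarding all arguments unchanged.
run_vardict() {
    java -classpath "$(build_classpath "$HTSJDK_JAR" "$VARDICT_JAR")" \
        com.astrazeneca.vardict.Main "$@"
}
```

For example, after `. ./run_vardict.sh`, running `has_bzip2_class "$HTSJDK_JAR" && run_vardict [other vardict params]` only starts VarDict if the rebuilt jar actually bundles commons-compress.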

htsjdk-3.0.5.jar.zip

However, VarDict will be much slower on CRAM (2 times or more, depending on the seqs_per_slice setting used when the CRAM was generated). Your CRAM is compressed with bzip2, so seqs_per_slice is very high by default in samtools, which means VarDict will be very slow. If possible, I suggest converting your CRAM to BAM.
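A minimal sketch of that conversion, assuming samtools is on PATH and that you still have the reference FASTA the CRAM was created against (decoding a CRAM needs it). cram_to_bam is a hypothetical helper, not part of VarDict, and the file names are placeholders.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): convert a CRAM to an indexed BAM.
# Arguments: input CRAM, the reference FASTA the CRAM was encoded against,
# and the output BAM path.
cram_to_bam() {
    local in_cram="$1" ref_fa="$2" out_bam="$3"
    # -b writes BAM output; -T supplies the reference needed to decode CRAM.
    # The index (.bai next to the BAM) is built only if the conversion succeeds.
    samtools view -b -T "$ref_fa" -o "$out_bam" "$in_cram" &&
        samtools index "$out_bam"
}
```

For example, `cram_to_bam sample.cram ref.fa sample.bam`, then point VarDict at sample.bam instead of the CRAM.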