broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

New htsjdk and options *-list #117

Open SergejN opened 4 years ago

SergejN commented 4 years ago

Dear all,

I updated the htsjdk library because htsjdk-2.14.3 does not support CSI index files. I also patched the library to work around the bug reported in https://github.com/samtools/htsjdk/issues/1380. Finally, I added *-list variants of the --bam, --jumps, --frags, and --unpaired options. Each one accepts a file containing a list of filenames, so the corresponding option no longer has to be repeated for every file. My project involves ~600 BAM files at a time, so this is convenient for use cases like mine.
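The *-list implementation itself is not shown in this thread. The following is therefore only a minimal sketch of how such an option could work, assuming a plain-text list file with one BAM path per line. The class `FileListReader`, its methods `readFileList` and `expandToRepeatedOption`, and the convention of trimming whitespace and skipping blank lines and `#` comments are all hypothetical. They are not the actual code from this PR.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FileListReader {

    // Hypothetical list-file format: one path per line; surrounding whitespace
    // is trimmed, and blank lines and lines starting with '#' are ignored.
    public static List<String> readFileList(Path listFile) throws IOException {
        List<String> paths = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            paths.add(trimmed);
        }
        return paths;
    }

    // Turns a list of paths into the equivalent repeated option,
    // e.g. ["--bam", "a.bam", "--bam", "b.bam"].
    public static List<String> expandToRepeatedOption(String option, List<String> paths) {
        List<String> args = new ArrayList<>();
        for (String path : paths) {
            args.add(option);
            args.add(path);
        }
        return args;
    }

    public static void main(String[] args) throws IOException {
        Path list = Files.createTempFile("bams", ".txt");
        Files.write(list, List.of("# BAM files", "a.bam", "", "  b.bam  "), StandardCharsets.UTF_8);
        List<String> paths = readFileList(list);
        System.out.println(paths);                                   // [a.bam, b.bam]
        System.out.println(expandToRepeatedOption("--bam", paths));  // [--bam, a.bam, --bam, b.bam]
        Files.delete(list);
    }
}
```

Expanding the list into repeated `--bam` arguments keeps the rest of the option handling unchanged, which is one plausible way to add list support without touching downstream code.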

mr-c commented 3 years ago

@SergejN Can you point to a specific patch you made to the htsjdk library?

SergejN commented 3 years ago

@mr-c I never uploaded it to a repository because it looks like a dirty hack, but it nevertheless worked for the axolotl genome. Here is the patch (proper error handling would be needed for sequences longer than 2^31 bases, but I didn't implement it):

--- htsjdk/src/main/java/htsjdk/samtools/reference/FastaSequenceFile.java.ORIG  2020-10-16 10:31:54.000000000 +0200
+++ htsjdk/src/main/java/htsjdk/samtools/reference/FastaSequenceFile.java   2019-06-06 23:27:43.000000000 +0200
@@ -174,9 +174,12 @@
                 break;
             }
             if (sequenceLength == bases.length) {
-                    final byte[] tmp = new byte[bases.length * 2];
-                    System.arraycopy(bases, 0, tmp, 0, sequenceLength);
-                    bases = tmp;
+                int size = bases.length * 2;
+                if(size < 0)
+                    size = Integer.MAX_VALUE - 100;
+                final byte[] tmp = new byte[size];
+                System.arraycopy(bases, 0, tmp, 0, sequenceLength);
+                bases = tmp;
             }
         }
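The `if(size < 0)` guard in the patch relies on Java's silent `int` wraparound. Once `bases.length` reaches 2^30, doubling it exceeds `Integer.MAX_VALUE` (2^31 - 1) and produces a negative number. For every length from 2^30 up to `Integer.MAX_VALUE` the doubled value is negative, so the check catches every overflow of this particular doubling step. The small standalone program below demonstrates this; the class and method names are illustrative and not part of htsjdk.

```java
public class DoublingOverflow {

    // The growth step from the patch, without the overflow guard.
    public static int naiveDouble(int length) {
        return length * 2;
    }

    public static void main(String[] args) {
        int length = 1 << 30;                    // 1,073,741,824 bases
        int doubled = naiveDouble(length);
        System.out.println(doubled);             // -2147483648
        System.out.println(doubled < 0);         // true: the patch's size < 0 check fires

        // Math.multiplyExact reports the overflow instead of silently wrapping.
        try {
            Math.multiplyExact(length, 2);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());  // integer overflow
        }
    }
}
```

The converse case is worth noting: a length just below 2^30, such as 2^30 - 1, doubles to 2^31 - 2 without going negative. The guard does not fire there, even though that size is already above the `Integer.MAX_VALUE - 100` cap the patch uses.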

The reason for Integer.MAX_VALUE - 100 is that Java apparently needs a bit of headroom when allocating an array, so the largest array that can actually be allocated is slightly smaller than Integer.MAX_VALUE, and the amount of headroom varies between Java versions.
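The missing error handling mentioned above could look roughly like the following minimal sketch, assuming the buffer is a plain `byte[]` as in `FastaSequenceFile`. The new size is computed in `long` arithmetic so it cannot wrap, it is clamped to a soft maximum, and an explicit exception is thrown once the buffer is already at that maximum. The patch instead reallocates an array of the same size in that situation. `BaseBuffer`, `newCapacity`, `grow`, and the choice of `IllegalStateException` are hypothetical and not htsjdk API. The margin of 100 is taken from the patch; for comparison, `java.util.ArrayList` in JDK 8 caps growth at `Integer.MAX_VALUE - 8` for the same reason.

```java
import java.util.Arrays;

public class BaseBuffer {

    // Soft cap taken from the patch: stay a little below Integer.MAX_VALUE
    // because the JVM may refuse arrays of exactly that length.
    public static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 100;

    // Returns the capacity to grow to from oldCapacity: double it when
    // possible, otherwise clamp to MAX_ARRAY_LENGTH.
    public static int newCapacity(int oldCapacity) {
        if (oldCapacity >= MAX_ARRAY_LENGTH) {
            throw new IllegalStateException(
                    "sequence does not fit into a single byte[]: more than "
                    + MAX_ARRAY_LENGTH + " bases");
        }
        // 2L * oldCapacity is computed as a long, so it cannot wrap around.
        long doubled = Math.max(1L, 2L * oldCapacity);
        return (int) Math.min(doubled, MAX_ARRAY_LENGTH);
    }

    // Counterpart of the patched block: a larger copy of the full buffer.
    public static byte[] grow(byte[] bases) {
        return Arrays.copyOf(bases, newCapacity(bases.length));
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16));                      // 32
        System.out.println(newCapacity(1 << 30));                 // 2147483547 (clamped)
        System.out.println(grow(new byte[] {'A', 'C'}).length);   // 4
        try {
            newCapacity(MAX_ARRAY_LENGTH);
        } catch (IllegalStateException e) {
            System.out.println("cannot grow: " + e.getMessage());
        }
    }
}
```

Clamping also covers lengths just below 2^30, where plain `int` doubling stays positive but still overshoots the cap.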

Best, Sergej