Closed pnrobinson closed 1 year ago
The index always consists of several files with a common prefix and a common suffix (.bt2
). You probably need to leave out .1.bt2
.
That was it! But now I get this
[main] TRACE (AlignCommand.java:85) - About to read digests from /home/peter/data/diachromatic/test/hg19_HinDIII_DigestedGenome.txt.
[main] TRACE (DigestMap.java:71) - Found digest file at /home/peter/data/diachromatic/test/hg19_HinDIII_DigestedGenome.txt.
[main] TRACE (Bowtie2Runner.java:89) - Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/peter/data/bowtie/GRCh37/GRCh37 -U /home/peter/data/diachromatic/test/TEST.truncated_R1.fastq.gz -S /home/peter/data/diachromatic/test/TEST_V9Ga08o_1.sam
STDOUT=
STDERR=92724 reads; of these:
92724 (100.00%) were unpaired; of these:
4618 (4.98%) aligned 0 times
68392 (73.76%) aligned exactly 1 time
19714 (21.26%) aligned >1 times
95.02% overall alignment rate
[main] TRACE (Bowtie2Runner.java:89) - Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/peter/data/bowtie/GRCh37/GRCh37 -U /home/peter/data/diachromatic/test/TEST.truncated_R2.fastq.gz -S /home/peter/data/diachromatic/test/TEST_YzqkxGO_2.sam
STDOUT=
STDERR=92724 reads; of these:
92724 (100.00%) were unpaired; of these:
5637 (6.08%) aligned 0 times
67119 (72.39%) aligned exactly 1 time
19968 (21.53%) aligned >1 times
93.92% overall alignment rate
java.lang.NullPointerException
at org.jax.diachromatic.align.DigestMap$Chromosome2DigestArray.access$000(DigestMap.java:127)
at org.jax.diachromatic.align.DigestMap.getDigestPair(DigestMap.java:112)
at org.jax.diachromatic.align.ReadPair.<init>(ReadPair.java:274)
at org.jax.diachromatic.align.Aligner.getNextPair(Aligner.java:255)
at org.jax.diachromatic.align.Aligner.inputSAMfiles(Aligner.java:290)
at org.jax.diachromatic.command.AlignCommand.call(AlignCommand.java:94)
at org.jax.diachromatic.command.AlignCommand.call(AlignCommand.java:26)
at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
at picocli.CommandLine.access$1100(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2159)
at picocli.CommandLine.execute(CommandLine.java:2058)
at org.jax.diachromatic.Diachromatic.main(Diachromatic.java:49)
[main] INFO (Diachromatic.java:63) - Elapsed time 44.109 seconds.
Process finished with exit code 1
I suspect an inconsistency between the Bowtie index and the digest map. It is important that both were generated from the same reference. I assume that random chromosomes such as chrUn_KI270589v1
do not exist in your digest map. When Diachromatic iterates over the mapping, it tries to access digests which, according to your digest map, do not exist.
Should we try to emit an error warning instead of allowing diachromatic to crash like this?
Actually, what is happening is that our test data has "5" but the bowtie index has "chr5". This leads to a null pointer error here
int index = Collections.binarySearch(this.digestMap.get(chrom1).coordArray, coord1);
We should probably make this more robust by having the digestMap contain both "5" and "chr5" for all chromosomes.
I would suggest not trying to cover all possible inputs. Instead, we should abort with an error, whereby the coordinates of the read pair are output. In addition, we could write something about this in the documentation.
The expected users a probably used to errors such as '5' instead of 'chr5'. At second glance, it should also be clear that the bowtie index and the digests map must come from the same reference. An error message with read pair coordinates would cover both cases so that the user knows what is not working.
I added some code to put more items into the map -- this should increase memory only slightly. Now diachromatic runs without error
// In some cases, our data uses "chr5" and in others we see just "5".
// The following adds some additional references to mitigate this issue
for (Map.Entry<String, Chromosome2DigestArray> e : prelimMap.entrySet()) {
if (e.getKey().startsWith("chr")) {
String newKey = e.getKey().substring(3);
builder.put(newKey, e.getValue());
} else {
String newKey = "chr" + e.getKey();
builder.put(newKey, e.getValue());
}
}
and something similar for mitochondrial (M vs MT). @hansenp -- OK so?
I'm not sure I fully understand the code above. But adding additional references sounds unsafe. In my opinion, the safest would be if we could make sure that the bowtie index and the digest map are from the same reference.
This basically just means that you retrieve the same Chromosome2DigestArray if you search with "5" or if you search with "chr5".
As is, users will never figure this out from the stack trace -- we would need to figure out how to tell Diachromatic what to do with different indices, which might also be pretty difficult.
I am not sure it is good to make the user do this -- the positions are the same for both indices and if we have to check everytime, it will also slow down the code quite a bit.
working if following tutorial
I am doing the diachromatic tutorial to check everything, but I am getting this error are the align step.
This was the command (I changed my local path). I did this after successfully running the
truncate
command.