TheJacksonLaboratory / diachromatic

Diachromatic is a Java application for preprocessing and quality control of Hi-C and CHi-C data.
https://diachromatic.readthedocs.io/en/latest/
GNU General Public License v3.0
3 stars 1 forks source link

bowtie2 error #152

Closed pnrobinson closed 1 year ago

pnrobinson commented 4 years ago

I am doing the diachromatic tutorial to check everything, but I am getting this error are the align step.

Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/peter/data/bowtie/GRCh37/GRCh37.1.bt2 -U /home/peter/data/diachromatic/test/TEST.truncated_R1.fastq.gz -S /home/peter/data/diachromatic/test/TEST_Q5gOuwB_1.sam
STDOUT=
STDERR=(ERR): "/home/peter/data/bowtie/GRCh37/GRCh37.1.bt2" does not exist or is not a Bowtie 2 index

This was the command (I changed my local path). I did this after successfully running the truncate command.


align
-b
/usr/bin/bowtie2
-i
/path/GRCh37/GRCh37.1.bt2
-q
/path/diachromatic/test/TEST.truncated_R1.fastq.gz
-r
/path/diachromatic/test/TEST.truncated_R2.fastq.gz
-d
/path/diachromatic/test/hg19_HinDIII_DigestedGenome.txt
-x
TEST
-o
/path//diachromatic/test
hansenp commented 3 years ago

The index always consists of several files with a common prefix and a common suffix (.bt2). You probably need to leave out .1.bt2.

pnrobinson commented 3 years ago

That was it! But now I get this

[main] TRACE (AlignCommand.java:85) - About to read digests from /home/peter/data/diachromatic/test/hg19_HinDIII_DigestedGenome.txt.
[main] TRACE (DigestMap.java:71) - Found digest file at /home/peter/data/diachromatic/test/hg19_HinDIII_DigestedGenome.txt.
[main] TRACE (Bowtie2Runner.java:89) - Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/peter/data/bowtie/GRCh37/GRCh37 -U /home/peter/data/diachromatic/test/TEST.truncated_R1.fastq.gz -S /home/peter/data/diachromatic/test/TEST_V9Ga08o_1.sam
STDOUT=
STDERR=92724 reads; of these:
  92724 (100.00%) were unpaired; of these:
    4618 (4.98%) aligned 0 times
    68392 (73.76%) aligned exactly 1 time
    19714 (21.26%) aligned >1 times
95.02% overall alignment rate

[main] TRACE (Bowtie2Runner.java:89) - Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/peter/data/bowtie/GRCh37/GRCh37 -U /home/peter/data/diachromatic/test/TEST.truncated_R2.fastq.gz -S /home/peter/data/diachromatic/test/TEST_YzqkxGO_2.sam
STDOUT=
STDERR=92724 reads; of these:
  92724 (100.00%) were unpaired; of these:
    5637 (6.08%) aligned 0 times
    67119 (72.39%) aligned exactly 1 time
    19968 (21.53%) aligned >1 times
93.92% overall alignment rate

java.lang.NullPointerException
    at org.jax.diachromatic.align.DigestMap$Chromosome2DigestArray.access$000(DigestMap.java:127)
    at org.jax.diachromatic.align.DigestMap.getDigestPair(DigestMap.java:112)
    at org.jax.diachromatic.align.ReadPair.<init>(ReadPair.java:274)
    at org.jax.diachromatic.align.Aligner.getNextPair(Aligner.java:255)
    at org.jax.diachromatic.align.Aligner.inputSAMfiles(Aligner.java:290)
    at org.jax.diachromatic.command.AlignCommand.call(AlignCommand.java:94)
    at org.jax.diachromatic.command.AlignCommand.call(AlignCommand.java:26)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
    at picocli.CommandLine.access$1100(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2159)
    at picocli.CommandLine.execute(CommandLine.java:2058)
    at org.jax.diachromatic.Diachromatic.main(Diachromatic.java:49)
[main] INFO  (Diachromatic.java:63) - Elapsed time 44.109 seconds.

Process finished with exit code 1
hansenp commented 3 years ago

I suspect an inconsistency between the Bowtie index and the digest map. It is important that both were generated from the same reference. I assume that random chromosomes such as chrUn_KI270589v1 do not exist in your digest map. When Diachromatic iterates over the mapping, it tries to access digests which, according to your digest map, do not exist.

pnrobinson commented 3 years ago

Should we try to emit an error warning instead of allowing diachromatic to crash like this?

pnrobinson commented 3 years ago

Actually, what is happening is that our test data has "5" but the bowtie index has "chr5". This leads to a null pointer error here

 int index = Collections.binarySearch(this.digestMap.get(chrom1).coordArray, coord1);

We should probably make this more robust by having the digestMap contain both "5" and "chr5" for all chromosomes.

hansenp commented 3 years ago

I would suggest not trying to cover all possible inputs. Instead, we should abort with an error, whereby the coordinates of the read pair are output. In addition, we could write something about this in the documentation.

The expected users a probably used to errors such as '5' instead of 'chr5'. At second glance, it should also be clear that the bowtie index and the digests map must come from the same reference. An error message with read pair coordinates would cover both cases so that the user knows what is not working.

pnrobinson commented 3 years ago

I added some code to put more items into the map -- this should increase memory only slightly. Now diachromatic runs without error

 // In some cases, our data uses "chr5" and in others we see just "5".
        // The following adds some additional references to mitigate this issue
        for (Map.Entry<String, Chromosome2DigestArray> e : prelimMap.entrySet()) {
            if (e.getKey().startsWith("chr")) {
                String newKey = e.getKey().substring(3);
                builder.put(newKey, e.getValue());
            } else {
                String newKey = "chr" + e.getKey();
                builder.put(newKey, e.getValue());
            }
        }

and something similar for mitochondrial (M vs MT). @hansenp -- OK so?

hansenp commented 3 years ago

I'm not sure I fully understand the code above. But adding additional references sounds unsafe. In my opinion, the safest would be if we could make sure that the bowtie index and the digest map are from the same reference.

pnrobinson commented 3 years ago

This basically just means that you retrieve the same Chromosome2DigestArray if you search with "5" or if you search with "chr5".

pnrobinson commented 3 years ago

As is, users will never figure this out from the stack trace -- we would need to figure out how to tell Diachromatic what to do with different indices, which might also be pretty difficult.

pnrobinson commented 3 years ago

I am not sure it is good to make the user do this -- the positions are the same for both indices and if we have to check everytime, it will also slow down the code quite a bit.

pnrobinson commented 1 year ago

working if following tutorial