TheJacksonLaboratory / diachromatic

Diachromatic is a Java application for preprocessing and quality control of Hi-C and CHi-C data.
https://diachromatic.readthedocs.io/en/latest/
GNU General Public License v3.0
3 stars 1 forks source link

Crash in DigestMap #121

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago

I am running Diachromatic on the HiCUP test data (the truncated command went without errors). This is the command

align -q HinD3/foo.truncated_R1.fastq.gz -r HinD3/foo.truncated_R2.fastq.gz -b /usr/bin/bowtie2 -i /home/robinp/bin/bowtie2/hg19 -d /home/robinp/data/diachromatic/hg19test_hg19_DigestedGenome.txt -j -o HinD3 -x foo-align -l 150 -u 800 -s 3000

The initial mapping step worked fine but then there is an error with DigestMap

[main] TRACE (AlignCommand.java:78) - About to read digests from /home/robinp/data/diachromatic/hg19test_hg19_DigestedGenome.txt.
[main] TRACE (DigestMap.java:70) - Found digest file at /home/robinp/data/diachromatic/hg19test_hg19_DigestedGenome.txt.
[main] TRACE (Bowtie2Runner.java:91) - Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/robinp/bin/bowtie2/hg19 -U HinD3/foo.truncated_R1.fastq.gz -S HinD3/foo-align_6mLY6ur_1.sam
STDOUT=
STDERR=94035 reads; of these:
  94035 (100.00%) were unpaired; of these:
    5861 (6.23%) aligned 0 times
    64727 (68.83%) aligned exactly 1 time
    23447 (24.93%) aligned >1 times
93.77% overall alignment rate

[main] TRACE (Bowtie2Runner.java:91) - Running: /usr/bin/bowtie2 --very-sensitive -p 1 --reorder -x /home/robinp/bin/bowtie2/hg19 -U HinD3/foo.truncated_R2.fastq.gz -S HinD3/foo-align_F72SS1A_2.sam
STDOUT=
STDERR=94035 reads; of these:
  94035 (100.00%) were unpaired; of these:
    6597 (7.02%) aligned 0 times
    63867 (67.92%) aligned exactly 1 time
    23571 (25.07%) aligned >1 times
92.98% overall alignment rate

Exception in thread "main" java.lang.NullPointerException
    at org.jax.diachromatic.align.DigestMap$Chromosome2DigestArray.access$000(DigestMap.java:126)
    at org.jax.diachromatic.align.DigestMap.getDigestPair(DigestMap.java:111)
    at org.jax.diachromatic.align.ReadPair.<init>(ReadPair.java:274)
    at org.jax.diachromatic.align.Aligner.getNextPair(Aligner.java:246)
    at org.jax.diachromatic.align.Aligner.inputSAMfiles(Aligner.java:283)
    at org.jax.diachromatic.command.AlignCommand.execute(AlignCommand.java:87)
    at org.jax.diachromatic.Diachromatic.main(Diachromatic.java:122)

Process finished with exit code 1
hansenp commented 5 years ago

It seem that this error is due to the file: /home/robinp/data/diachromatic/hg19test_hg19_DigestedGenome.txt

How did you generate this file? For which species?

pnrobinson commented 5 years ago

I used GOPHER with the human "test" genes and genome hg19. What would be the best way of making a digest file for the HiCUP test dataset? Also, we should output a better error message?

hansenp commented 5 years ago

Using your digest file I was able to reproduce your error. Then I created a new digest file from scratch using the latest release of GOPHER. I noticed that also the digest on random chromosome are exported to the digest file. Using this file I cannot reproduce the error.

Your digest file does not contain digest on random chromosomes. This causes the error message. Maybe it was created using an older version of GOPHER?

hansenp commented 5 years ago

But we are catching pairs with reads on random chromosomes before calling the function digestMap.getDigestPair.

        // check if both reads are not on random chromosomes
        if (R1.getReferenceName().contains("_") || R2.getReferenceName().contains("_")) {
            this.isPaired = false;
        }

        if (this.isPaired) {

            // pair reads, if both reads could be mapped uniquely
            this.pairReads();
            this.setRelativeOrientationTag();

            // try to find restriction digests that match the read pair
            this.digestPair = digestMap.getDigestPair(this.R1.getReferenceName(), getFivePrimeEndPosOfRead(this.R1), this.R2.getReferenceName(), getFivePrimeEndPosOfRead(this.R2));

Therefore, the random chromosomes cannot be the reason for the error.

But chrM is also missing in your digest file. I think this is the problem.

hansenp commented 5 years ago

Maybe we should add a ChromosomeNotInDigestMap exception. But this would require to check for each read whether the corresponding digest is contained in the digest map, which might reduce the performance. Another solution could be to filter out also read pairs on chrM in addition to read pairs on random chromosomes.

@pnrobinson Suggestions?

pnrobinson commented 5 years ago

Thanks, everything worked with the new Digest file ... I would suggest let's not add a ChromosomeNotInDigestMapException at this time -- new users presumably will create new files and this exception should not occur. If we get bug reports we can always add some new exceptions -- I think if we put them relatively far up we should not have much of a performance penalty.