pnrobinson closed this issue 5 years ago.
It seems that this error is caused by the file:
/home/robinp/data/diachromatic/hg19test_hg19_DigestedGenome.txt
How did you generate this file? For which species?
I used GOPHER with the human "test" genes and genome hg19. What would be the best way to make a digest file for the HiCUP test dataset? Also, should we output a better error message?
Using your digest file, I was able to reproduce your error. Then I created a new digest file from scratch using the latest release of GOPHER. I noticed that digests on random chromosomes are also exported to the digest file. With this new file, I cannot reproduce the error.
Your digest file does not contain digests on random chromosomes. This causes the error message. Maybe it was created with an older version of GOPHER?
But we already catch pairs with reads on random chromosomes before calling `digestMap.getDigestPair`:
```java
// unpair the reads if either one is on a random chromosome (reference name contains "_")
if (R1.getReferenceName().contains("_") || R2.getReferenceName().contains("_")) {
    this.isPaired = false;
}
if (this.isPaired) {
    // pair reads, if both reads could be mapped uniquely
    this.pairReads();
    this.setRelativeOrientationTag();
    // try to find restriction digests that match the read pair
    this.digestPair = digestMap.getDigestPair(
            this.R1.getReferenceName(), getFivePrimeEndPosOfRead(this.R1),
            this.R2.getReferenceName(), getFivePrimeEndPosOfRead(this.R2));
}
```
Therefore, random chromosomes cannot be the cause of the error.
But `chrM` is also missing from your digest file. I think this is the problem.
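One quick way to confirm which chromosomes a digest file actually covers is to collect the distinct names in its chromosome column. The sketch below is hypothetical (not part of Diachromatic or GOPHER) and assumes the digest file is tab-separated, starts with one header line, and has the chromosome name in the first column; adjust if the real format differs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports which chromosomes a digest file covers.
// ASSUMPTION: tab-separated, first line is a header, column 1 is the chromosome name.
class DigestFileCheck {

    // Collect the distinct chromosome names from the digest file lines (header skipped).
    static Set<String> chromosomesIn(List<String> lines) {
        Set<String> chroms = new TreeSet<>();
        for (int i = 1; i < lines.size(); i++) {   // i = 1 skips the assumed header line
            String line = lines.get(i);
            if (line.isEmpty()) continue;
            chroms.add(line.split("\t", 2)[0]);
        }
        return chroms;
    }

    // Return the expected chromosomes that have no digests in the file.
    static Set<String> missing(Set<String> inDigest, Collection<String> expected) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(inDigest);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: digest file path; remaining args: chromosomes expected to be present
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Set<String> chroms = chromosomesIn(lines);
        System.out.println("Chromosomes in digest file: " + chroms);
        List<String> expected = Arrays.asList(args).subList(1, args.length);
        System.out.println("Missing: " + missing(chroms, expected));
    }
}
```

For example, `java DigestFileCheck hg19test_hg19_DigestedGenome.txt chrM` would print `chrM` under "Missing" if the file has no digests on it.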
Maybe we should add a `ChromosomeNotInDigestMap` exception. But this would require checking, for each read, whether the corresponding digest is contained in the digest map, which might reduce performance. Another solution could be to also filter out read pairs on `chrM`, in addition to read pairs on random chromosomes.
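The two options can be sketched side by side. This is a minimal illustration, not Diachromatic code: `ChromosomeNotInDigestMapException`, `ChromosomeGuards`, and the assumption that the digest map's chromosome names are available as a `Set<String>` are all hypothetical. With a hash-based set, the per-read check costs one hash lookup of the reference name, so the performance concern may be smaller than it first appears.

```java
import java.util.Set;

// Hypothetical exception named after the proposal above; not part of Diachromatic.
class ChromosomeNotInDigestMapException extends RuntimeException {
    ChromosomeNotInDigestMapException(String chrom) {
        super("Chromosome '" + chrom + "' is not in the digest map;"
                + " was the digest file created with the current GOPHER release?");
    }
}

class ChromosomeGuards {

    // Option 1: per-read check. With a HashSet, contains() is O(1) on average,
    // so the extra cost per read is a single hash lookup of the reference name.
    static void requireChromosome(Set<String> digestChromosomes, String refName) {
        if (!digestChromosomes.contains(refName)) {
            throw new ChromosomeNotInDigestMapException(refName);
        }
    }

    // Option 2: extend the existing random-chromosome filter ("_" in the name)
    // so that read pairs on chrM are unpaired before the digest lookup as well.
    static boolean isExcludedChromosome(String refName) {
        return refName.contains("_") || refName.equals("chrM");
    }
}
```

Option 2 would replace the existing `contains("_")` condition with `isExcludedChromosome(R1.getReferenceName()) || isExcludedChromosome(R2.getReferenceName())`; it silently drops `chrM` pairs, whereas option 1 surfaces the mismatched digest file to the user.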
@pnrobinson Suggestions?
Thanks, everything worked with the new digest file. I would suggest we not add a `ChromosomeNotInDigestMapException` at this time; new users will presumably create new files, so this exception should not occur. If we get bug reports, we can always add new exceptions. I think if we put the checks relatively far up, before the per-read processing, we should not pay much of a performance penalty.
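One reading of "relatively far up" is a single validation before the read-pair loop: compare the reference sequence names from the alignment header against the chromosomes in the digest map once per run. The sketch below is hypothetical; `DigestMapPreflight` is not a Diachromatic class, and it assumes the header names (e.g. from the `@SQ` lines) and the digest map's chromosome names have already been extracted into plain Java collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical one-time check, run once before the read-pair loop rather than per read.
// referenceNames: sequence names from the alignment header (assumed already extracted).
// digestChromosomes: chromosome names loaded into the digest map (assumed available).
class DigestMapPreflight {

    // Header chromosomes the digest map lacks, ignoring random/unplaced contigs
    // (names containing "_"), because those read pairs are filtered out later anyway.
    static List<String> uncoveredChromosomes(List<String> referenceNames, Set<String> digestChromosomes) {
        List<String> uncovered = new ArrayList<>();
        for (String name : referenceNames) {
            if (!name.contains("_") && !digestChromosomes.contains(name)) {
                uncovered.add(name);
            }
        }
        return uncovered;
    }

    // Fail fast with one clear message instead of an error deep inside the read loop.
    static void validate(List<String> referenceNames, Set<String> digestChromosomes) {
        List<String> uncovered = uncoveredChromosomes(referenceNames, digestChromosomes);
        if (!uncovered.isEmpty()) {
            throw new IllegalStateException("Digest map has no digests for " + uncovered
                    + "; consider regenerating the digest file with the current GOPHER release.");
        }
    }
}
```

Because the cost scales with the number of reference sequences rather than the number of reads, this kind of check adds essentially nothing to the runtime.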
I am running Diachromatic on the HiCUP test data (the truncate command ran without errors). This is the command
The initial mapping step worked fine, but then there is an error with DigestMap.