Open nvnieuwk opened 1 year ago
@vruano Your thoughts on this one?
@nvnieuwk Can you check the sequence dictionaries for your reference and cram to see what the length of the "HLA-DRB1*04:03:01" contig is reported as? It should be 15246 for the hg38 reference.
You can check the reference sequence dictionary by searching the .dict file for the contig name, and you can check the cram dictionary by inspecting the entry for that contig in the cram header.
In the .dict file it shows the correct length:
@SQ SN:HLA-DRB1*04:03:01 LN:15246 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa AS:GRCh38 M5:ce0de8afd561fb1fb0d3acce94386a27 SP:Human
And in the cram header it shows the same length:
@SQ SN:HLA-DRB1*04:03:01 LN:15246 AH:* M5:ce0de8afd561fb1fb0d3acce94386a27 UR:/kyukon/data/gent/shared/000/gvo00082/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
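For anyone who wants to script this comparison rather than eyeball it, here is a minimal sketch in plain Java. It only parses the two `@SQ` lines quoted above and compares their `SN`, `LN` and `M5` fields. The class and method names (`SqLineCheck`, `parseSqLine`, `sameContig`) are made up for illustration and are not part of htsjdk or GATK. Splitting on whitespace is also an assumption: real SAM headers are tab-separated, so a `UR` value containing spaces would need tab splitting instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class SqLineCheck {

    /** Parses the TAG:value fields of one @SQ header line into a map (hypothetical helper). */
    public static Map<String, String> parseSqLine(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            // Split on the FIRST colon only, because contig names such as
            // HLA-DRB1*04:03:01 contain colons themselves.
            int colon = token.indexOf(':');
            if (colon > 0) {
                fields.put(token.substring(0, colon), token.substring(colon + 1));
            }
        }
        return fields;
    }

    /** Returns true when name, length and MD5 agree between two @SQ lines. */
    public static boolean sameContig(String sqLineA, String sqLineB) {
        Map<String, String> a = parseSqLine(sqLineA);
        Map<String, String> b = parseSqLine(sqLineB);
        return Objects.equals(a.get("SN"), b.get("SN"))
                && Objects.equals(a.get("LN"), b.get("LN"))
                && Objects.equals(a.get("M5"), b.get("M5"));
    }

    public static void main(String[] args) {
        // The two header lines quoted in this thread.
        String dictLine = "@SQ SN:HLA-DRB1*04:03:01 LN:15246 "
                + "UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa "
                + "AS:GRCh38 M5:ce0de8afd561fb1fb0d3acce94386a27 SP:Human";
        String cramLine = "@SQ SN:HLA-DRB1*04:03:01 LN:15246 AH:* "
                + "M5:ce0de8afd561fb1fb0d3acce94386a27 "
                + "UR:/kyukon/data/gent/shared/000/gvo00082/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa";

        System.out.println("dict length: " + parseSqLine(dictLine).get("LN"));
        System.out.println("cram length: " + parseSqLine(cramLine).get("LN"));
        System.out.println("same contig: " + sameContig(dictLine, cramLine));
    }
}
```

Since name, length and MD5 all match here, a dictionary mismatch looks unlikely to be the cause.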
@droazen Does it make sense to check the .fai file and the FASTA itself just in case, or is the .dict guaranteed to be the only source for contig lengths at this point?
Looking at the htsjdk code responsible for the original throw (as far as I can see from the stack trace enclosed in the description), there are a few "smells" in the way synchronized is used, or not used, in ReferenceSource.java. That is likely to be the reason behind the error, considering that it fails when running multi-threaded.
Adding synchronized to getReferenceBasesByRegion would probably fix that. This is an htsjdk issue and not a GATK one. @droazen, do you want to add a workaround in GATK, or press for a fix and an update of the htsjdk dependency?
More concretely, the private method getReferenceBases(SAMSeqRecord) should either be synchronized or be avoided by calling the synchronized getReferenceBases(SSR, boolean) directly, and getReferenceBasesByRegion should not update the cache fields.
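To illustrate the kind of race I mean, here is a simplified sketch. It is not the actual htsjdk code: `CachedReferenceSketch`, its fields, the contig names and the lengths are all hypothetical stand-ins for a class that caches the most recently requested contig. In the unsynchronized variant one thread can update the cached name while another thread still reads the bases of the previous contig, so a caller may receive an array of the wrong length. The synchronized variant makes the check-and-update atomic.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedReferenceSketch {
    // Hypothetical contig lengths standing in for a real reference.
    private static final Map<String, Integer> LENGTHS =
            Map.of("contigA", 100, "contigB", 250);

    // Shared, mutable cache fields: the source of the race.
    private String cachedName;
    private byte[] cachedBases;

    private static byte[] load(String name) {
        return new byte[LENGTHS.get(name)];
    }

    /** Unsafe: two threads can interleave between the name update and the bases update. */
    public byte[] getBasesUnsafe(String name) {
        if (!name.equals(cachedName)) {
            cachedName = name;          // another thread may now see the new name...
            cachedBases = load(name);   // ...while cachedBases still holds the old contig
        }
        return cachedBases;
    }

    /** Safe: the whole check-then-update sequence runs under the object's monitor. */
    public synchronized byte[] getBasesSafe(String name) {
        if (!name.equals(cachedName)) {
            cachedBases = load(name);
            cachedName = name;
        }
        return cachedBases;
    }

    /** Hammers the synchronized method from several threads and counts wrong-length results. */
    public static int countMismatches(int threads, int iterations) {
        CachedReferenceSketch source = new CachedReferenceSketch();
        AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            String name = (t % 2 == 0) ? "contigA" : "contigB";
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    if (source.getBasesSafe(name).length != LENGTHS.get(name)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches with synchronized access: " + countMismatches(4, 10_000));
    }
}
```

Swapping `getBasesSafe` for `getBasesUnsafe` inside `countMismatches` may produce mismatches on some runs, though a race like this is timing dependent and will not necessarily show up every time.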
Hi, any news on this? :)
Instructions
The github issue tracker is for bug reports, feature requests, and API documentation requests. General questions about how to use the GATK, how to interpret the output, etc. should be asked on the official support forum.
Bug Report
Affected tool(s) or class(es)
GATK CalibrateDragstrModel
Affected version(s)
Description
When running CalibrateDragstrModel in parallel mode, the supplied reference isn't detected correctly, causing the following error stack trace:
However, it does work when running the tool single-threaded with the exact same options.
Steps to reproduce
I've sadly been unable to create a reproducible example. I've only encountered this with non-public data which I can't share here. I'd be happy to run tests for you though.