DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

Does HLA-LA remove duplicate reads first? #92

Open yingchen69 opened 1 year ago

yingchen69 commented 1 year ago

Hi,

I ran HLA-LA on WES data. The input BAM files were pre-processed following the GATK pre-processing pipeline: reads were mapped, duplicates were marked, and base quality scores were recalibrated. samtools flagstat shows ~20% duplicate reads. I checked the documentation and the posts here but could not find any information on how the pipeline treats duplicate reads.

When I leave the duplicate reads in place, HLA-LA finishes with no major issues. When I remove the duplicates before running HLA-LA, the pipeline fails:

HLA-LA: hla/HLATyper.cpp:974: void hla::HLATyper::HLATypeInference(const std::vector<mapper::reads::oneReadPair>&, const std::vector<mapper::reads::verboseSeedChainPair>&, const std::vector<mapper::reads::oneRead>&, const std::vector<mapper::reads::verboseSeedChain>&, double, double, std::string, std::string): Assertion `(rawPairedReads.size() > 0) || (rawUnpairedReads.size() > 0)' failed.

In the output file, the last two lines are:

[ Tue Mar 28 01:20:13 2023 ] Initiate HLA typing!
Call HLA typing with 0 alignments.

Any suggestions?

Thanks a lot!

Ying

AlexanderDilthey commented 1 year ago

Hi @yingchen69,

HLA-LA does not remove PCR duplicates, and it is probably a good idea to remove them manually before running HLA-LA (I have not tested this, however).

Based on the error message, my assumption is that no HLA reads remain after your filtering step. Could you check whether the input BAM you provide still contains reads in the MHC?

Best,

Alex
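
The check suggested above (are there still reads in the MHC after duplicate removal?) could be sketched roughly as follows. This is a minimal illustration, not part of HLA-LA: the function name `count_mhc_reads` is hypothetical, and the MHC interval (chromosome 6, roughly 28-34 Mb) is an approximate assumption that should be adjusted to the reference build and contig naming of the BAM in question. It takes SAM-formatted text lines, such as the output of `samtools view`, and counts mapped reads in that interval, together with how many carry the duplicate flag (0x400).

```python
# Sketch: count mapped reads in an assumed MHC interval from SAM text lines.
# The coordinates below are an approximation, not values taken from HLA-LA.

DUP_FLAG = 0x400       # SAM flag bit: PCR or optical duplicate
UNMAPPED_FLAG = 0x4    # SAM flag bit: read is unmapped

MHC_CHROMS = {"chr6", "6"}                   # assumed contig names
MHC_START, MHC_END = 28_000_000, 34_000_000  # assumed, approximate interval


def count_mhc_reads(sam_lines):
    """Return (mapped_reads_in_mhc, duplicates_among_them)."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])  # 1-based leftmost mapping position
        if flag & UNMAPPED_FLAG:
            continue
        if chrom in MHC_CHROMS and MHC_START <= pos <= MHC_END:
            total += 1
            if flag & DUP_FLAG:
                duplicates += 1
    return total, duplicates
```

One way to use it would be to pipe `samtools view input.bam` into a small script that passes `sys.stdin` to `count_mhc_reads`, once for the original BAM and once for the deduplicated one. A count of zero for the deduplicated file would be consistent with the "0 alignments" message in the log, and would suggest that the filtering step dropped more than just the duplicates.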