DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

support for non-paired short reads? #54

Closed: droeatumn closed this issue 1 year ago

droeatumn commented 3 years ago

When I run with unpaired reads, I get "You didn't activate --longReads, but the two files ... (which store paired-end reads) are empty - this is weird, and I will abort".

Is there support for non-paired short reads? If not, is this a fundamental issue, or could it be added as an enhancement? I assume just adding the --longReads option isn't a good idea.

AlexanderDilthey commented 1 year ago

Hi @droeatumn,

I do not think that single-end short reads are really suitable for HLA type inference. Without the paired-end information, you lose a lot of mapping information in repetitive regions, and correctly determining the locus of origin for unpaired reads of the size you describe would be very difficult. In the HLA*PRG paper (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005151#pcbi.1005151.s005), we analyzed the impact of effective fragment length on accuracy and found elevated error rates in cohorts with short effective fragment lengths (S5 Fig).
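
To make the point about lost mapping information concrete, here is a toy Python sketch. It is not HLA-LA's read-placement code. It assumes a hypothetical `Hit` record per candidate alignment, made-up locus labels (`geneX`, `paralogX`) and coordinates, and it ignores strand orientation. Fragment length is approximated as the span from the leftmost mate start to the rightmost mate end. Read 1 aligns equally well to two paralogous loci, so on its own its origin is ambiguous. Read 2 on its own would even prefer the wrong paralog. Requiring both mates to land on the same locus at a plausible fragment length resolves the pair to a single placement.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hit:
    """Hypothetical toy record for one candidate alignment of a read."""
    locus: str     # made-up label of the reference region
    position: int  # 0-based leftmost coordinate within that locus
    score: int     # alignment score; higher is better


def place_single(hits):
    """Single-end placement: keep every hit tied for the best score.
    More than one survivor means the locus of origin is ambiguous."""
    best = max(h.score for h in hits)
    return [h for h in hits if h.score == best]


def place_pair(hits1, hits2, mean_frag, max_dev, read_len):
    """Paired-end placement: combine mate hits on the same locus, keep only
    combinations whose implied fragment length is within max_dev of
    mean_frag, and return the combinations tied for the best summed score."""
    candidates = []
    for a, b in product(hits1, hits2):
        if a.locus != b.locus:
            continue  # mates of one fragment must come from the same locus
        frag = abs(b.position - a.position) + read_len  # equal-length mates
        if abs(frag - mean_frag) <= max_dev:
            candidates.append((a.score + b.score, a, b))
    if not candidates:
        return []
    best = max(c[0] for c in candidates)
    return [(a, b) for s, a, b in candidates if s == best]


# Toy data (invented): read 1 is identical in both paralogs.
hits1 = [Hit("geneX", 1000, 150), Hit("paralogX", 1000, 150)]
# Read 2 scores slightly better on the paralog, but that hit lies far away.
hits2 = [Hit("geneX", 1250, 148), Hit("paralogX", 4000, 150)]

single1 = place_single(hits1)  # two tied hits: ambiguous
single2 = place_single(hits2)  # picks paralogX on its own
paired = place_pair(hits1, hits2, mean_frag=400, max_dev=150, read_len=150)

print("read 1 alone:", [h.locus for h in single1])
print("read 2 alone:", [h.locus for h in single2])
print("as a pair:", [(a.locus, b.locus) for a, b in paired])
```

Real aligners and HLA-LA's graph-based approach are far more involved. The sketch only shows why discarding the mate removes a constraint that single reads in repetitive regions cannot supply on their own.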

So, while you can make the algorithm run on unpaired short reads, the results are not really going to be reliable (consistent with your observations). To support unpaired short reads, one would probably have to implement a joint genotyping approach (i.e., not genotyping loci independently), and even though this would be a cool project, we currently have no plans in this direction...

Alex