droeatumn closed this issue 1 year ago
Hi @droeatumn,
I do not think that single-end short reads are really suitable for HLA type inference. Without the paired-end information, you lose a lot of mapping information in repetitive regions, and correctly determining the locus of origin for unpaired reads of the size you describe would be very difficult. In the HLA*PRG paper (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005151#pcbi.1005151.s005), we analyzed the impact of effective fragment length on accuracy and found elevated error rates in cohorts with short effective fragment lengths (S5 Fig).
So, while you can make the algorithm run on unpaired short reads, the results are not really going to be reliable (consistent with your observations). To support unpaired short reads, one would probably have to implement a joint genotyping approach (i.e. not genotyping loci independently). This would be a cool project, but we currently have no plans in this direction.
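To illustrate what "joint" means here, a toy Python sketch follows. It is not HLA*PRG code and not a description of how the tool works internally. The locus names, allele names, likelihood values, and both functions are made up for illustration, genotypes are haploid for brevity, and hard read-to-locus assignment is just one simple stand-in for independent per-locus calling.

```python
from itertools import product
from math import log

# Hypothetical candidate alleles per locus (one allele per locus, for brevity).
LOCI = {"A": ["A*01", "A*02"], "B": ["B*07", "B*08"]}

# Made-up likelihoods P(read | allele) for three unpaired reads.
READS = [
    {"A*01": 0.90, "A*02": 0.20, "B*07": 0.01, "B*08": 0.01},  # clearly locus A
    {"A*01": 0.01, "A*02": 0.01, "B*07": 0.50, "B*08": 0.45},  # locus B, weak signal
    {"A*01": 0.60, "A*02": 0.60, "B*07": 0.05, "B*08": 0.55},  # ambiguous locus
]


def locus_of(allele):
    """Return the locus prefix of an allele name, e.g. 'A' for 'A*01'."""
    return allele.split("*")[0]


def independent_calls(reads, loci):
    """Hard-assign each read to one locus, then call each locus on its own."""
    assigned = {locus: [] for locus in loci}
    for read in reads:
        best_allele = max(read, key=read.get)
        assigned[locus_of(best_allele)].append(read)
    calls = {}
    for locus, alleles in loci.items():
        calls[locus] = max(
            alleles, key=lambda a: sum(log(r[a]) for r in assigned[locus])
        )
    return calls


def joint_calls(reads, loci):
    """Score all loci together; each read is a mixture over its possible loci."""
    names = list(loci)

    def score(combo):
        # Uniform prior over the locus of origin for every read.
        return sum(log(sum(r[a] for a in combo) / len(combo)) for r in reads)

    best = max(product(*(loci[n] for n in names)), key=score)
    return dict(zip(names, best))


print(independent_calls(READS, LOCI))
print(joint_calls(READS, LOCI))
```

With these made-up numbers, the independent version assigns the ambiguous third read to locus A and calls B*07 at locus B, while the joint version keeps that read's evidence for locus B and calls B*08 instead. Both call A*01 at locus A.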
Alex
When I run with unpaired reads, I get "You didn't activate --longReads, but the two files ... (which store paired-end reads) are empty - this is weird, and I will abort".
Is there support for non-paired short reads? If not, is this a fundamental issue, or could it be added as an enhancement? I assume just adding the --longReads option isn't a good idea.