EPPIcenter / mad4hatter

https://eppicenter.github.io/mad4hatter/

Varying order of positions and variants for multiallelic loci in resmarkers_microhap_table.txt #105

Closed manuelgug closed 1 year ago

manuelgug commented 1 year ago

In some cases, when a locus has multiple alleles, the positions and variants are ordered differently for each allele in the resmarkers_microhap_table.txt output. This can be confusing in downstream analyses, because the same set of positions appears under several different Microhaplotype_Index values.

Example

sampleID          Gene_ID        Gene  Microhaplotype_Index  Reference_Microhaplotype  Microhaplotype  Microhaplotype_Ref/Alt  Reads
1904701_S18_L001  PF3D7_0810800  dhps  431/436/437           I/S/G                     I/F/A           ALT                     220
1904701_S18_L001  PF3D7_0810800  dhps  436/431/437           S/I/G                     S/I/A           ALT                     165
1904701_S18_L001  PF3D7_0810800  dhps  436/437/431           S/G/I                     S/G/I           REF                     2131

manuelgug commented 1 year ago

There are also several monoallelic loci whose positions/variants are sorted in descending order, and others with no discernible pattern.
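
For anyone hitting this downstream, here is a minimal sketch of a possible workaround in Python. It is not part of mad4hatter: the function name `normalize_microhap` is hypothetical. It assumes the three fields are slash-separated, that positions are integers, and that the i-th position corresponds to the i-th reference and observed variant. It reorders all three fields so that positions ascend.

```python
def normalize_microhap(index, reference, microhaplotype, sep="/"):
    """Reorder three aligned, separator-delimited fields so positions ascend.

    Hypothetical helper (not a mad4hatter function). Assumes positions are
    integers and that all three fields have the same number of elements.
    """
    positions = [int(p) for p in index.split(sep)]
    ref_variants = reference.split(sep)
    obs_variants = microhaplotype.split(sep)

    if not (len(positions) == len(ref_variants) == len(obs_variants)):
        raise ValueError("fields have different numbers of elements")

    # Indices of the positions in ascending order; applied to all three fields
    # so each variant stays paired with its position.
    order = sorted(range(len(positions)), key=lambda i: positions[i])

    return (
        sep.join(str(positions[i]) for i in order),
        sep.join(ref_variants[i] for i in order),
        sep.join(obs_variants[i] for i in order),
    )


# The second row of the example above, brought into ascending order.
print(normalize_microhap("436/431/437", "S/I/G", "S/I/A"))
```

Applied to every row, this should give all three example rows the same index, 431/436/437, so alleles at the same locus can be compared directly.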

bgpalmer commented 1 year ago

This may be fixed in my latest PR - testing on some datasets now to confirm