DKMS / Hapl-o-Mat

Software for haplotype inference

Confusion about the value of the calculated results #14

Open masen1991 opened 2 years ago

@usolloch @sauter Thank you very much for developing this software. It is very helpful for HLA calculations. I have some questions about the results and would appreciate a detailed and precise explanation. I used Hapl-o-Mat to estimate haplotype frequencies from 20 individuals, each typed at 2-field resolution for 8 HLA loci, with the following parameters:

file name

FILENAME_INPUT=some.txt
FILENAME_HAPLOTYPES=haplotypes.dat
FILENAME_GENOTYPES=genotypes.dat
FILENAME_HAPLOTYPEFREQUENCIES=some_hfs.dat
FILENAME_EPSILON_LOGL=epsilon.dat

reports

LOCI_AND_RESOLUTIONS=A:4d,B:4d,C:4d,DQA1:4d,DQB1:4d,DRB1:4d,DPA1:4d,DPB1:4d
MINIMAL_FREQUENCY_GENOTYPES=1e-5
DO_AMBIGUITYFILTER=false
EXPAND_LINES_AMBIGUITYFILTER=false
WRITE_GENOTYPES=true

EM-algorithm

INITIALIZATION_HAPLOTYPEFREQUENCIES=perturbation
EPSILON=1e-6
CUT_HAPLOTYPEFREQUENCIES=1e-6
RENORMALIZE_HAPLOTYPEFREQUENCIES=true
SEED=0
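To see what the EM parameters above (EPSILON, CUT_HAPLOTYPEFREQUENCIES, RENORMALIZE_HAPLOTYPEFREQUENCIES, SEED) control, here is a minimal toy EM in Python. It is a sketch under stated assumptions, not Hapl-o-Mat's implementation. It ignores typing ambiguities and assumes Hardy-Weinberg equilibrium. It treats EPSILON as a bound on the largest frequency change between two iterations and applies the cut and renormalization once, after convergence. It uses a perturbed uniform start as a stand-in for "perturbation". The function names and toy alleles are hypothetical. Because the loop stops once changes fall below a tolerance rather than at an exact fixed point, the reported frequencies generally sit near, not exactly on, the limiting values.

```python
import itertools
import math
import random


def haplotype_pairs(genotype):
    """Distinct unordered (hap1, hap2) phasings of one unphased genotype.

    genotype: list of (allele_a, allele_b) tuples, one per locus (hypothetical format).
    """
    pairs = set()
    for flips in itertools.product((False, True), repeat=len(genotype)):
        h1 = tuple(b if f else a for (a, b), f in zip(genotype, flips))
        h2 = tuple(a if f else b for (a, b), f in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, epsilon=1e-6, cut=1e-6,
                             renormalize=True, seed=0, max_iter=100_000):
    """Toy EM; returns (frequencies, log-likelihood, iterations used)."""
    rng = random.Random(seed)
    phasings = [haplotype_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in phasings for pair in pairs for h in pair})
    # Perturbed uniform start (assumption): equal weights plus random noise.
    raw = {h: 1.0 + 0.5 * rng.random() for h in haplotypes}
    total = sum(raw.values())
    freq = {h: w / total for h, w in raw.items()}
    two_n = 2 * len(genotypes)

    for iteration in range(1, max_iter + 1):
        counts = dict.fromkeys(haplotypes, 0.0)
        log_l = 0.0  # log-likelihood at the frequencies entering this iteration
        for pairs in phasings:
            # E-step: under HWE, P(h1, h2) is 2*p1*p2 if h1 != h2, else p1**2.
            weights = [(1.0 if h1 == h2 else 2.0) * freq[h1] * freq[h2]
                       for h1, h2 in pairs]
            s = sum(weights)
            log_l += math.log(s)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / s
                counts[h2] += w / s
        # M-step: expected haplotype counts divided by 2n.
        new = {h: c / two_n for h, c in counts.items()}
        change = max(abs(new[h] - freq[h]) for h in haplotypes)
        freq = new
        if change < epsilon:  # stop at the tolerance, not at an exact fixed point
            break

    # Drop rare haplotypes, then optionally rescale to sum to 1 (assumption).
    kept = {h: f for h, f in freq.items() if f >= cut}
    if renormalize:
        total = sum(kept.values())
        kept = {h: f / total for h, f in kept.items()}
    return kept, log_l, iteration


# Hypothetical toy data: two loci, one double heterozygote, two homozygotes.
data = [
    [("A*01:01", "A*02:01"), ("B*08:01", "B*07:02")],
    [("A*01:01", "A*01:01"), ("B*08:01", "B*08:01")],
    [("A*02:01", "A*02:01"), ("B*07:02", "B*07:02")],
]
runs = [em_haplotype_frequencies(data, seed=s) for s in (1, 2, 3)]
best = max(runs, key=lambda r: r[1])  # keep the run with the highest log-likelihood
print(best[0], best[1], best[2])
```

A fixed nonzero SEED makes a run reproducible. Running several seeds and keeping the run with the highest log-likelihood is a general heuristic for EM when the likelihood may have several local optima. It is not a documented Hapl-o-Mat feature.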

Running the EM algorithm produced the following results:

[Screenshot, 2022-09-06: estimated haplotype frequencies]

I have two questions about this result.

1. As the figure shows, the haplotype frequencies are reported to 10 or more decimal places. Is 0.07500005969918 equivalent to 0.075? How should I interpret 0.04999924381036? Why do values like these arise, and are they related to the number of EM iterations? Also, for a population of only 20 individuals, is it appropriate to convert these values to a more intuitive form, such as 0.05?

2. I understand that SEED initializes the pseudorandom number generator so that results can be reproduced. However, the default value SEED=0 takes the seed from the current date and time, so the results differ from run to run. If each run with SEED=0 can give different haplotype frequencies, how do I decide which run gives the most accurate and reliable estimate?
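For 20 diploid individuals there are 2 × 20 = 40 sampled haplotype copies. A haplotype counted an integer number of times therefore has frequency k/40, a multiple of 0.025. The values in question 1 sit very close to such multiples: 0.07500005969918 × 40 ≈ 3.0000024 and 0.04999924381036 × 40 ≈ 1.99997. This is consistent with an EM that stopped at a tolerance just short of 3/40 and 2/40. A minimal sketch of that arithmetic follows (the function names are hypothetical):

```python
from fractions import Fraction


def expected_copies(freq, n_individuals):
    """Expected number of copies of a haplotype among the 2n sampled haplotypes."""
    return freq * 2 * n_individuals


def nearest_fraction(freq, n_individuals):
    """Round a frequency to the nearest multiple of 1/(2n); returns (copies, frequency)."""
    two_n = 2 * n_individuals
    k = round(freq * two_n)
    return k, k / two_n


for f in (0.07500005969918, 0.04999924381036):
    k, rounded = nearest_fraction(f, 20)
    print(f"{f} -> {expected_copies(f, 20):.7f} copies, nearest {Fraction(k, 40)} = {rounded}")
```

This rounding is only a presentation choice. When some individuals have ambiguous phase, the EM estimate is an expected count and need not be close to an integer, so rounding it to k/40 can discard information.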