bcm-uga / Loter

A software package for local ancestry inference and haplotype phasing

Splitting Vcf per chr & Optimal Parameter for Phase Correction #25

Closed casia16 closed 1 year ago

casia16 commented 2 years ago

Hi, thank you for developing this great LAI method. I have been able to explore my own cattle data extensively with it, and it was nice to see the results confirm our hypothesis.

I have two questions. First, should I split the VCF file per chromosome, given that the algorithm does not treat SNPs as independent, or is this handled automatically? Second, I found quite a few switch errors. I tried the smoothing for phase correction; it helps, but some switches remain unfixed. Is there a recommended way to find optimal parameters (for example, lambda)? When I use larger lambda values, the results seem to improve (fewer switch errors) more than when I change rate_vote and threshold.

gdurif commented 2 years ago

Hi, Thanks for your interest in Loter. Regarding your questions:

  1. Yes, you should split the VCF file per chromosome. Loter assumes that the input is a set of consecutive SNPs belonging to the same DNA molecule.
  2. Regarding the smoothing for phase correction:
    • the lambda parameter controls the likelihood of a local ancestry switch between consecutive SNPs (when considering haplotypes independently). A higher lambda means longer ancestry chunks/tracts, which can counterbalance phasing switch errors between homologous haplotypes but does not correct them per se. You can widen the range of lambda values, or shift it towards larger values, to consider longer ancestry chunks in the procedure.
    • the threshold parameter (between 0 and 1) controls the smoothing of the phase correction: the lower the threshold, the stronger the smoothing (if I remember correctly).
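For point 1, a minimal sketch of splitting a VCF per chromosome using only the Python standard library. The function name `split_vcf_by_chromosome` and the output naming scheme are hypothetical, not part of Loter; the sketch assumes a plain-text or gzipped VCF with tab-separated records and copies the full header into every output file.

```python
import gzip
import os


def split_vcf_by_chromosome(vcf_path, out_dir):
    """Write one VCF per chromosome and return {chrom: output_path}.

    Hypothetical helper, not part of Loter. Record order within each
    chromosome is preserved as found in the input file.
    """
    os.makedirs(out_dir, exist_ok=True)
    opener = gzip.open if vcf_path.endswith(".gz") else open
    header = []   # "##" meta lines and the "#CHROM" column line
    handles = {}  # chrom -> open output file
    paths = {}    # chrom -> output path
    try:
        with opener(vcf_path, "rt") as vcf:
            for line in vcf:
                if line.startswith("#"):
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    # First record of this chromosome: start a new file
                    # and copy the header into it.
                    paths[chrom] = os.path.join(out_dir, f"{chrom}.vcf")
                    handles[chrom] = open(paths[chrom], "w")
                    handles[chrom].writelines(header)
                handles[chrom].write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return paths
```

Each resulting file can then be loaded and passed to Loter separately, so that every run sees only consecutive SNPs from a single chromosome.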

I am surprised by what you observed. Maybe it is a combination of both effects: the ancestry tracts are too short and the smoothing is not strong enough. In that case you could address both at once, by increasing the lambda range and decreasing the smoothing threshold.
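To compare parameter settings, one option is to summarize the inferred ancestry of each haplotype by its number of ancestry switches and its mean tract length. The sketch below is a hypothetical diagnostic, not part of Loter; it assumes the inferred ancestry of one haplotype is available as a sequence of integer labels, one per SNP.

```python
from itertools import groupby


def tract_lengths(haplotype):
    """Lengths (in SNPs) of consecutive runs with the same ancestry label."""
    return [len(list(run)) for _, run in groupby(haplotype)]


def summarize(haplotype):
    """Return (number of ancestry switches, mean tract length in SNPs)."""
    lengths = tract_lengths(haplotype)
    if not lengths:
        return 0, 0.0
    return len(lengths) - 1, sum(lengths) / len(lengths)


# Toy labels, made up for illustration: the same haplotype as it might
# look with short tracts and with longer tracts.
short_tracts = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
long_tracts = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

print(summarize(short_tracts))  # (3, 2.5)
print(summarize(long_tracts))   # (1, 5.0)
```

Running this on the output of several lambda ranges and threshold values gives a rough view of how each setting changes tract length, although fewer switches alone does not show that the remaining calls are correct.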

Best