dcgerard / updog

Flexible Genotyping of Polyploids using Next Generation Sequencing Data
https://dcgerard.github.io/updog/
24 stars 8 forks source link

Sequencing Error Rate Estimation in F1 Populations #19

Closed dcgerard closed 10 months ago

dcgerard commented 3 years ago

{updog} can sometimes have poor performance when all of the following occur:

  1. model = "f1" or model = "s1",
  2. Offspring sequenced at low depth, but
  3. Parents are sequenced to large depth, and
  4. All offspring have the same genotype. So the parental genotypes could be AA and aa (for diploids).

The issue is that {updog} can prefer to estimate a large sequencing error rate and place the parents to both have genotypes of 1.

This should only affect the model = "f1" and model = "s1" options.

eaa231c869e8d2e4c1fd11dc28e53e6270c9d5c0 patches things a little bit, by placing an upper bound on the sequencing error rate.

However, full correction will need me to dive back into the code. Right now, I don't use the parental sequences when optimizing the sequencing error rate (just the offspring sequences). Adjusting the objective to include parental sequences should fix the problem.

dcgerard commented 1 year ago

@Cristianetaniguti, could you take a look at the f1 branch of {updog} and see if this mostly resolves the issues that you discovered?

You can install that repo via:

remotes::install_github(repo = "dcgerard/updog", ref = "f1")
Cristianetaniguti commented 1 year ago

Hey @dcgerard,

Sorry for taking a long time to test it. But I finally did!

I also tested another idea that you gave me when we met in San Diego, about using a combination of the updog genotype probabilities with a global error of 5%. I included as an error rate in OneMap HMM: 1 - (1 - estimated genotype probability by updog)*(1-global error)

I built linkage maps for f1 populations using a subset of the data and the combinations:

My conclusions are:

I hope this answers your question. I think it is still worth using the main branch and filtering the non-informative before using updog.

dcgerard commented 1 year ago

Thanks for the work @Cristianetaniguti! This is very interesting (and counterintuitive, since the f1 branch should give better results when using model = "f1"). Some questions:

  1. You used model = "f1" in both cases, right?
  2. These results are only for diploids, right?
Cristianetaniguti commented 1 year ago

Hey @dcgerard,

  1. Yes, I used the exact same code. I only changed the updog version.
  2. Yes, I am still adapting Reads2Map to polyploids.

These issues are only details, overall updog performs pretty well and it usually improves low-depth data genotypes. Let me know if you have other ideas to test. I can do it easily now.

dcgerard commented 1 year ago

Thank you so much! I really appreciate you working on this!

dcgerard commented 10 months ago

Closing for now. Don't think there is anything more to do here at the moment.