InstituteforDiseaseModeling / genepi_science

a space for genepi research collaboration

Test note taking #5

Open AlLee-IDM opened 3 years ago

AlLee-IDM commented 3 years ago

Is it easier to compile notes as issues, or as files in one of the repo directories?

Below we test how things look if we copy and paste a google doc into an issue comment:

AlLee-IDM commented 3 years ago

[PLOT hist of oocyst density as drawn from GenEpi]

[PLOT serial intervals] JRR

[PLOT generation times] JRR

[PLOT of geometric mean of gam density by infection age?] JRR

[PLOT heterozygosity by site still at 0.1 rather than 0.5 in importation scenario?] JVR

[PLOT of number of nodes (parasite clones) over time] JVR


Discussion 7.1.2021

THE REAL McCOIL assumes a uniform distribution by site in real samples (high transmission scenario?)

MAF counted across all individuals, not aggregated by sample? Not highly variable?

Not replicating the right co-transmitted or superinfecting diversity for mixed infections


First validate MAF across the population; then seeing the difference by sample would highlight where in the model we might be representing a different process. Running the uniform gametocyte apportionment.


If COI and MAF look like real data but heterozygosity doesn't, that really narrows it down.

Underestimating the polygenomic fraction that they got from barcode

Even with the same positions, the fraction is underestimated because of a too-conservative threshold for het calls? Subject to low read depth?


Scenario with “constant” parasite population time

For eff pop size, you should get fixation rate at any point in time, something something calculus (Albert to chew on) :)

Continue literature search, look at Prin of PopGen (AL)


Metrics:

COI by person by month

MAF by population by year

MAF by site by year

Heterozygosity metrics:

  1. Heterozygosity by infection event (e.g. for infections of COI >= 2, what proportion of sites are heterozygous)

  2. Heterozygosity by site/sample* averaged over all samples in each year (0 for all clonal infections, 1 for all het polyclonal infections)

  • site/sample meaning the average frequency of heterozygous allele at a site within a sample


|  | Site 1 | Site 2 | Site 3 | Het Metric 1 |
| -- | -- | -- | -- | -- |
| Inf 1 | 1 | 0 | 0 | 0 |
| Inf 2 | .5 | 1 | 0 | 1/3 |
| Inf 3 | 0 | 0 | 0 | 0 |
| Het Metric 2 | 1/2 | 0 | 0 |  |
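A minimal sketch of how the Het Metric 1 column in the table above could be computed. This is not GenEpi code: the `freqs` dictionary and the `het_metric_1` helper are hypothetical names, and the sketch assumes a site counts as heterozygous whenever its within-sample allele frequency is strictly between 0 and 1. Het Metric 2 is left out because the notes do not pin down its exact formula.

```python
from fractions import Fraction

# Within-sample allele frequencies per infection, copied from the table.
# Rows are infections, columns are Site 1..3.
freqs = {
    "Inf 1": [Fraction(1), Fraction(0), Fraction(0)],
    "Inf 2": [Fraction(1, 2), Fraction(1), Fraction(0)],
    "Inf 3": [Fraction(0), Fraction(0), Fraction(0)],
}


def het_metric_1(site_freqs):
    """Proportion of sites that are heterozygous within one infection.

    Assumption (not from the notes): a site is heterozygous when its
    within-sample allele frequency is strictly between 0 and 1.
    """
    het_sites = sum(1 for f in site_freqs if 0 < f < 1)
    return Fraction(het_sites, len(site_freqs))


# One value per infection; infections with no mixed sites come out as 0.
metric_1 = {name: het_metric_1(row) for name, row in freqs.items()}
```

Under that assumption only Inf 2 has a mixed site (Site 1 at .5), so it is the only infection with a non-zero value.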

Known issue with sensitivity of genotype callers to threshold: if ⅘ of calls are dominant with a threshold of .2 you get a het call (the minor fraction ⅕ just meets the threshold), whereas with just one more infection (⅚ dominant, minor fraction ⅙) the same threshold would not pick it up
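The threshold sensitivity can be sketched as below. This is an illustration only, not the caller used in the analysis: `is_het_call` is a hypothetical helper, the comparison is assumed to be inclusive (minor fraction >= threshold), and every infection is assumed to contribute equally. `Fraction` is used so that the boundary case of exactly 1/5 is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def is_het_call(n_dominant, n_total, threshold=Fraction(1, 5)):
    """Return True if the minor allele fraction reaches the threshold.

    Assumptions (not from the notes): inclusive comparison, and each
    infection in the sample carries equal weight.
    """
    minor_fraction = Fraction(n_total - n_dominant, n_total)
    return minor_fraction >= threshold


four_of_five = is_het_call(4, 5)  # minor fraction 1/5
five_of_six = is_het_call(5, 6)   # minor fraction 1/6
```

A single additional dominant infection moves the minor fraction from 1/5 to 1/6 and flips the call, which is the brittleness the note describes.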

Potential fix: normalize by multiplying by gam density (taking into account the dominant infection from GenEpi in this calculation); this should move these closer to realism
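One way to read the density normalization is as a density-weighted within-sample allele frequency. The sketch below is an assumption about what that weighting could look like, not the agreed method: `weighted_allele_freq` is a hypothetical helper, alleles are coded 0/1 per clone, and the densities are made-up illustrative numbers rather than GenEpi output.

```python
def weighted_allele_freq(alleles, densities):
    """Frequency of allele 1 at one site, weighting each clone by density.

    alleles:   0/1 allele carried by each clone at this site
    densities: gametocyte density of each clone (hypothetical values)
    """
    if len(alleles) != len(densities):
        raise ValueError("alleles and densities must have the same length")
    total = sum(densities)
    if total <= 0:
        raise ValueError("total density must be positive")
    # Sum the density of every clone carrying allele 1.
    carrying = sum(d for a, d in zip(alleles, densities) if a == 1)
    return carrying / total


alleles = [1, 1, 1, 1, 0]

# Equal densities reproduce the unweighted 4/5 frequency.
unweighted = weighted_allele_freq(alleles, [1, 1, 1, 1, 1])

# Four dominant clones at 10x the density of the minor clone.
weighted = weighted_allele_freq(alleles, [10, 10, 10, 10, 1])
```

With equal densities the minor allele sits right at a .2 threshold; once the clones carrying the dominant allele are given higher density, the minor allele fraction drops to 1/41, well below it.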


Giving density by genotype in table output from GenEpi for use in Observation model (even just identifier of dominant infection)


Checklist updates:

  • Sjjd

  • Test