bystrogenomics / bystro

Bystro genetic analysis (annotation, filtering, statistics)
Apache License 2.0

Sprint 12 Task List #501

Closed cristinaetrv closed 2 months ago

cristinaetrv commented 3 months ago

Due date: June 6 2024

Documentation

Proteomics

PRS

Covariance Matrix Estimation/ML library. Goal: make more accurate predictions, build more tailored tests, and better control the false positive rate.

akotlar commented 3 months ago

2024-05-21

Stats methods topic meeting

Computing p-values accurately, tailored to the distribution you'd expect if there is no spike - 6/6/24

We have merged the random matrix theory (RMT) PR, which will be used for this. Ilha will be evaluating RMT, which sets the ratio of the number of covariates to the sample size to a fixed value (p/n = c; c > 0) and lets both p and n go to infinity. This contrasts with classical statistical tests, which fix p and let n go to infinity. We're going to evaluate whether RMT works better.
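To make the p/n = c regime concrete, here is a minimal sketch (not the merged PR's code) of the null case with no spike: for pure standard-normal noise, the largest eigenvalue of the sample covariance approaches the Marchenko-Pastur upper edge (1 + sqrt(c))^2 rather than 1. The function names are illustrative assumptions.

```python
import numpy as np


def largest_sample_eigenvalue(n, p, seed=0):
    """Largest eigenvalue of the sample covariance of n x p standard-normal noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    sample_cov = x.T @ x / n  # true covariance is the identity, mean is known to be 0
    return float(np.linalg.eigvalsh(sample_cov)[-1])  # eigvalsh sorts ascending


def marchenko_pastur_upper_edge(c):
    """Limit of the largest null eigenvalue when p/n -> c and the variance is 1."""
    return (1.0 + np.sqrt(c)) ** 2


if __name__ == "__main__":
    c = 0.25
    for n in (400, 1600):
        p = int(c * n)
        print(n, p, largest_sample_eigenvalue(n, p), marchenko_pastur_upper_edge(c))
```

A classical fixed-p analysis would expect the top eigenvalue to converge to 1 here, so a test calibrated that way would flag pure noise as a spike.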

Seeing how well a hypothesis test detects a spike when we assume there is signal in the data - 6/6/24

Ilha is making progress on this. He is running the first simulation now, covering different combinations of singular value shrinkage estimators and covariance matrix estimators, and he will evaluate them via mean squared error of the POE effect estimators on simulated data.
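The shape of such a simulation can be sketched as follows. This is only an illustration under stated assumptions: it compares the plain sample covariance against a simple linear shrinkage toward a scaled identity (a stand-in for the estimators actually being evaluated), uses a single-spike true covariance, and scores the covariance entries directly rather than downstream POE effect estimates. All names and parameter values are hypothetical.

```python
import numpy as np


def sample_covariance(x):
    """Unbiased sample covariance of the rows of x (n samples by p features)."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / (x.shape[0] - 1)


def linear_shrinkage(sample_cov, alpha):
    """Blend the sample covariance with a scaled identity; alpha in [0, 1] is fixed by hand."""
    p = sample_cov.shape[0]
    target = np.trace(sample_cov) / p * np.eye(p)
    return (1.0 - alpha) * sample_cov + alpha * target


def mean_squared_error(estimate, truth):
    """Entrywise mean squared error between two matrices."""
    return float(np.mean((estimate - truth) ** 2))


def simulate(n=50, p=40, spike=4.0, alpha=0.5, seed=0):
    """Draw data from an identity-plus-one-spike covariance and score both estimators."""
    rng = np.random.default_rng(seed)
    truth = np.eye(p)
    truth[0, 0] += spike  # one spiked direction along the first coordinate
    x = rng.multivariate_normal(np.zeros(p), truth, size=n)
    s = sample_covariance(x)
    return {
        "sample": mean_squared_error(s, truth),
        "shrunk": mean_squared_error(linear_shrinkage(s, alpha), truth),
    }


if __name__ == "__main__":
    print(simulate())
```

Looping `simulate` over a grid of estimators and seeds gives the kind of comparison table described above.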

akotlar commented 3 months ago

Covariance Matrix / POE

Conservativeness:

Publication Plan for Q3

  1. SPPCA - once Dave Carlson is back from vacation he'll finish giving feedback; we're targeting having this out by end of summer.
  2. PoE draft, modulo Mike Epstein's students actually completing their UKBB analysis.
  3. Platform paper - Bystro platform updates / generative AI discussion.

Sprint 13 plan update

Austin will implement some spherical p-value tests that are a direct POIROT competitor

Domain Adaptation

Test looks really successful at removing batch effects; TAMPOR does not appear to remove them, at least entirely.

DomainAdaptationTest.ipynb.zip

akotlar commented 2 months ago

2024-05-31

PRS

Automatically launch PRS after ancestry from API server

Pushed back to last week of sprint

Display basic PRS results in webapp (table with individuals and their score) - @akotlar - 6/6/24 - https://github.com/bystrogenomics/bystro/pull/509

Same

Add batch processing for PRS C+T workflow with dosage matrix for memory issues

Under review
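The batching idea can be sketched as below. This is a minimal illustration, not the code under review: it assumes a variants-by-samples dosage matrix that has already been through clumping and thresholding, and a matching vector of effect sizes. The function name, argument names, and batch size are hypothetical.

```python
import numpy as np


def prs_in_batches(dosage, betas, batch_size=1000):
    """Accumulate per-sample scores over blocks of variants.

    dosage: array-like of shape (n_variants, n_samples); may be lazily loaded,
            as long as row slicing returns something np.asarray accepts.
    betas:  array of shape (n_variants,) with per-variant effect sizes.
    """
    n_variants, n_samples = dosage.shape
    scores = np.zeros(n_samples)
    for start in range(0, n_variants, batch_size):
        stop = min(start + batch_size, n_variants)
        # Only this block of variants is materialized as float64 at any time.
        block = np.asarray(dosage[start:stop], dtype=np.float64)
        scores += betas[start:stop] @ block
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dosage = rng.integers(0, 3, size=(2500, 7))
    betas = rng.normal(size=2500)
    print(prs_in_batches(dosage, betas))
```

Because the score is a sum over variants, the batched result matches the single matrix product up to floating-point rounding, while peak memory scales with `batch_size` rather than the full variant count.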

Need annotated AD stat summary to include ancestry

Done

Take in top hit from ancestry, convert to superpop, connect to LD map for corresponding pop for LD clump

Should be in testing by 6/6
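A rough sketch of the top-hit to superpopulation step, assuming the ancestry output is a mapping from 1000 Genomes population codes to probabilities (the actual Bystro ancestry schema may differ). The population-to-superpopulation table follows the 1000 Genomes groupings; the LD map file names are placeholders, not real paths.

```python
# 1000 Genomes population codes grouped by superpopulation.
SUPERPOP_BY_POP = {
    **dict.fromkeys(["YRI", "LWK", "GWD", "MSL", "ESN", "ASW", "ACB"], "AFR"),
    **dict.fromkeys(["MXL", "PUR", "CLM", "PEL"], "AMR"),
    **dict.fromkeys(["CHB", "JPT", "CHS", "CDX", "KHV"], "EAS"),
    **dict.fromkeys(["CEU", "TSI", "FIN", "GBR", "IBS"], "EUR"),
    **dict.fromkeys(["GIH", "PJL", "BEB", "STU", "ITU"], "SAS"),
}

# Hypothetical file names, for illustration only.
LD_MAP_BY_SUPERPOP = {sp: f"{sp}_ld_map.placeholder" for sp in ("AFR", "AMR", "EAS", "EUR", "SAS")}


def superpop_for_top_hit(population_probabilities):
    """Return the superpopulation of the most probable population."""
    top_population = max(population_probabilities, key=population_probabilities.get)
    return SUPERPOP_BY_POP[top_population]


def ld_map_for_sample(population_probabilities):
    """Pick the LD map to use for clumping, based on the top ancestry hit."""
    return LD_MAP_BY_SUPERPOP[superpop_for_top_hit(population_probabilities)]


if __name__ == "__main__":
    print(ld_map_for_sample({"CEU": 0.7, "YRI": 0.2, "CHB": 0.1}))
```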

Weight PRS scores by gnomAD allele frequencies for specific ancestries and the corresponding ancestry probability

Should be done by 6/6

Finish PRS-CS standard way without Langevin Dynamics

PR'd, there's a test to fix

Take in ancestry PCs as PRS-CS covariates

@akotlar and @austinTalbot7241993 will talk about this on Monday

Genotype imputation

Hypothesis testing

Guy at UC Davis has a good implementation, so we're relying on that

Covariance matrix estimation

We have geodesics in (may use in domain adaptation), and we have pyRiemann PR'd.

This week we have been working on how well the covariance matrix estimation performs, and on its conservativeness. Operator norm has the best MSE and worst conservativeness, and nuclear norm has the best conservativeness and OK MSE.

The difficulty is estimating the largest singular values, which needs to be done by looking at heterozygotes; we were hurting ourselves by looking at low-frequency hets with small effect sizes.
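For reference, the norms mentioned above differ in how they weigh singular values of an error matrix: the operator norm is the largest singular value, the nuclear norm is the sum of all singular values, and the Frobenius norm is the root of the sum of their squares. A small sketch using NumPy, with an illustrative helper name:

```python
import numpy as np


def error_norms(estimate, truth):
    """Operator, Frobenius, and nuclear norms of the estimation error."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return {
        "operator": float(np.linalg.norm(diff, 2)),       # largest singular value
        "frobenius": float(np.linalg.norm(diff, "fro")),  # sqrt of sum of squared singular values
        "nuclear": float(np.linalg.norm(diff, "nuc")),    # sum of singular values
    }


if __name__ == "__main__":
    truth = np.zeros((2, 2))
    estimate = np.diag([3.0, -4.0])  # singular values of the error are 4 and 3
    print(error_norms(estimate, truth))
```

Since the operator norm only sees the top singular value, it is the one most sensitive to how well the largest singular values are estimated.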

Summary

PRS + Proteomics

Will be done by 6/6 for prototype

ML

On track, good progress

akotlar commented 2 months ago

2024-06-07

Bystro Sync

@akotlar

@cristinaetrv

@IlhaH

@austinTalbot7241993