[x] Extend the data-cleaning analysis with a preprocessing step that separates out SomaScan and TMT; compare applying domain adaptation before combining vs. on the combined data @akotlar 6/11/2024
[x] Filtering needs to be generalized to SomaScan @akotlar 6/14/2024
[ ] Harmonize SomaScan/TMT datasets: fit a latent variable model with two sets of covariates, impute each dataset, and harmonize by minimizing the discrepancy between them @austinTalbot7241993 6/14/2024
[x] Display basic PRS results in webapp (table with individuals and their score) - @akotlar 6/17/2024
[x] Document design choices for PRS allele frequency weighting - @cristinaetrv - 6/14/2024
[x] Weight PRS scores by gnomAD allele frequencies for specific ancestries and the corresponding ancestry probabilities - @cristinaetrv 6/12/2024
[x] Take the top ancestry hit, map it to its superpopulation (used for allele frequencies only), and use the LD map for the corresponding population for LD clumping - @cristinaetrv 6/14/2024
[x] Retrieve the 5 gnomAD superpopulation AFs for the thresholded score loci, converted to query format and queried in chunks of at most 100k loci with the query library against the target dataset's annotation - @cristinaetrv 6/21/2024
[ ] Research remaining LD maps - @cristinaetrv 6/26/2024
[ ] Add remaining LD maps if they're easy to find - @cristinaetrv 6/26/2024
[ ] Lift over LD maps if they're easy to find - @cristinaetrv 6/28/2024
[ ] Sanitize the harmonized AD summary stats - @cristinaetrv 6/28/2024
[ ] (stretch) Lift over the harmonized AD summary stats @cristinaetrv
[ ] Fix clumping by p-value - @cristinaetrv 6/28/2024
[ ] v2 PRS integration - 2024-06-28 - @akotlar - scope still needs to be defined, but at minimum it must allow uploading covariates and at least match the performance of Dave's work
[ ] (sprint 14) Follow up with Gates about GWAS summary statistics and which of them we can include with our platform
[ ] (sprint 14) Experiment management is re-added and integrated with search/APIs so that we can pull covariates/traits
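A minimal sketch of the ancestry-weighted allele frequency idea in the PRS tasks above, under assumptions these notes do not state: each locus has gnomAD AFs keyed by superpopulation label, the sample's ancestry probabilities use the same labels, and the mixed frequency is used only to fill in a missing genotype with its expected dosage 2p. Loci are split into chunks of at most 100k, matching the chunk size in the AF-retrieval task. `chunked`, `weighted_af`, `prs`, and the dict layouts are hypothetical; the team's documented design choices for AF weighting take precedence over this sketch.

```python
from typing import Iterator, Mapping, Optional, Sequence


def chunked(loci: Sequence[str], size: int = 100_000) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` loci (hypothetically one query batch each)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(loci), size):
        yield loci[start:start + size]


def weighted_af(afs: Mapping[str, float], ancestry_probs: Mapping[str, float]) -> float:
    """Mix superpopulation AFs by the sample's ancestry probabilities.

    Renormalizes over the superpopulations that actually have an AF for this locus,
    so a missing gnomAD value does not silently pull the estimate toward zero.
    """
    total = sum(ancestry_probs.get(pop, 0.0) for pop in afs)
    if total <= 0:
        raise ValueError("no ancestry probability mass on superpopulations with an AF")
    return sum(ancestry_probs.get(pop, 0.0) * af for pop, af in afs.items()) / total


def prs(
    genotypes: Mapping[str, Optional[int]],  # locus -> alt-allele dosage (0/1/2), None if missing
    betas: Mapping[str, float],  # locus -> effect size from the score file
    afs_by_locus: Mapping[str, Mapping[str, float]],  # locus -> {superpop: AF}
    ancestry_probs: Mapping[str, float],  # superpop -> ancestry probability for this sample
) -> float:
    """Sum of beta * dosage, imputing missing dosages as 2 * ancestry-weighted AF (assumed scheme)."""
    score = 0.0
    for locus, beta in betas.items():
        dosage = genotypes.get(locus)
        if dosage is None:
            dosage = 2.0 * weighted_af(afs_by_locus[locus], ancestry_probs)
        score += beta * dosage
    return score
```

Renormalizing over the superpopulations that have an AF is one design choice; an alternative is to fall back to a global AF when a superpopulation value is missing.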
Covariance Matrix Estimation/ML library
Goal: Hand off POE method to Mike by end of sprint
[x] Make more computational and alternative hypothesis tests for Ilha to benchmark @austinTalbot7241993 6/27/2024
[x] Updates to loss functions - @IlhaH 6/27/2024
[ ] Computational benchmarking (compared to POIROT) - @IlhaH 6/27/2024
Platform
[ ] Per-sample data management v1 - 2024-06-28 - @akotlar
[ ] Basic LLM demo - 2024-06-28 - @akotlar
[ ] (stretch) Bystro Annotator AMI is fully restartable
Documentation
[ ] Separate out the annotator description (Perl side), including performance figures; describe every component of the repo, with a Machine Learning subsection and a Bioinformatics tools subsection (installation first) - 6/27/2024
[ ] GIF showing how to use the general-purpose ML library - 6/27/2024
Alex met with Erik Dammer; Erik will send more information about which files we should be analyzing.
Erik had not normalized within batch in the dataset Alex had been using, because that analysis compared tissue types and looked at total abundance. Instead, the two data types (SomaScan and TMT) were treated as 'batches', so the data are normalized by platform. Erik will provide the name of the dataset that was used for the network analysis.
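A sketch of what "normalized by platform" could look like for a single protein, assuming log-scale abundances and median centering within each platform. The notes do not say which normalization Erik used, so median centering is a stand-in, and `normalize_by_platform` is a hypothetical helper.

```python
import statistics
from typing import Dict, List, Sequence


def normalize_by_platform(values: Sequence[float], platforms: Sequence[str]) -> List[float]:
    """Median-center one protein's log-abundances separately within each platform.

    `platforms[i]` is the platform ("SomaScan" or "TMT") of sample i. Each platform
    is treated as one batch; no finer (e.g. per-run) batch correction is applied here.
    """
    if len(values) != len(platforms):
        raise ValueError("values and platforms must have the same length")
    medians: Dict[str, float] = {
        plat: statistics.median(v for v, p in zip(values, platforms) if p == plat)
        for plat in set(platforms)
    }
    return [v - medians[p] for v, p in zip(values, platforms)]
```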
POE:
The test is anti-conservative; instead, we can use a bootstrap approach, look at the resulting coefficient estimates, and see which ones show a POE.
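The bootstrap idea in the POE note can be sketched as follows. The notes do not give the model behind the POE test, so this assumes a single-predictor linear regression, resamples (x, y) pairs with replacement, and reports a percentile interval for the slope; in the real POE model each coefficient of interest would be bootstrapped the same way. `ols_slope` and `bootstrap_slope_ci` are hypothetical helpers, not part of the ML library.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x, fitted with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def bootstrap_slope_ci(
    x: Sequence[float],
    y: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) interval for the slope, resampling (x, y) pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(set(x)) < 2:
        raise ValueError("x needs at least two distinct values")
    rng = random.Random(seed)
    n = len(x)
    slopes: List[float] = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # degenerate resample (all x equal): slope undefined, redraw
        slopes.append(ols_slope(xb, [y[i] for i in idx]))
    slopes.sort()
    k = int(n_boot * alpha / 2)  # number of estimates trimmed from each tail
    return slopes[k], slopes[n_boot - 1 - k]
```

A coefficient whose interval excludes 0 would be a candidate POE. The percentile interval is the simplest bootstrap interval; BCa or studentized intervals may behave better if the bootstrap estimates are skewed.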
Proteomics: Goal: Wrap up proteomics methods
GIN 6/17 work:
PRS
Covariance Matrix Estimation/ML library Goal: Hand off POE method to Mike by end of sprint
Platform
Documentation