Sprint 13 Task List

Proteomics Goal: Wrap up proteomics methods

[x] Run data cleaning analysis w/ a preprocessing step that separates SomaScan and TMT; compare the effect of domain adaptation applied before combining vs. on the combined data @akotlar 6/11/2024
[x] Filtering needs to be generalized to SomaScan @akotlar 6/14/2024
[ ] Harmonize SomaScan/TMT datasets - latent variable model with two sets of covariates, do imputation on each, harmonization minimizing the discrepancy @austinTalbot7241993 6/14/2024
[ ] Demonstrate network analysis on ~300 sample dataset @akotlar 6/14/2024

GIN 6/17 work:

[x] Ability to download or stream ancestry json - @akotlar - done: https://github.com/bystrogenomics/bystro-web/pull/453
[x] Ability to convert ancestry json to tsv/csv - @akotlar - done: https://github.com/bystrogenomics/bystro/pull/525
[x] Ability to explode annotation tsv/tsv.gz by gene name - @akotlar - done: https://github.com/bystrogenomics/bystro/pull/523
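Note on the "explode by gene name" item above: a minimal sketch of the idea, assuming a tab-separated annotation whose gene column holds several names joined by `;`. The column name `gene_name`, the `;` delimiter, and both function names are hypothetical and are not taken from the linked PR; the actual implementation may differ.

```python
import csv
import io


def explode_by_gene(rows, gene_column="gene_name", delimiter=";"):
    """Yield one copy of each row per distinct gene name in gene_column.

    gene_column and delimiter are illustrative assumptions, not Bystro's schema.
    """
    for row in rows:
        raw = row.get(gene_column) or ""
        genes = [g.strip() for g in raw.split(delimiter) if g.strip()]
        if not genes:
            # Keep rows without a gene name unchanged.
            yield dict(row)
            continue
        seen = set()
        for gene in genes:
            if gene in seen:
                continue  # skip repeated gene names within one row
            seen.add(gene)
            exploded = dict(row)
            exploded[gene_column] = gene
            yield exploded


def explode_tsv(text, gene_column="gene_name", delimiter=";"):
    """Explode a TSV document given as a string and return the new TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in explode_by_gene(reader, gene_column, delimiter):
        writer.writerow(row)
    return out.getvalue()
```

For a `.tsv.gz` input, the same functions could be fed text read through `gzip.open(path, "rt")`.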

PRS

[ ] PR Dave's version of PRS @akotlar 6/11/2024
[x] Ask Thomas about the genotyping that Emory is ingesting for Emory samples: Illumina 650K Array (most recent, cheapest array), which will require imputation (TOPMed, etc.) @cristinaetrv 6/12/2024
[ ] (backlog) Complete CITI training (Alex & Cristina) and email Paula/Petek (Cristina) for access to the CHOP data. - @cristinaetrv 6/21/2024
[x] Test imputation method for PRS @austinTalbot7241993 - 6/27/2024
[ ] Compare our imputation to Minimac4 @austinTalbot7241993 - 6/21/2024
[ ] (stretch) Write imputation method in C @austinTalbot7241993 - 6/27/2024
[ ] Take in ancestry PCs as PRS-CS covariates - @akotlar - 6/27/2024
[ ] Take in GWAS summary statistics as PRS-CS covariates - @austinTalbot7241993 - 6/27/2024
[x] Finish v1 PRS integration - 2024-06-13 - @akotlar
[x] Display basic PRS results in webapp (table with individuals and their score) - @akotlar 6/17/2024
[x] Document design choices for PRS allele frequency weighting - @cristinaetrv - 6/14/2024
[x] Weight PRS scores by gnomAD allele frequencies for specific ancestries and the corresponding ancestry probability - @cristinaetrv 6/12/2024
[x] Take in top hit from ancestry, convert to superpop (for allele freq only), connect to LD map for corresponding pop for LD clump - @cristinaetrv 6/14/2024
[x] Take in 5 gnomAD superpop AFs in chunks (100k or fewer) of thresholded score loci, converted to query format using the query library on the annotation for the target dataset - @cristinaetrv 6/21/2024
[ ] Research remaining LD maps - @cristinaetrv 6/26/2024
[ ] Add remaining LD maps if they're easy to find - @cristinaetrv 6/26/2024
[ ] Liftover LD maps if they're easy to find - @cristinaetrv 6/28/2024
[ ] Get harmonized AD summary stats sanitized - @cristinaetrv 6/28/2024
[ ] (stretch) Liftover harmonized AD summary stats @cristinaetrv
[ ] Fix clump by pval - @cristinaetrv 6/28/2024
[ ] v2 PRS integration - 2024-06-28 - @akotlar - scope needs to be defined, but minimally need to allow uploading covariates, and similar perf to Dave's work at least
[ ] (sprint 14) Follow up with Gates about GWAS summary statistics and what we can include with our platform
[ ] (sprint 14) Experiment management is back in and integrated with search / APIs so that we can pull covariates/traits
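Note on the allele frequency weighting items above: one way to combine per-superpopulation allele frequencies with a sample's ancestry probabilities is a probability-weighted average. This is a sketch under that assumption only. The function name, the population labels, and the renormalization over populations with a known frequency are illustrative choices; the task list does not say how the weighted frequency enters the PRS.

```python
def ancestry_weighted_af(pop_afs, ancestry_probs):
    """Expected allele frequency for one locus under a sample's ancestry probabilities.

    pop_afs: dict of superpopulation label -> allele frequency at the locus.
    ancestry_probs: dict of superpopulation label -> probability for the sample.
    Populations missing from either dict are ignored, and the remaining
    probabilities are renormalized (an assumption made for this sketch).
    """
    shared = [pop for pop in ancestry_probs if pop in pop_afs]
    total_prob = sum(ancestry_probs[pop] for pop in shared)
    if total_prob <= 0:
        raise ValueError("no ancestry probability mass on populations with a known AF")
    weighted = sum(ancestry_probs[pop] * pop_afs[pop] for pop in shared)
    return weighted / total_prob


# Hypothetical values for a single locus and a single sample.
example_af = ancestry_weighted_af(
    {"AFR": 0.1, "EUR": 0.3},
    {"AFR": 0.25, "EUR": 0.75},
)
```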

Covariance Matrix Estimation/ML library Goal: Hand off POE method to Mike by end of sprint

[x] Make more computational and alternative hypothesis tests for Ilha to benchmark @austinTalbot7241993 6/27/2024
[x] Updates to loss functions - @IlhaH 6/27/2024
[ ] Computational benchmarking (compared to POIROT) - @IlhaH 6/27/2024

Platform

[ ] Per-sample data management v1 - 2024-06-28 - @akotlar
[ ] Basic LLM demo - 2024-06-28 - @akotlar
[ ] (stretch) Bystro Annotator AMI is fully restartable

Documentation

[ ] Separate out the annotator description/Perl side, including performance figures; describe every piece the repo has, including a Machine Learning subsection and a Bioinformatics tools subsection (installation first) - 6/27/2024
[ ] GIF of how you would use the general-purpose ML library - 6/27/2024

bystrogenomics / bystro

Sprint 13 Task List #526