FredHutch / gimap

An R package, under development, for calculating genetic interactions.
https://fredhutch.github.io/gimap/

gimap suggested structure and schema #22

Closed: cansavvy closed this 2 months ago

cansavvy commented 3 months ago
# Description

This PR is meant to revisit the overall schema for this code: how do we see this code being used? Keep in mind that this will certainly change somewhat from what I have set up here, because the details of the code have not yet been accounted for. It is meant as a starting point for discussion, so we can begin with an idea of how users would use gimap and adjust as needed once the functions are actually being developed.

This schema and usage are inspired by popular bioinformatics R packages and tools:

- We'd like to take in experimental design information, as [DESeq2](https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) does, and return test results.
- [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) stores and transforms data using a specialized dataset structure; here we are trying to do the same. Users then don't have to keep track of all the different pieces of information, but both they and we can still access them when the next step needs to be performed.
- The tidyverse lets people construct pipelines with `%>%` (or, more recently, base R's native `|>`). This schema supports that style too, since resaving the data at every step isn't always necessary.
- Much like [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), the QC step doesn't change the data; it prints a report that users are strongly encouraged to consult but are not required to use.

## Here's the big picture idea:

These are from https://docs.google.com/presentation/d/1J-S6UpYB2IADRekObkM3HI4qEdsWvR9en780uNwiD0c/edit#slide=id.p

![gimap schema](https://github.com/FredHutch/gimap/assets/23458084/376f668d-4918-40b1-b066-fbb2887dff73)

![gimap schema (1)](https://github.com/FredHutch/gimap/assets/23458084/7b58205a-d091-4bab-9643-29e5e32580d7)

## What does this mean for users?
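For users, the SingleCellExperiment-style idea means carrying one object through the whole analysis, and the FastQC-style idea means QC reports on that object without changing it. A minimal S3 sketch of both, in base R, might look like the block below. Every name here (`new_gimap_dataset()`, `qc_report_sketch()`, and the fields `raw_counts`, `sample_metadata`, `annotation`, `results`) is a hypothetical illustration, not gimap's actual API.

```r
# Hypothetical sketch only: names and fields are illustrative assumptions,
# not gimap's actual API.

# Constructor: keeps every piece of the analysis together in one object
new_gimap_dataset <- function(counts, sample_metadata) {
  stopifnot(is.matrix(counts), is.data.frame(sample_metadata))
  # one metadata row per sample (column of the count matrix)
  stopifnot(ncol(counts) == nrow(sample_metadata))
  structure(
    list(
      raw_counts = counts,               # feature-by-sample count matrix
      sample_metadata = sample_metadata, # experimental design information
      annotation = NULL,                 # would be filled in by an annotation step
      results = list()                   # later steps would add outputs here
    ),
    class = "gimap_dataset"
  )
}

# Print method: a short summary instead of dumping the raw list
print.gimap_dataset <- function(x, ...) {
  cat("<gimap_dataset>", nrow(x$raw_counts), "features x",
      ncol(x$raw_counts), "samples\n")
  cat("Annotated:", !is.null(x$annotation), "\n")
  done <- if (length(x$results)) paste(names(x$results), collapse = ", ") else "none"
  cat("Results:", done, "\n")
  invisible(x)
}

# FastQC-style QC: print a report, return the dataset unchanged
qc_report_sketch <- function(dataset) {
  stopifnot(inherits(dataset, "gimap_dataset"))
  report <- data.frame(
    sample = colnames(dataset$raw_counts),
    total_counts = colSums(dataset$raw_counts),
    fraction_zero = colMeans(dataset$raw_counts == 0),
    row.names = NULL
  )
  print(report)
  invisible(dataset) # unchanged, so QC can sit anywhere in a pipeline
}

# Usage with a toy count matrix
counts <- matrix(c(10, 0, 5, 20, 3, 0), nrow = 3,
                 dimnames = list(paste0("feature", 1:3), c("day0", "day22")))
meta <- data.frame(timepoint = c(0, 22))
ds <- new_gimap_dataset(counts, meta)
qc_report_sketch(ds)
print(ds)
```

Returning the input invisibly is what lets a report-only step be optional: calling it or skipping it leaves the object identical.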
In this idea, the usage would look like this for a user:

```
library(gimap)
library(magrittr) # provides the %>% pipe

# USERS SET UP THEIR DATA (this would vary, but would use setup_data())

# Highly recommended but not required
run_qc(gimap_dataset)

gimap_dataset <- gimap_dataset %>%
  gimap_filter() %>%
  gimap_annotate() %>%
  calc_lfc() %>%
  calc_gi()
```

Each of the steps could optionally take arguments, but this would be the simplest version of the pipeline.

## Which steps are required?

- Setup is absolutely required.
- QC is optional but recommended.
- Filtering is *technically* optional but, again, recommended.
- Annotation is required IF log fold change is to be calculated.
- Log fold change is required if genetic interactions are to be calculated.
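The required-step rules above form a small dependency chain (setup, then annotation before log fold change, then log fold change before genetic interactions), and each step can check its prerequisites before running. A minimal sketch under stated assumptions: the `*_sketch()` functions, `record_step()`, and the `steps_done` field are hypothetical, and the steps only record that they ran rather than computing anything. It uses base R's native `|>` (R >= 4.1), so it needs no extra packages.

```r
# Hypothetical sketch: step names and the `steps_done` field are illustrative
# assumptions, not gimap's actual API. Each step returns the updated object,
# so steps chain with the native pipe |>.

# Record that a step ran, after checking that its prerequisites already ran
record_step <- function(dataset, step, requires = character()) {
  absent <- setdiff(requires, dataset$steps_done)
  if (length(absent) > 0) {
    stop(sprintf("'%s' requires step(s) not yet run: %s",
                 step, paste(absent, collapse = ", ")), call. = FALSE)
  }
  dataset$steps_done <- c(dataset$steps_done, step)
  dataset
}

# Setup creates the object, so every later step can rely on it
setup_sketch <- function(counts) {
  list(counts = counts, steps_done = "setup")
}

# Placeholders: a real implementation would do its computation before recording
filter_sketch   <- function(dataset) record_step(dataset, "filter", requires = "setup")
annotate_sketch <- function(dataset) record_step(dataset, "annotate", requires = "setup")
# Log fold change needs annotation; filtering stays optional
lfc_sketch      <- function(dataset) record_step(dataset, "lfc", requires = "annotate")
# Genetic interactions need log fold change
gi_sketch       <- function(dataset) record_step(dataset, "gi", requires = "lfc")

# Full pipeline
ds <- setup_sketch(matrix(1:4, nrow = 2)) |>
  filter_sketch() |>
  annotate_sketch() |>
  lfc_sketch() |>
  gi_sketch()
print(ds$steps_done)

# Skipping annotation fails early with an informative message
res <- tryCatch(setup_sketch(matrix(1:4, nrow = 2)) |> lfc_sketch(),
                error = conditionMessage)
print(res)
```

Failing at the step whose prerequisite is missing, rather than deep inside a computation, keeps the "optional vs. required" rules visible to users.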
kweav commented 3 months ago

Big picture/schema/usage/description all look good to me -- thanks!

howardbaek commented 3 months ago

Looks good to me!