Open adamklie opened 3 weeks ago
The config file for the Snakemake pipeline has separate parameters for P2G and E2G links, and the pipeline would store outputs for these two separately with appropriate naming. The motif enrichment code itself will now accept any genomic coordinates mapped to genes and run the enrichment without requiring a "class" column in the input or requiring the user to specify a class.
I would prefer 2. for the dashboard since the idea would be to use the pipeline outputs as the default input for the dashboard.
The code needs to be further modified to not expect a seq_class column to bring it in line with standard E2G output formats and therefore not require the user to manually add this column before running our pipeline.
Here is our proposed format for output files from this step:
{prog_key}_{E/P_type}_{database}_{test_type}_{stratification_key}_{level_key}_enrichment.txt
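As a rough sketch of how the pipeline and the dashboard could agree on this pattern, a small helper can build the file name from its six fields. The function name `enrichment_filename` and the example values are hypothetical; only the field order and the `_enrichment.txt` suffix come from the format above.

```python
def enrichment_filename(prog_key, ep_type, database, test_type,
                        stratification_key, level_key):
    """Build an output file name following the proposed pattern.

    Hypothetical helper: the argument names mirror the placeholders in
    the format string, and nothing here is part of the existing pipeline.
    """
    fields = [prog_key, ep_type, database, test_type,
              stratification_key, level_key]
    # Join the six fields with underscores and add the fixed suffix.
    return "_".join(str(f) for f in fields) + "_enrichment.txt"


# Example with made-up values for every field.
name = enrichment_filename("prog", "enhancer", "db", "pearsonr", "batch", "lvl")
print(name)
```

One caveat with this scheme: because the fields are joined with underscores, a value that itself contains an underscore cannot be split back out of the file name unambiguously, so the dashboard may be better off listing files by known keys than parsing names.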
For the jamboree we will just run each one individually and save it as such. The main idea here is to have a separate file for each that the dashapp will load in. There will be several dropdowns for the user to choose what they want to visualize.
I've also implemented something I think we should discuss at some point. For convenience, I adjusted p-values after calculating all the Pearson tests across programs, so the correction is applied once over all programs together. Would it be better to do FDR correction at a program level instead?
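To make the two options concrete, here is a minimal sketch comparing one correction over all tests with a separate correction per program. It assumes Benjamini-Hochberg as the FDR method (the comment above does not say which one is used), and the row layout `(program, motif, pval)` and the function names are made up for illustration.

```python
from collections import defaultdict


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_global(rows):
    """One correction over every test, across all programs."""
    adj = bh_adjust([p for _, _, p in rows])
    return [(prog, motif, q) for (prog, motif, _), q in zip(rows, adj)]


def adjust_per_program(rows):
    """A separate correction within each program."""
    groups = defaultdict(list)
    for idx, (prog, _, _) in enumerate(rows):
        groups[prog].append(idx)
    out = [None] * len(rows)
    for idxs in groups.values():
        adj = bh_adjust([rows[i][2] for i in idxs])
        for i, q in zip(idxs, adj):
            out[i] = (rows[i][0], rows[i][1], q)
    return out


# Toy input: (program, motif, raw p-value); values are invented.
rows = [("p1", "A", 0.01), ("p1", "B", 0.04),
        ("p2", "A", 0.03), ("p2", "B", 0.5)]
global_adj = adjust_global(rows)
per_program_adj = adjust_per_program(rows)
```

In this toy input the global correction is the more conservative of the two because it accounts for four tests instead of two per program. Which one is appropriate depends on whether results are interpreted across programs or one program at a time.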
We used to have a column for this in the output file for motif enrichment, but that is not there in the latest outputs (presumably because E2G links weren't there yet).
How do we want to handle this more generally? The two ways I could see for the dashboard:
type column for this regardless of what enrichment is run on.
1) seems better and more flexible, but either will work.