[x] calc_fisher function. Input: df, listName, listDf, intersectN, bait. Output: list of (data.frame with columns list_name, overlap_count, dfOnly_count, listOnly_count, neither_count, pvalue; list containing overlap_genes, dfOnly_genes, listOnly_genes, neither_genes).
[x] calc_hyper function. Input: same as calc_fisher. Output: list of (data.frame with columns list_name, successInSample_count (x), sample_count (n), notSample_count (N-n), success_count (k), pvalue; list containing successInSample_genes, sample_genes, notSample_genes, success_genes).
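A rough sketch of the core calculation behind both functions, for reference only (not the package code). It assumes df and listDf each have a gene column and a logical significant column, and that intersectN=T restricts the gene universe to genes found in both df and listDf (otherwise every gene in df is used). The bait argument is left out, the one-sided alternative is my assumption, and the *_sketch function names and the stats/genes element names are hypothetical. With these definitions the one-sided Fisher p-value and the upper-tail hypergeometric p-value should be identical, since both come from the same hypergeometric distribution.

```r
# Rough sketch only, not the package implementation.
# Assumptions: df and listDf each have a character `gene` column and a
# logical `significant` column; the real functions' `bait` argument is omitted.

# Sort genes into the four cells of the 2x2 table.
# Assumption: intersectN = TRUE restricts the universe to genes in both
# df and listDf; intersectN = FALSE uses every gene in df.
partition_genes <- function(df, listDf, intersectN) {
  universe <- if (intersectN) intersect(df$gene, listDf$gene) else unique(df$gene)
  in_df <- intersect(df$gene[df$significant], universe)
  in_list <- intersect(listDf$gene[listDf$significant], universe)
  list(universe = universe,
       overlap  = intersect(in_df, in_list),
       dfOnly   = setdiff(in_df, in_list),
       listOnly = setdiff(in_list, in_df),
       neither  = setdiff(universe, union(in_df, in_list)))
}

calc_fisher_sketch <- function(df, listName, listDf, intersectN) {
  g <- partition_genes(df, listDf, intersectN)
  n_cell <- lengths(g[c("overlap", "dfOnly", "listOnly", "neither")])
  # Filled column-wise. Rows: significant in df / not; columns: in list / not.
  tab <- matrix(n_cell[c("overlap", "listOnly", "dfOnly", "neither")], nrow = 2)
  # Assumption: one-sided test for over-representation.
  p <- fisher.test(tab, alternative = "greater")$p.value
  list(stats = data.frame(list_name = listName,
                          overlap_count = n_cell[["overlap"]],
                          dfOnly_count = n_cell[["dfOnly"]],
                          listOnly_count = n_cell[["listOnly"]],
                          neither_count = n_cell[["neither"]],
                          pvalue = p),
       genes = list(overlap_genes = g$overlap, dfOnly_genes = g$dfOnly,
                    listOnly_genes = g$listOnly, neither_genes = g$neither))
}

calc_hyper_sketch <- function(df, listName, listDf, intersectN) {
  g <- partition_genes(df, listDf, intersectN)
  sample_genes  <- union(g$overlap, g$dfOnly)    # significant in df
  success_genes <- union(g$overlap, g$listOnly)  # significant in list
  x <- length(g$overlap)
  n <- length(sample_genes)
  N <- length(g$universe)
  k <- length(success_genes)
  # P(X >= x) when drawing n of N genes, k of which are successes.
  p <- phyper(x - 1, k, N - k, n, lower.tail = FALSE)
  list(stats = data.frame(list_name = listName,
                          successInSample_count = x, sample_count = n,
                          notSample_count = N - n, success_count = k,
                          pvalue = p),
       genes = list(successInSample_genes = g$overlap,
                    sample_genes = sample_genes,
                    notSample_genes = setdiff(g$universe, sample_genes),
                    success_genes = success_genes))
}

# Toy example: 20 genes, 6 significant in df, 7 in the list, 4 shared.
df <- data.frame(gene = paste0("G", 1:20),
                 significant = rep(c(TRUE, FALSE), c(6, 14)))
listDf <- data.frame(gene = paste0("G", c(1:4, 10:12)), significant = TRUE)
res_f <- calc_fisher_sketch(df, "toy", listDf, intersectN = FALSE)
res_h <- calc_hyper_sketch(df, "toy", listDf, intersectN = FALSE)
```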
Functions for creating listDf used above:
[x] get_inweb_list function. Input: bait. Output: df containing gene and significant columns for all InWeb genes (significant=T for InWeb interactors of bait).
[x] get_gene_list function. Input: gene list file path. Output: list of (df containing gene and significant columns, intersectN). If the input file has no significant column, set significant=T for all rows and intersectN=F.
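A minimal sketch of the get_gene_list behaviour described above, assuming a tab-separated file with a header row and a gene column. Setting intersectN=T when a significant column is present is my reading of the spec (the file then also defines the background), and get_gene_list_sketch is a hypothetical name; get_inweb_list is not sketched because it depends on the InWeb data.

```r
# Hypothetical sketch of get_gene_list's behaviour, not the package code.
# Assumes a tab-separated file with a header containing a `gene` column
# and, optionally, a `significant` column of TRUE/FALSE values.
get_gene_list_sketch <- function(path) {
  listDf <- read.delim(path, stringsAsFactors = FALSE)
  stopifnot("gene" %in% colnames(listDf))
  if ("significant" %in% colnames(listDf)) {
    # Assumption: the file lists its own background too, so the universe
    # can later be intersected with it.
    listDf$significant <- as.logical(listDf$significant)
    intersectN <- TRUE
  } else {
    # Only hits are listed: every row is significant and the file cannot
    # serve as a background.
    listDf$significant <- TRUE
    intersectN <- FALSE
  }
  list(listDf = listDf[, c("gene", "significant")], intersectN = intersectN)
}

hits_file <- tempfile(fileext = ".txt")
writeLines(c("gene", "BRCA1", "TP53"), hits_file)
bg_file <- tempfile(fileext = ".txt")
writeLines(c("gene\tsignificant", "BRCA1\tTRUE", "TP53\tFALSE"), bg_file)
res_hits <- get_gene_list_sketch(hits_file)  # no significant column
res_bg <- get_gene_list_sketch(bg_file)      # significant column present
```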
Functions for reading in other overlay data:
[x] get_snp_list function. Input: SNP list file path, vector of gene names (genes in input proteomic data). Output: df containing gene and SNP columns. Each gene or SNP can appear more than once. Use SNP-to-gene mapping data to identify genes to include in df.
[x] get_gwas_list function. Input: vector of GWAS catalog traits, vector of gene names. Output: list of (df containing gene and SNP columns, df containing GWAS catalog info for selected SNPs). Use GWAS catalog data to get SNP list, then use SNP-to-gene mapping data to identify genes to include in df.
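The SNP-to-gene step shared by get_snp_list and get_gwas_list could look roughly like this. snp_to_gene stands in for the mapping data and gwas_catalog for the GWAS catalog; their column names (SNP, gene, trait) and both function names are assumptions, and reading the SNP list file is skipped.

```r
# Hypothetical sketch of the shared SNP-to-gene step, not the package code.
# `snp_to_gene` is assumed to have columns SNP and gene (one row per pair).
map_snps_to_genes <- function(snps, genes, snp_to_gene) {
  keep <- snp_to_gene$SNP %in% snps & snp_to_gene$gene %in% genes
  hits <- snp_to_gene[keep, c("gene", "SNP")]
  rownames(hits) <- NULL
  unique(hits)  # a gene or SNP may still appear in several rows
}

# Hypothetical: `gwas_catalog` is assumed to have columns SNP and trait.
get_gwas_list_sketch <- function(traits, genes, gwas_catalog, snp_to_gene) {
  catalog_hits <- gwas_catalog[gwas_catalog$trait %in% traits, ]
  list(map_snps_to_genes(unique(catalog_hits$SNP), genes, snp_to_gene),
       catalog_hits)
}

snp_to_gene <- data.frame(SNP  = c("rs1", "rs1", "rs2", "rs3"),
                          gene = c("GENEA", "GENEB", "GENEA", "GENEC"))
gwas_catalog <- data.frame(SNP = c("rs2", "rs3"), trait = c("height", "BMI"))
res_snp <- map_snps_to_genes(c("rs1", "rs2"), c("GENEA", "GENEB"), snp_to_gene)
res_gwas <- get_gwas_list_sketch("height", c("GENEA", "GENEC"),
                                 gwas_catalog, snp_to_gene)
```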
Do we want to allow multiple gene lists in one input file? If so, we need to modify the get_gene_list function to account for this. I think it might be easier to restrict input to one gene list per file (this way an optional "significant" column can be included where appropriate), and in Shiny just allow the user to upload multiple gene list files (up to a maximum number, e.g. 5).
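If we go with one list per file, the uploaded files could be stacked into one long data frame keyed by list name, roughly as below. Using the file name as list_name, the tab-separated format, and read_gene_list_files itself are assumptions for illustration.

```r
# Hypothetical: stack several single-list files into one long data frame
# (list_name, gene, significant), capping the number of files.
read_gene_list_files <- function(paths, max_files = 5) {
  stopifnot(length(paths) >= 1, length(paths) <= max_files)
  per_file <- lapply(paths, function(p) {
    d <- read.delim(p, stringsAsFactors = FALSE)
    # Optional significant column: default to TRUE when absent.
    sig <- if ("significant" %in% colnames(d)) as.logical(d$significant) else rep(TRUE, nrow(d))
    data.frame(list_name = tools::file_path_sans_ext(basename(p)),
               gene = d$gene, significant = sig)
  })
  do.call(rbind, per_file)
}

f1 <- tempfile(fileext = ".txt")
writeLines(c("gene", "A", "B"), f1)
f2 <- tempfile(fileext = ".txt")
writeLines(c("gene\tsignificant", "C\tTRUE", "D\tFALSE"), f2)
combined <- read_gene_list_files(c(f1, f2))
```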
I'm skipping error checks (e.g. stopifnot('FDR' %nin% colnames(df))) for now; we probably need to revisit this later, especially for the get_* functions.
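When we revisit this, one option is a small check helper per input type. The sketch below uses named arguments to stopifnot(), which R >= 4.0.0 turns into the error message; check_list_df and the specific checks are hypothetical.

```r
# Hypothetical input check for a listDf; stopifnot() evaluates the checks in
# order and stops at the first failure, using its name as the error message.
check_list_df <- function(listDf) {
  stopifnot(
    "listDf must be a data.frame" = is.data.frame(listDf),
    "listDf needs a 'gene' column" = "gene" %in% colnames(listDf),
    "listDf needs a 'significant' column" = "significant" %in% colnames(listDf),
    "'significant' must be logical" = is.logical(listDf$significant)
  )
  invisible(TRUE)
}

check_list_df(data.frame(gene = c("A", "B"), significant = c(TRUE, FALSE)))  # passes silently
```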
[x] See Issue #6 for additional changes to get_snp_list, get_gene_list, and get_gwas_list to account for multiple lists. Rename them to get_snp_lists, get_gene_lists, and get_gwas_lists.
[x] modify calc_fisher and calc_hyper to account for new input format (data frames from get_gene_lists)
[ ] add test cases in corresponding testthat scripts to test input with multiple lists
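For the multi-list input, one way to reuse single-list logic is to split listDf by list_name and stack the per-list summary rows, which also gives the test cases a simple shape to check (one row per list, sorted by list_name). The long format with a list_name column and run_per_list are assumptions, and overlap_only is a toy stand-in for calc_fisher/calc_hyper.

```r
# Hypothetical helper: run a single-list function once per list in a long
# listDf (columns list_name, gene, significant) and stack the summary rows.
# `per_list_fun(df, listName, listDf)` is assumed to return a one-row data.frame.
run_per_list <- function(df, listDf, per_list_fun) {
  pieces <- split(listDf, listDf$list_name)  # sorted by list_name
  rows <- Map(function(name, sub) per_list_fun(df, name, sub), names(pieces), pieces)
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

# Toy stand-in: count significant genes shared by df and one list.
overlap_only <- function(df, listName, listDf) {
  data.frame(list_name = listName,
             overlap_count = length(intersect(df$gene[df$significant],
                                              listDf$gene[listDf$significant])))
}

df <- data.frame(gene = c("A", "B", "C"), significant = c(TRUE, TRUE, FALSE))
listDf <- data.frame(list_name = c("L1", "L1", "L2"),
                     gene = c("A", "B", "C"), significant = TRUE)
res <- run_per_list(df, listDf, overlap_only)
```

In the testthat scripts, the same checks could be wrapped in test_that() with expect_equal().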
YHH TO DO:
Enrichment functions:
Functions for creating listDf used above:
Functions for reading in other overlay data: