frhl / genoppi-v4

Genoppi: an open-source software for robust and standardized integration of proteomic and genetic data
MIT License

Minimal functionality: Input checking and basic analysis #1

Closed: frhl closed this issue 4 years ago

frhl commented 4 years ago

We need the following for minimal functionality:

Input checking: Input format checking for four different types of input.

We should have a function that checks the input format. For example, if the input contains separate replicates, we call a moderated t-test. It should return a list of 1) the input data.frame and 2) a boolean vector or string indicating which further functions should be called to process the data. In summary: read input, check input, map input, run the moderated t-test, and identify enriched interactors.
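A minimal sketch of the check step, assuming hypothetical column conventions (gene or accession_number for identifiers; rep1, rep2, ... for replicates; logFC/pvalue/FDR for precomputed statistics). The four input types are not spelled out in the issue, so this only distinguishes mapped vs. unmapped and tested vs. untested input; check_input here is illustrative, not the implemented signature.

```r
# Hypothetical sketch: decide which downstream steps an input table needs.
# Column names (gene, accession_number, rep1..repN, logFC, pvalue, FDR) are assumptions.
check_input <- function(df) {
  if (!is.data.frame(df)) {
    stop("check_input: expected a data.frame, got ", class(df)[1], call. = FALSE)
  }
  cols <- colnames(df)
  has_gene <- "gene" %in% cols
  has_accession <- "accession_number" %in% cols
  rep_cols <- grep("^rep[0-9]+$", cols, value = TRUE)
  has_stats <- all(c("logFC", "pvalue", "FDR") %in% cols)

  # Verbose messages list what was actually found, to help debugging.
  if (!has_gene && !has_accession) {
    stop("check_input: need a 'gene' or 'accession_number' column; found: ",
         paste(cols, collapse = ", "), call. = FALSE)
  }
  if (!has_stats && length(rep_cols) < 2) {
    stop("check_input: need >= 2 replicate columns (rep1, rep2, ...) ",
         "or precomputed logFC/pvalue/FDR columns", call. = FALSE)
  }
  list(
    data = df,
    needmap = !has_gene,   # only accession numbers present -> map to genes
    needtest = !has_stats  # replicates only -> run the t-test
  )
}
```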

Include verbose error messages that we can use for debugging and potentially show to users in the future.

R error handling, e.g. stopifnot(!('FDR' %in% colnames(df)))
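One way to get verbose, debuggable errors in base R, sketched with hypothetical helpers (check_no_fdr, run_step): named stopifnot() arguments (R >= 4.0.0) supply a readable message, and a tryCatch() wrapper prefixes the name of the pipeline step that failed.

```r
# Hypothetical guard: refuse to run the t-test on a table that already has an
# FDR column. The argument name becomes the error message (R >= 4.0.0).
check_no_fdr <- function(df) {
  stopifnot("input already has an 'FDR' column; skip the t-test or drop that column" =
              !("FDR" %in% colnames(df)))
  invisible(TRUE)
}

# Hypothetical wrapper: re-raise any error with the failing step's name prepended.
run_step <- function(step_name, f, ...) {
  tryCatch(f(...), error = function(e) {
    stop(sprintf("[%s] %s", step_name, conditionMessage(e)), call. = FALSE)
  })
}
```

For example, run_step("ttest", check_no_fdr, df) fails with a message starting with "[ttest]" when df already contains an FDR column.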

Function details

  1. Read input. Input: filename/path. Output: list(data.frame, boolean: needmap, boolean: needtest)
  2. Map input. Input: input data.frame. Output: list(data.frame in which the first column contains mapped genes, data.frame: accession number to gene mapping)
  3. Moderated t-test. Input: data.frame. Output: data.frame with three columns attached: logFC, FDR, p-value
  4. Identify enriched interactors. Input: df, logFC cutoff, logfc_direction, pvalue (default NULL), fdr. Output: list(the data.frame from (3) with a boolean significant column attached, data.frame of interactors and non-interactors that the user can download)
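Steps 2-4 could be sketched roughly as below. Assumptions: replicate log2 fold-change columns are named rep1, rep2, ...; the caller supplies an accession-to-gene lookup table; all function names are hypothetical. Step 3 uses an ordinary one-sample t-test with Benjamini-Hochberg FDR as a stand-in, not the moderated t-test itself (which shrinks per-protein variances with empirical Bayes, e.g. limma's lmFit() and eBayes()). How pvalue and fdr combine in step 4 is also an assumption.

```r
# Step 2 (hypothetical): prepend mapped gene names, return the mapping used.
map_genes_sketch <- function(df, mapping) {
  idx <- match(df$accession_number, mapping$accession_number)
  if (anyNA(idx)) {
    warning(sum(is.na(idx)), " accession number(s) could not be mapped to a gene")
  }
  out <- data.frame(gene = mapping$gene[idx], df,
                    stringsAsFactors = FALSE, check.names = FALSE)
  list(data = out, mapping = mapping[unique(idx[!is.na(idx)]), , drop = FALSE])
}

# Step 3 (STAND-IN, not moderated): attach logFC, pvalue and BH-adjusted FDR.
# t.test() errors if a protein's replicates are all identical.
ttest_sketch <- function(df) {
  rep_cols <- grep("^rep[0-9]+$", colnames(df), value = TRUE)
  stopifnot("need at least 2 replicate columns" = length(rep_cols) >= 2)
  mat <- as.matrix(df[, rep_cols, drop = FALSE])
  df$logFC <- rowMeans(mat)
  df$pvalue <- apply(mat, 1, function(x) t.test(x, mu = 0)$p.value)
  df$FDR <- p.adjust(df$pvalue, method = "BH")
  df
}

# Step 4 (hypothetical): flag significant proteins. Assumption: a p-value
# cutoff, when given, replaces the FDR cutoff.
enriched_sketch <- function(df, logfc_cutoff = 0,
                            logfc_direction = c("positive", "negative", "both"),
                            pvalue = NULL, fdr = 0.1) {
  logfc_direction <- match.arg(logfc_direction)
  pass_fc <- switch(logfc_direction,
                    positive = df$logFC > logfc_cutoff,
                    negative = df$logFC < -logfc_cutoff,
                    both     = abs(df$logFC) > logfc_cutoff)
  pass_sig <- if (is.null(pvalue)) df$FDR <= fdr else df$pvalue <= pvalue
  df$significant <- pass_fc & pass_sig
  keep <- intersect(c("gene", "accession_number", "logFC", "pvalue", "FDR", "significant"),
                    colnames(df))
  list(data = df, download = df[, keep, drop = FALSE])
}

# Usage: chain the steps.
# res <- enriched_sketch(ttest_sketch(map_genes_sketch(df, map)$data), logfc_cutoff = 1)
```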
yuhanhsu commented 4 years ago

DONE on March 12: (1) check_input (2) read_input

TO DO on March 12: (3) map_geneid (FHL) (4) calc_mod_ttest (YHH) (5) id_enriched_proteins (YHH)

yuhanhsu commented 4 years ago

YHH TO DO (Mar 13):

yuhanhsu commented 4 years ago

TO DISCUSS:

yuhanhsu commented 4 years ago

Maybe in the future: show isoform information in plots.

Allow duplicate gene names; only check for unique names when running overlap enrichment tests.
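A sketch of that rule with a hypothetical check_gene_names() helper: duplicated gene names (for example, several isoform accessions mapping to one gene) pass with an informational message by default and only raise an error when an overlap enrichment test is requested.

```r
# Hypothetical helper: enforce unique gene names only for overlap enrichment tests.
check_gene_names <- function(genes, overlap_test = FALSE) {
  dups <- unique(genes[duplicated(genes)])
  if (overlap_test && length(dups) > 0) {
    stop("overlap enrichment requires unique gene names; duplicated: ",
         paste(dups, collapse = ", "), call. = FALSE)
  }
  if (length(dups) > 0) {
    message("note: ", length(dups), " duplicated gene name(s) kept")
  }
  invisible(dups)  # return the duplicated names for callers that want them
}
```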