bigomics / omicsplayground

Visual self-service analytics platform for big omics data.
http://www.bigomics.ch

document if "continuous" variable names like `age` and `height` are accepted in OPG #518

Open mauromiguelm opened 1 year ago

mauromiguelm commented 1 year ago

document if "continuous" variable names like age and height are accepted in OPG

mauromiguelm commented 1 year ago

I tested three continuous variables, including age and height, as shown in the table below.

Upload module:

image

sample information dataview:

image

Results:

Continuous variables get dropped from the phenotype selection. Is this correct, @ivokwee, or should we somehow force-dichotomize them? Based on the feedback, I will update the documentation and make the changes.

  1. Clustering

    image
  2. Correlation analysis

    image

ivokwee commented 1 year ago

The pgx object used to have, and may still have, pgx$samples and pgx$Y. The latter should be a cleaned, discretized version of the former. I try to keep pgx$samples as close to the original as possible, but I have not applied this consistently. The nicest solution would be for the UI to check for continuous variables (CV) and ask whether to discretize them into 2, 3, or 4 groups. Ideally we should be able to compute statistics on CV directly, but for now you have to discretize them or use them solely for visualization. It would also be nice to be able to plot CV against other variables (some people have asked for that).
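As a rough illustration of the "2-3-4 groups" idea above, here is a minimal sketch of quantile-based binning in base R. The helper name `discretize_quantile` and the `q1`, `q2`, ... labels are assumptions made for this example; this is not the code OPG uses to build pgx$Y.

```r
# Hypothetical helper (not part of OPG): bin a numeric vector into
# `n_groups` groups of roughly equal size using quantile break points.
discretize_quantile <- function(x, n_groups = 2) {
  stopifnot(is.numeric(x), n_groups >= 2)
  probs <- seq(0, 1, length.out = n_groups + 1)
  # unique() guards against tied quantiles producing duplicate breaks.
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  if (length(breaks) < 2) {
    # All observed values are identical: only one group is possible.
    return(factor(ifelse(is.na(x), NA, "q1")))
  }
  labels <- paste0("q", seq_len(length(breaks) - 1))
  # include.lowest = TRUE keeps the minimum value inside the first bin.
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

age <- c(23, 35, 41, 52, 60, 67)
discretize_quantile(age, n_groups = 2)  # median split
discretize_quantile(age, n_groups = 3)  # tertiles
```

Quantile breaks keep the group sizes balanced, which matters for downstream contrasts; fixed cut-offs (for example clinically meaningful age bands) would be an alternative if the user supplies them.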

ivokwee commented 1 year ago

Yes. There is code somewhere that tries to intelligently detect whether a variable is continuous (CV) or discrete. It is not always robust, but such code is very important for creating automatic contrasts.
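For reference, a detection heuristic of this kind could look roughly like the sketch below. It is an assumption about one plausible rule (fully numeric values with more than `max_levels` distinct entries), not the detection code that exists in OPG; `is_continuous` and `max_levels` are hypothetical names.

```r
# Hypothetical heuristic (not OPG's actual detection code): treat a
# variable as continuous if every observed value is numeric and there
# are more than `max_levels` distinct values.
is_continuous <- function(x, max_levels = 5) {
  # Values read from a CSV may arrive as character or factor, so try a
  # numeric conversion first; non-numeric entries become NA.
  num <- suppressWarnings(as.numeric(as.character(x)))
  observed <- !is.na(x) & as.character(x) != ""
  if (!any(observed)) return(FALSE)
  # Any observed value that fails conversion marks a categorical column.
  if (any(is.na(num[observed]))) return(FALSE)
  # Numeric columns with few distinct values (e.g. 0/1 codes, batch
  # numbers) are treated as discrete.
  length(unique(num[observed])) > max_levels
}

samples <- data.frame(
  age       = c(23, 35, 41, 52, 60, 67),
  treatment = c("ctrl", "drug", "ctrl", "drug", "ctrl", "drug"),
  batch     = c(1, 1, 2, 2, 3, 3)
)
sapply(samples, is_continuous)
```

The threshold illustrates why such detection is fragile: a small study where age takes only four distinct values would be classified as discrete, so the cut-off probably needs to scale with the number of samples.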

mauromiguelm commented 1 year ago

Follow up:

  1. The phenotype association plot uses pgx$samples, which still has the continuous values: `annot <- pgx$samples`

    image
  2. pgx$Y, on the other hand, is lacking the discretized values.

    image

It seems pgx$Y, for some reason (bug?) is not discretizing continuous variables anymore.
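A quick way to confirm this on a given dataset is to compare the two tables directly. The sketch below uses toy data frames standing in for pgx$samples and pgx$Y; `compare_pheno_tables` is a hypothetical helper written for this comment, not an OPG function.

```r
# Hypothetical diagnostic: `samples` and `Y` stand in for pgx$samples
# and pgx$Y of a loaded pgx object.
compare_pheno_tables <- function(samples, Y) {
  samples <- as.data.frame(samples)
  Y <- as.data.frame(Y)
  list(
    # Variables present in samples but missing from Y (dropped).
    dropped = setdiff(colnames(samples), colnames(Y)),
    # Variables kept in Y that are still numeric (not discretized).
    still_numeric = colnames(Y)[vapply(Y, is.numeric, logical(1))]
  )
}

# Toy stand-ins for the two tables.
samples <- data.frame(
  age       = c(23, 35, 41, 52),
  height    = c(1.62, 1.75, 1.80, 1.68),
  treatment = c("ctrl", "drug", "ctrl", "drug")
)
Y <- samples[, "treatment", drop = FALSE]

compare_pheno_tables(samples, Y)
```

If `dropped` lists the continuous variables, they are being filtered out before discretization; if they show up under `still_numeric`, they are kept but never binned.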

Two solutions come to mind here:

  1. Do it at the pgx check level, where we discretize the values ourselves and tell the user that we are doing so. This is very easy to do in the crosscheckInput function. We could even create new columns in pgx$samples (e.g. age_discrete alongside age), but then pgx$samples would no longer be clean.

  2. Find the bug that stops pgx$Y from being discretized, or add that functionality. But since pgx$Y is created inside pgx.compute (or initialize), it's much more difficult to inform the user that continuous variables will be discretized than it is in the check functions, where we already have everything in place to notify the user and make changes to the inputs.
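To make option 1 concrete, here is a minimal sketch of what a check-level step could do: add a `_discrete` column next to each continuous variable and collect a message for the user. It is not the real crosscheckInput function; `check_continuous_phenotypes`, the median split, and the message wording are all assumptions for illustration.

```r
# Hypothetical check step; NOT the real crosscheckInput function.
# Returns the (possibly extended) sample table plus user-facing messages.
check_continuous_phenotypes <- function(samples, max_levels = 5) {
  messages <- character(0)
  for (v in colnames(samples)) {
    x <- samples[[v]]
    # Stand-in rule: numeric with many distinct values = continuous.
    if (is.numeric(x) && length(unique(x[!is.na(x)])) > max_levels) {
      # Median split into two groups; the original column is kept as is.
      med <- median(x, na.rm = TRUE)
      new_name <- paste0(v, "_discrete")
      samples[[new_name]] <- factor(ifelse(x <= med, "low", "high"),
                                    levels = c("low", "high"))
      messages <- c(messages, sprintf(
        "'%s' looks continuous; added '%s' (split at median %s).",
        v, new_name, format(med)
      ))
    }
  }
  list(samples = samples, messages = messages)
}

samples <- data.frame(
  age       = c(23, 35, 41, 52, 60, 67),
  treatment = c("ctrl", "drug", "ctrl", "drug", "ctrl", "drug")
)
res <- check_continuous_phenotypes(samples)
res$messages
res$samples
```

The original `age` column is left unchanged here, but the table does gain a column, which is exactly the trade-off mentioned in option 1.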

I am not sure which option to go for, but it seems important that if age, height, or any random_continuous variables are removed from the phenotypes, we either tell the user or fix it.

Cheers, M

mauromiguelm commented 1 year ago

@ivokwee @ncullen93 this is what we discussed today, how should we proceed?

ncullen93 commented 1 year ago

Right, as Ivo said, I think the best approach is to keep pgx$samples completely untouched after upload, and pgx$Y should be what gets changed. We will have to go through all the code to ensure that, though.

mauromiguelm commented 1 year ago

> It seems pgx$Y, for some reason (bug?) is not discretizing continuous variables anymore.

We first need to understand why pgx$Y is not showing the continuous variables. Any idea how much work it will take to switch the code from pgx$samples to pgx$Y?