artedison / Edison_Lab_Shared_Metabolomics_UGA

Official Edison Lab toolboxes and scripts for analyzing metabolomics data.
13 stars 4 forks source link

scale() often returns nans, and therefore cannot work with many downstream functions #2

Open judgemt opened 4 years ago

judgemt commented 4 years ago

The issue was zero divisors from std() calculations as well as mean calculations in several of the methods available in scale().

Solved by adding the following after the switch statement to clean up any nans and convert them to zero values (reasoning is that they are not relevant anyways):

XS(isnan(XS)) = 0; % MTJ edit 20200723: nans result from zero vals

mikeaalv commented 4 years ago

Thank you for making that.

  1. Is this 'bug' in both public and private toolbox? Have you made the changes (in which repository), can you link the commit you made the changes to this issue page?
  2. Why do you think 0 is a value that will works for all cases? Should we instead give an example:
scale(X,method);
X(insan(X))=0%or other value of choice

I don't know whether people will always want 0 here. They may want some other baseline values(or mean)? I guess need a discussion here (as I'm not often dealing with preprocessing).

judgemt commented 4 years ago

Edit made in both repos.

Since the scaled matrix is always mean-centered before that step (at least in all of our methods), it is actually replacing with the mean. If the sqrt(std( feature )) = 0, then this spectral point shouldn't have weight in the analysis, right?

judgemt commented 4 years ago

I've only ever seen the scaled matrix used in multivariate analyses like PCA, PLS-DA

mikeaalv commented 4 years ago

it sounds great. it is related to commit 2826f123c534e1436c52e2d5a21f47c0b6755c62 on public toolbox. We might need to add an representative example for such function (and maybe unit test, so that the correctness of commit can be checked directly). Feel free to close this issue.

gjg001 commented 4 years ago

since we are at it, should we change the name of scale? it has the same name as a matlab function, however, it is dangerous for past workflows... or best course of action is to leave it alone...?

mikeaalv commented 4 years ago

@gjg001 I would suggest change the name (also if it is used by other function), and record the change (and broadcast in slack). We can show the changes in new template workflows.

judgemt commented 4 years ago

@mikeaalv we could also suggest easy ways to go in an replace those function calls with the new ones

mikeaalv commented 4 years ago

@mikeaalv we could also suggest easy ways to go in an replace those function calls with the new ones

@judgemt Sorry don't understand what you are saying ;->

judgemt commented 4 years ago

Write a script to update workflows for renamed functions (if there are others); i.e. give us your workflow and we'll update it

mikeaalv commented 4 years ago

@judgemt we definitely need to make sure the changes propagate to all functions and script in the toolbox repository. This should be not too hard and definitely need manual confirmation. In terms of everyone's script, that might depends on the author (we will let them know we made the changes).