cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License
76 stars 34 forks source link

Set epsilon to zero in `RobustMAD` #161

Open niranjchandrasekaran opened 3 years ago

niranjchandrasekaran commented 3 years ago

@gwaygenomics You may already know that we sometimes have had issues(private repo) with mad_robustize where one or few features end up being significantly larger than the other features, and then it affects the downstream metrics that we compute. @shntnu said this behavior can be overcome by setting the epsilon variable to 0 instead of 1e-18 that it is set to, by default. Currently, epsilon cannot be set while calling normalize() but we could do one of the following

  1. Change the default value of epsilon. This is the easiest option, but it is possible that it will break something.
  2. Allow setting epsilon while calling normalize().

Which one would you suggest?

gwaybio commented 3 years ago

I think we can do both! If you think the default should be zero, then go for it! I do not think it will change things in the short term, but we might want to do something specifically with divide-by-zero errors that the epsilon tries to overcome in the long term.

I think allowing a user to set epsilon in normalize makes sense. We had wanted to do this with spherize (see a similar discussion in #116) as well, but this change is different (and more complicated) than the one you're proposing for RobustMad.

gwaybio commented 3 years ago

Another option is to keep the epsilon default in pycytominer, add it as an option to normalize, and then make the default 0 in the recipe.

Up to you to decide which option is best, happy to review a PR

shntnu commented 3 years ago

Another option is to keep the epsilon default in pycytominer, add it as an option to normalize, and then make the default 0 in the recipe.

💯 I like this option the best (assuming I recollect the APIs right, didn't check again)

gwaybio commented 1 year ago

@shntnu - given you comment above

I like this option the best (assuming I recollect the APIs right, didn't check again)

Can we close this issue without making changes? It seems we have this existing solution - well, maybe actually (cc @carmendv https://github.com/broadinstitute/cellprofiler-on-Terra#40)

gwaybio commented 1 year ago

@niranjchandrasekaran made this change in https://github.com/cytomining/pycytominer/commit/5420d0197ee7bb09c246da0b69bb54b7c094c09c