Open niranjchandrasekaran opened 3 years ago
I think we can do both! If you think the default should be zero, then go for it! I do not think it will change things in the short term, but we might want to do something specifically with divide-by-zero errors that the epsilon tries to overcome in the long term.
I think allowing a user to set epsilon in normalize makes sense. We had wanted to do this with spherize (see a similar discussion in #116) as well, but this change is different (and more complicated) than the one you're proposing for RobustMad.
Another option is to keep the epsilon default in pycytominer, add it as an option to normalize, and then make the default 0 in the recipe.
Up to you to decide which option is best, happy to review a PR
Another option is to keep the epsilon default in pycytominer, add it as an option to normalize, and then make the default 0 in the recipe.
💯 I like this option the best (assuming I recollect the APIs right, didn't check again)
@shntnu - given you comment above
I like this option the best (assuming I recollect the APIs right, didn't check again)
Can we close this issue without making changes? It seems we have this existing solution - well, maybe actually (cc @carmendv https://github.com/broadinstitute/cellprofiler-on-Terra#40)
@niranjchandrasekaran made this change in https://github.com/cytomining/pycytominer/commit/5420d0197ee7bb09c246da0b69bb54b7c094c09c
@gwaygenomics You may already know that we sometimes have had issues(private repo) with
mad_robustize
where one or few features end up being significantly larger than the other features, and then it affects the downstream metrics that we compute. @shntnu said this behavior can be overcome by setting theepsilon
variable to0
instead of1e-18
that it is set to, by default. Currently,epsilon
cannot be set while callingnormalize()
but we could do one of the followingepsilon
. This is the easiest option, but it is possible that it will break something.epsilon
while callingnormalize()
.Which one would you suggest?