Open niranjchandrasekaran opened 3 years ago
I'm revisiting this now, since I'm adding epsilon to `normalize.py` in #132.
@niranjchandrasekaran - I think this enhancement is cool, but it is beyond the scope of #132. Once we merge #132, we can tackle this if it becomes necessary.
My overall strategy would be to add a new file, something like `pycytominer.cyto_utils.normalize_utils.py`, where you would write the function `estimate_epsilon_regularization()`.
We can then enable the option `normalize(spherize_epsilon="auto")` in the normalize function. I am not sure if we want to change `spherize_epsilon` to default to `"auto"`.
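To make the `"auto"` option concrete, here is a minimal sketch of how the setting could be resolved to a number before sphering. The helper name `resolve_spherize_epsilon` and the `estimator` argument are hypothetical and are not part of pycytominer; the estimator stands in for whatever `estimate_epsilon_regularization()` ends up being.

```python
def resolve_spherize_epsilon(spherize_epsilon, data, estimator):
    """Turn a user-facing epsilon setting into a float (hypothetical helper).

    spherize_epsilon: a non-negative number, or the string "auto".
    data: the feature matrix handed to the estimator when "auto" is used.
    estimator: callable that maps data to an epsilon value (assumed, not
        an existing pycytominer function).
    """
    if isinstance(spherize_epsilon, str):
        if spherize_epsilon != "auto":
            raise ValueError(
                f"spherize_epsilon must be a number or 'auto', got {spherize_epsilon!r}"
            )
        # Data-driven path: defer to the supplied estimator.
        return float(estimator(data))

    # Constant path: keep the value the caller passed in.
    epsilon = float(spherize_epsilon)
    if epsilon < 0:
        raise ValueError("spherize_epsilon must be non-negative")
    return epsilon
```

Keeping the numeric path unchanged means existing calls that pass a constant would behave as before, and only callers who opt in with `"auto"` would get the data-driven value.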
The only other thing I would say is that we should do our best to avoid any additional dependencies. We were burned in the past with deprecated packages (for example, cytomining/cytominer-database#108), and we shouldn't introduce dependencies that we would only ever use on rare occasions. If this comes up again as important, would it be possible to avoid using kneed?
We could implement a simple version of kneed ourselves that needs to work only with eigenvalue curves. But I haven't read the paper so I am not sure how easy it will be - https://raghavan.usc.edu//papers/kneedle-simplex11.pdf
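As a rough idea of what a dependency-light version might look like, here is a sketch that uses only NumPy. It is not the Kneedle algorithm from the paper: it swaps in a simpler heuristic that picks the point of the sorted eigenvalue curve lying furthest below the straight line joining its first and last points. The function `knee_index`, the `fraction` parameter, and the assumption that `data` is a samples-by-features array are all illustrative choices, and the result has not been compared against kneed.

```python
import numpy as np


def knee_index(values):
    """Locate the knee of a decreasing curve (simplified heuristic, not Kneedle).

    Returns the index where the normalized curve lies furthest below the
    chord drawn from its first point to its last point.
    """
    y = np.asarray(values, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least three values to locate a knee")
    span = y.max() - y.min()
    if span == 0:
        return 0  # flat curve: no knee, fall back to the first point
    x = np.linspace(0.0, 1.0, n)
    y_norm = (y - y.min()) / span
    # Straight line between the two end points of the normalized curve.
    chord = y_norm[0] + (y_norm[-1] - y_norm[0]) * x
    return int(np.argmax(chord - y_norm))


def estimate_epsilon_regularization(data, fraction=0.1):
    """Estimate epsilon as a fraction of the eigenvalue at the knee.

    data: array of shape (n_samples, n_features) (assumed layout).
    fraction: multiplier applied to the knee eigenvalue; 0.1 follows the
        "one-tenth" rule described in this issue.
    """
    x = np.asarray(data, dtype=float)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # eigvalsh returns ascending eigenvalues; reverse to get the usual curve.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    # Clip tiny negative values caused by floating-point error.
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return float(fraction * eigenvalues[knee_index(eigenvalues)])
```

Before relying on something like this, it would be worth checking on real profiles whether the chord heuristic and kneed pick similar knees, since eigenvalue curves with long flat tails can shift the selected index.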
@gwaygenomics LMK if my pondering this will help unblock the profiling comparison paper. I'll move it out of my inbox, but ping me if it becomes relevant.
The current implementation of sphering in pycytominer uses a constant value (`1e-6`) for the regularization parameter `epsilon`:
https://github.com/cytomining/pycytominer/blob/a5dac9e3fa3cdf61e9607f479ba53eac7fed18b1/pycytominer/operations/transform.py#L25
Sphering performance may improve if the value of `epsilon` is determined directly from the data using @shntnu's approach, where `epsilon` is one-tenth of the eigenvalue at the knee of the eigenvalue curve. Here is a crude implementation of this approach that I wrote using the kneed package. We may want to rewrite it and add it to the sphering method in pycytominer.