cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License
76 stars 34 forks source link

Modify normalize.py for easier usage of whitening method #96

Closed AdeboyeML closed 3 years ago

AdeboyeML commented 3 years ago

suggested approach:

for example: whiten_method_list = ['ZCA', 'PCA', 'PCA-cor', 'ZCA-cor'] that means: if (whiten_method != None) and (whiten_method in whiten_method_list): method = 'whiten'

gwaybio commented 3 years ago

Thanks for opening this issue @AdeboyeML - I am looking forward to your contributions!

So whiten_method will be by default None, once a whiten_method is given by the user and is one of the methods we have in whiten method list, automatically method is set to 'whiten'.

I like this approach in principle, but we should also do one other thing to avoid a potentially disastrous scenario.

Disastrous scenario

Imagine that a user copy and pasted code from one repo to another. In the first repo, the original code was:

normalize(
    profiles=anno_df,
    samples="all",
    method="whiten",
    whiten_method="ZCA"
)

However, the user wanted to switch from whitening to standardizing. If the user switched only this:

normalize(
    profiles=anno_df,
    samples="all",
    method="standardize",
    whiten_method="ZCA"
)

Then the code would default to method="whiten"? This would be bad!!

W would want the internal normalize() function to at least warn the user that a different method was provided than what is being used.

Alternative approach

One alternative approach could be to set whiten_method to a reasonable default that we determine in https://github.com/broadinstitute/lincs-cell-painting/issues/38#issuecomment-696308748

In this case, we need to make it clear in the normalize() docstring that the whitening options are only used when method="whitening".

AdeboyeML commented 3 years ago

Yes, the alternative approach is more elegant, we should give the user the preference to always set method to their desired normalization method.

gwaybio commented 3 years ago

Let's go with ZCA-cor as default for pycytominer

gwaybio commented 3 years ago

see https://github.com/broadinstitute/lincs-cell-painting/issues/38#issuecomment-697920393 for the decision.

We'll close this issue once we make the software fix and it passes old (and potentially new) tests.

@AdeboyeML - do you want to make this fix?

AdeboyeML commented 3 years ago

Yes, I will make the fix.

AdeboyeML commented 3 years ago

I have changed the whiten_method default to "ZCA-cor" and I also ran the test_normalize.py on my local machine and it passed the test.

gwaybio commented 3 years ago

Thanks for opening the PR in #97 @AdeboyeML ! The best kind of PRs are the ones that make minimal changes.

I didn't realize that we basically already have the alternate scenario as described above https://github.com/cytomining/pycytominer/issues/96#issuecomment-696358580

I think we should go ahead and merge #97 (I will do this) but we should still keep this issue open. We should keep it open because my alternative solution isn't really that satisfying...

there must be some happy medium between the current method (still error prone) and the suggestion you list above:

Modify normalize.py script to allow users use the whitening method once a whiten_method is given i.e. once whiten_method = ZCA/PCA/PCA-cor/ZCA-cor without the need to state method = whiten

Can you think of a better way? Maybe splitting out whitening from normalize.py? Maybe removing the the whiten_method all together and create four different whitening options? (e.g. method="whiten_ZCA") 🤷

AdeboyeML commented 3 years ago

Yes, maybe we could split out whitening from normalize.py and create a new function called whiten , where the only parameter the user has to input will be the whiten_method, this may be more straightforward

gwaybio commented 3 years ago

sounds good - let's put this "on the shelf" for now.

Edit: idiom 2 in the updated link