Closed AdeboyeML closed 3 years ago
Thanks for opening this issue @AdeboyeML - I am looking forward to your contributions!
So whiten_method will be by default None, once a whiten_method is given by the user and is one of the methods we have in whiten method list, automatically method is set to 'whiten'.
I like this approach in principle, but we should also do one other thing to avoid a potentially disastrous scenario.
Imagine that a user copy and pasted code from one repo to another. In the first repo, the original code was:
normalize(
profiles=anno_df,
samples="all",
method="whiten",
whiten_method="ZCA"
)
However, the user wanted to switch from whitening to standardizing. If the user switched only this:
normalize(
profiles=anno_df,
samples="all",
method="standardize",
whiten_method="ZCA"
)
Then the code would default to method="whiten"
? This would be bad!!
W would want the internal normalize()
function to at least warn the user that a different method was provided than what is being used.
One alternative approach could be to set whiten_method
to a reasonable default that we determine in https://github.com/broadinstitute/lincs-cell-painting/issues/38#issuecomment-696308748
In this case, we need to make it clear in the normalize()
docstring that the whitening options are only used when method="whitening"
.
Yes, the alternative approach is more elegant, we should give the user the preference to always set method
to their desired normalization method.
Let's go with ZCA-cor
as default for pycytominer
see https://github.com/broadinstitute/lincs-cell-painting/issues/38#issuecomment-697920393 for the decision.
We'll close this issue once we make the software fix and it passes old (and potentially new) tests.
@AdeboyeML - do you want to make this fix?
Yes, I will make the fix.
I have changed the whiten_method default to "ZCA-cor" and I also ran the test_normalize.py on my local machine and it passed the test.
Thanks for opening the PR in #97 @AdeboyeML ! The best kind of PRs are the ones that make minimal changes.
I didn't realize that we basically already have the alternate scenario as described above https://github.com/cytomining/pycytominer/issues/96#issuecomment-696358580
I think we should go ahead and merge #97 (I will do this) but we should still keep this issue open. We should keep it open because my alternative solution isn't really that satisfying...
there must be some happy medium between the current method (still error prone) and the suggestion you list above:
Modify normalize.py script to allow users use the whitening method once a whiten_method is given i.e. once whiten_method = ZCA/PCA/PCA-cor/ZCA-cor without the need to state method = whiten
Can you think of a better way? Maybe splitting out whitening from normalize.py
? Maybe removing the the whiten_method
all together and create four different whitening options? (e.g. method="whiten_ZCA"
) 🤷
Yes, maybe we could split out whitening from normalize.py and create a new function called whiten
, where the only parameter the user has to input will be the whiten_method
, this may be more straightforward
sounds good - let's put this "on the shelf" for now.
Edit: idiom 2 in the updated link
whiten_method = ZCA/PCA/PCA-cor/ZCA-cor
without the need to statemethod = whiten
suggested approach:
whiten_method
will be by defaultNone,
once a whiten_method is given by the user and is one of the methods we have in whiten method list, automatically method is set to 'whiten'.