cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License
80 stars 36 forks source link

Update error message and docs for `features` argument to clarify CellProfiler default expectations and how to handle non-CellProfiler data #448

Closed axiomcura closed 1 month ago

axiomcura commented 2 months ago

Description

Thank you for your contribution to pycytominer! Please succinctly summarize your proposed change. What motivated you to make this change?

Summary

This PR addresses one of the reviewers’ comments:


" When I used the function pycytominer.normalize() on a dataset originating from my own work, I encountered an error message saying

“No CP features found. Are you sure this dataframe is from CellProfiler?”.

I could easily fix the problem because I know the properties of a CellProfiler dataset, and the error message was hinting at some “Metadata_” missing.

The fix was very easy, I just needed to add the prefix “Metadata_” to my metadata columns. My dataset comes from another commonly used proprietary software called Harmony. I understand that the authors are in favor of using open-source resources, but I think that the pycytominer tool could be useful in a much larger context, and take as input different types of datasets. In the readme file, that I followed to perform the normalization, I couldn’t find anywhere that the dataset has to come from CellProfiler. ..."


I replaced the ‘assert’ statement with an exception, as it’s considered bad practice to use assertions during runtime. Assertions can be disabled with the -o parameter, which means they aren’t reliable for handling errors in production code. Here are some useful links:

Additionally, I updated the error message to clarify that pycytominer infers CellProfiler features by default and provided guidance on how users can manually specify their feature space when working with non-CellProfiler data.

Please also link to any relevant issues that your code is associated with.

This code is also an addition to #430

What is the nature of your change?

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

gwaybio commented 2 months ago

@axiomcura - integration tests failed, please fix. Looks like the tests expect a specific error message which also needs updating

codecov-commenter commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.72%. Comparing base (c6ad2a0) to head (5f4ac6e).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #448 +/- ## ======================================= Coverage 94.72% 94.72% ======================================= Files 57 57 Lines 3145 3146 +1 ======================================= + Hits 2979 2980 +1 Misses 166 166 ``` | [Flag](https://app.codecov.io/gh/cytomining/pycytominer/pull/448/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/cytomining/pycytominer/pull/448/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining) | `94.72% <100.00%> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

axiomcura commented 2 months ago

@gwaybio All tests have passed! Please review when you have a moment to confirm if it's safe to merge.

gwaybio commented 2 months ago

Thanks Erik! I see that the tests are passing now.

Re-reading the reviewer comment, it seemed that they have issues with the metadata parameter.

I'm thinking that in addition to clarifying the features exception, we should also do something to the metadata parameter here: https://github.com/cytomining/pycytominer/blob/main/pycytominer/cyto_utils/features.py#L113C1-L116C19

Perhaps we can do two more things in this PR to update error message in the metadata inference.

  1. Add an exception (and corresponding tests) for if metadata_features > 0
  2. Add a description to the metadata parameter docstring specifying exactly what metadata="infer" actually expects/does

I'll also tag @d33bs - Dave, after Erik addresses these two points, perhaps you can take a final look to see if both Erik and I overlooked something in making this change. Thanks!

axiomcura commented 2 months ago

@gwaybio - This is certainly doable, but I have a few clarifying questions. Could you please explain why the exception condition needs to be metadata_features > 0?

From my understanding, Pycytominer’s main functions default to metadata_features="infer", which always expects metadata features from CellProfiler. As a result, adding an exception for metadata_features > 0 in infer_cp_features() seems to introduce a contradictory logic. My guess would have been metadata_features == 0.

I might be misunderstanding something where the metadata_features > 0 exception should be implemented. I just want to make sure if I am thinking of this correctly before I implement! Thanks!

gwaybio commented 2 months ago

Great catch Erik! We should not check metadata features in the way I suggested above. We ask users to provide metadata features within each function (e.g., normalize()) call.

Perhaps the only change should be whereever we see meta_features="infer" in the docstring (e.g., here: https://github.com/cytomining/pycytominer/blob/main/pycytominer/normalize.py#L41)

We add some specification to clarify that the cases in which meta_features="infer" works as expected is not only when metadata features are prefixed by Metadata_ but that this occurs when using CellProfiler-derived input.

I think this is a relatively minor and straightforward fix.

  1. Find wherever meta_features="infer" is used.
  2. Modify the docstring slightly to indicate CellProfiler-derived default
axiomcura commented 2 months ago

@gwaybio - I have updated the docstrings.

kenibrewer commented 1 month ago

Ahh shucks... there's going to be a lot of merge conflicts between this branch and #396. I was going to finish that one off this weekend. Not sure which order we want to merge / resolve merge conflicts in.

gwaybio commented 1 month ago

Bummer! My sense is:

  1. Merge #448 (clarify CellProfiler defaults)
  2. Merge #439 (mypy)
  3. Merge #396 (mkdocs)

We'll likely do 1 and 2 before the weekend FWIW

kenibrewer commented 1 month ago

Sounds good! Let's do that!

gwaybio commented 1 month ago

@d33bs - ready for you to take a final look, thanks!

axiomcura commented 1 month ago

@d33bs @gwaybio I’ve finished applying the requested changes. I’m currently waiting for the tests to pass, and once they do, it’s ready for merging. ( I am unable to merge)

gwaybio commented 1 month ago

merged!