cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License
78 stars 35 forks source link

Improve Documentation for Non-CellProfiler Datasets in Pycytominer #430

Closed axiomcura closed 1 month ago

axiomcura commented 1 month ago

Description

Thank you for your contribution to pycytominer! Please succinctly summarize your proposed change. What motivated you to make this change?

This PR updates the README with instructions on how to handle morphological features extracted from software other than CellProfiler. These changes were conducted due to a reviewer’s comment/suggestion.

"When I used the function pycytominer.normalize() on a dataset originating from my own work, I encountered an error message saying “No CP features found. Are you sure this dataframe is from CellProfiler?”. I could easily fix the problem because I know the properties of a CellProfiler dataset, and the error message was hinting at some “Metadata” missing. The fix was very easy, I just needed to add the prefix “Metadata” to my metadata columns. My dataset comes from another commonly used proprietary software called Harmony. I understand that the authors are in favor of using open-source resources, but I think that the pycytominer tool could be useful in a much larger context, and take as input different types of datasets. In the readme file, that I followed to perform the normalization, I couldn’t find anywhere that the dataset has to come from CellProfiler. The author should state clearly, that only cellprofiler datasets can be used. Instead, currently, the sentence in the readme files of the pycytominer github repository says “Image data flow from a microscope to cell segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler)”. And in the code chunck the sentence says “Each function takes as input a pandas DataFrame or file path”, but in reality, it has to be a specifically formatted pandas dataframe (at least regarding the column names). I think the authors should fix this misunderstanding in the documentation. Alternatively, the better option would be to support the use of different datasets, and explain how the column names should be formatted to be easily processed by the different functions. I can’t exclude that this is described somewhere in the tutorials, but I didn’t encounter it when following the instructions on the readmefile. I would also make the error message more self-explanatory because for now, it implies that I know what CellProfiler is and how its output is formatted. If I don’t have this knowledge, as a user, I might give up on using this tool. This would be unfortunate, as I believe that this tool is a fundamental step in image-based profiling. Considering how easy was to fix this issue for my own dataset, which allowed me to immediately see the outputs of your tool, I recommend reconsidering how to best document the formatting steps necessary to be able to use the different functions of the pycytominer too"

Please also link to any relevant issues that your code is associated with.

What is the nature of your change?

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

codecov-commenter commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.72%. Comparing base (409d2aa) to head (6caab11). Report is 8 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #430 +/- ## ========================================= + Coverage 0 94.72% +94.72% ========================================= Files 0 57 +57 Lines 0 3145 +3145 ========================================= + Hits 0 2979 +2979 - Misses 0 166 +166 ``` | [Flag](https://app.codecov.io/gh/cytomining/pycytominer/pull/430/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/cytomining/pycytominer/pull/430/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining) | `94.72% <ø> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

gwaybio commented 1 month ago

@axiomcura - please take a look at my changes. They aim to simplify and clarify. I also removed the non-CellProfiler example that actually used CellProfiler data. This is too confusing and I think the updated function docstrings and error messages in #448 are sufficient. Lastly, I removed the table of contents which was in a strange place and I felt added more confusion than help

axiomcura commented 1 month ago

I like the changes @gwaybio! I agree that having the table of contents in the middle of the README was a bit confusin. It's much cleaner and more readable now.

@d33bs, waiting on your approval if the updated documentation is ready to be merged! c:

gwaybio commented 1 month ago

It looks like this is a duplicate of #448 ? @axiomcura - is this true? How are these different?

Taking a look at what's changed here, it looks like #448 is the correct PR and we should close this one.

gwaybio commented 1 month ago

(I'll plan on going through @d33bs comments and adding them to 448)

gwaybio commented 1 month ago

oh wait! I am the one confused. One sec

gwaybio commented 1 month ago

Thanks for the review @d33bs !

One thing I left for @axiomcura to address is this comment:

I noticed the image of the workflows could use maybe another block alongside CellProfiler (or some other method) to indicate the freedom to use other tools as desired (at the moment it visually communicates you can only use two tools to reach the workflows). Maybe there could be a mention of "Single-cell Profiling Software (e.g. CellProfiler, DeepProfiler, etc.)" (this is a bit too long but I think you can see what I mean!). See below:

@axiomcura - what do you think about dropping this figure in favor of your new figure 1? I propose that you rename the existing file (maybe to something like pipeline_legacy.png) and add a high res version of figure one (naming it pipeline.png)

gwaybio commented 1 month ago

update, looks like macos runners aren't picking up the jobs. Since ubuntu and other integration tests passed and this PR doesn't touch code, I will merge to reduce delays