cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Bug: `infer_cp_features()` Function Fails to Capture Features If the Compartment Name Contains Multi-Case Letters #412

Closed axiomcura closed 2 months ago

axiomcura commented 4 months ago

Example code with output

I'm currently working with a dataset that doesn't utilize CellProfiler features. As a result, it has different compartments:

compartments = ['Primary', 'Cells', 'F Actin', 'Mitochondria', 'MyoD']

What's neat about pycytominer's infer_cp_features() function is that you can specify the compartments from which you want to select morphological features using the compartments parameter:

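from pycytominer.cyto_utils import infer_cp_features

# incarta_df is a pandas DataFrame of profiles, loaded beforehand
compartments = ['Primary', 'Cells', 'F Actin', 'Mitochondria', 'MyoD']
# Select only the feature columns that belong to the listed compartments
incarta_features = infer_cp_features(incarta_df, compartments)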

In the generated feature list, all features from the MyoD compartment are missing. This indicates that the infer_cp_features() function fails to capture the MyoD compartment.

Issue description

Feature names within the MyoD compartment are not captured.

Expected behavior

The expected behavior is to receive all features from all compartments.

Additional information

pycytominer version: 1.0.1

After some digging within the source code, I noticed that infer_cp_features() utilizes the convert_compartment_format_to_list() function. This function converts all compartment names to lowercase, except for the first letter, which causes MyoD to turn into Myod.
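To make the mismatch concrete, here is a minimal sketch in plain Python. It is not pycytominer's actual source: `normalize_compartment` is a hypothetical helper, the column names are made up, and using `str.lower()` followed by `str.title()` is my assumption about the normalization (it matches the observation that only MyoD is affected, while 'F Actin' survives).

```python
# Illustrative only: mimics the normalization described above,
# not pycytominer's actual implementation.
def normalize_compartment(name: str) -> str:
    """Lowercase the name, then capitalize the first letter of each word."""
    return name.lower().title()


compartments = ["Primary", "Cells", "F Actin", "Mitochondria", "MyoD"]

# Hypothetical column names, one per compartment
columns = [
    "Primary_Area",
    "Cells_Area",
    "F Actin_Intensity",
    "Mitochondria_Count",
    "MyoD_Intensity",
]

normalized = [normalize_compartment(c) for c in compartments]

# Prefix matching against the normalized names is case-sensitive,
# so "Myod" never matches a column that starts with "MyoD".
selected = [col for col in columns if any(col.startswith(p) for p in normalized)]
missing = [col for col in columns if col not in selected]

print(normalized)  # the last entry becomes "Myod"
print(missing)     # only the MyoD column is dropped
```

Under these assumptions, single-case names such as 'Cells' round-trip unchanged, which would explain why the problem only shows up for mixed-case compartment names.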

While I recognize that this functionality is tailored specifically for CellProfiler features, it presents an exciting opportunity to consider generalizing it for other features that have been extracted from other technologies!
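One possible direction, sketched below, would be to match compartment names exactly as the user supplies them. `select_features` is a hypothetical helper written for this issue, not an existing pycytominer function, and the column names are made up.

```python
def select_features(columns, compartments):
    """Keep columns whose name starts with a compartment name, case preserved.

    Hypothetical helper: the compartment names are used exactly as given,
    with no lowercasing or title-casing.
    """
    prefixes = tuple(compartments)  # str.startswith accepts a tuple of prefixes
    return [col for col in columns if col.startswith(prefixes)]


# Hypothetical column names for illustration
columns = ["Cells_Area", "MyoD_Intensity", "Metadata_Well"]
features = select_features(columns, ["Cells", "MyoD"])
print(features)
```

The trade-off is that exact matching would no longer accept lowercase input such as 'cells' for CellProfiler-style columns, so a real fix would likely need to keep that behavior behind an option or fall back to it.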