cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License

FeatureRequest: address Pandas DataFrame fragmentation warnings from tests #404

Closed: d33bs closed this issue 6 months ago

d33bs commented 6 months ago

Feature type

General description of the proposed functionality

Pandas emits `PerformanceWarning: DataFrame is highly fragmented` when the test suite runs under pytest. The fragmentation behind these warnings could degrade performance within Pycytominer. This issue tracks changing the affected code so that the warnings no longer occur.
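
One possible way to keep such warnings from creeping back in is to treat them as errors around the code under test. The sketch below is only an illustration: `run_without_fragmentation_warning` and `build_frame_at_once` are hypothetical helpers, not part of Pycytominer, and the column names are made up.

```python
import warnings

import pandas as pd


def run_without_fragmentation_warning(func, *args, **kwargs):
    """Hypothetical helper: call func, raising if pandas emits a PerformanceWarning."""
    with warnings.catch_warnings():
        # Escalate only pandas performance warnings; other warnings are untouched.
        warnings.simplefilter("error", category=pd.errors.PerformanceWarning)
        return func(*args, **kwargs)


def build_frame_at_once(n_cols=150):
    """Hypothetical example: build every column first, then join them in one concat."""
    series = [pd.Series(range(3), name=f"feature_{i}") for i in range(n_cols)]
    return pd.concat(series, axis=1)


frame = run_without_fragmentation_warning(build_frame_at_once)
```

A test could wrap a call in this helper so that a regression fails loudly instead of adding another line to the warnings summary.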

The warnings below are taken from a GitHub Actions job run for #401:

tests/test_cyto_utils/test_DeepProfiler_processing.py: 42 warnings
  /home/runner/work/pycytominer/pycytominer/pycytominer/aggregate.py:108: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    population_df = population_df.median().reset_index()

tests/test_cyto_utils/test_DeepProfiler_processing.py: 42 warnings
  /home/runner/work/pycytominer/pycytominer/pycytominer/cyto_utils/DeepProfiler_processing.py:276: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    df.loc[:, self.aggregate_merge_col] = metadata_level

...

tests/test_cyto_utils/test_collate.py: [76](https://github.com/cytomining/pycytominer/actions/runs/8623534172/job/23636904607?pr=401#step:6:77) warnings
  /home/runner/work/pycytominer/pycytominer/pycytominer/aggregate.py:110: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    population_df = population_df.mean().reset_index()
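
For context, the warning text itself suggests two remedies: join all columns at once with `pd.concat(axis=1)`, or defragment an existing frame with `.copy()`. The following is a minimal sketch of both on made-up data. It does not reproduce the actual code in `aggregate.py` or `DeepProfiler_processing.py`, and names such as `Metadata_Well` and `feature_0` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
wells = ["A01", "A01", "A02", "A02"]
n_cols = 150

# Made-up feature columns, keyed by column name.
columns = {f"feature_{i}": rng.random(len(wells)) for i in range(n_cols)}

# Pattern that can fragment a frame: each assignment inserts one new column,
# and pandas may emit a PerformanceWarning once enough columns have been added.
fragmented = pd.DataFrame({"Metadata_Well": wells})
for name, values in columns.items():
    fragmented[name] = values

# Remedy 1: collect the new columns first, then join them in a single concat.
consolidated = pd.concat(
    [pd.DataFrame({"Metadata_Well": wells}), pd.DataFrame(columns)],
    axis=1,
)

# Remedy 2: defragment an already-built frame by copying it,
# as the warning message recommends.
defragmented = fragmented.copy()

# A later aggregation, similar in spirit to the lines flagged in the log.
medians = consolidated.groupby("Metadata_Well").median().reset_index()
```

All three frames hold the same values; only the internal layout differs. The single concat avoids the repeated insertions in the first place, while `.copy()` is a smaller change when the column-by-column construction is hard to restructure.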

Feature example

n/a

Alternative Solutions

n/a

Additional information

n/a