broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Pandas-2-ize the recipe #94

Open bethac07 opened 1 year ago

bethac07 commented 1 year ago

The following lines at least are barfing:

Uncaught Exception:   File "recipe/0.preprocess-sites/1.process-spots.py", line 289, in <module>
    spot_count_score_jointplot(

  File "/home/ubuntu/efs/2018_11_20_Periscope_Calico/workspace/software/M059K-SABER/recipe/0.preprocess-sites/scripts/spot_utils.py", line 30, in spot_count_score_jointplot
    pd.DataFrame(df.groupby(parent_col)[score_col].mean())

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/frame.py", line 9843, in merge
    return merge(

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 148, in merge
    op = _MergeOperation(

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 737, in __init__
    ) = self._get_merge_keys()

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1203, in _get_merge_keys
    right_keys.append(right._get_label_or_level_values(rk))

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/generic.py", line 1778, in _get_label_or_level_values
    raise KeyError(key)

and, if you disable that QC plot,

Uncaught Exception:   File "recipe/0.preprocess-sites/1.process-spots.py", line 340, in <module>
    cell_quality_summary_df = cell_quality.summarize_cell_quality_counts(

  File "/home/ubuntu/efs/2018_11_20_Periscope_Calico/workspace/software/M059K-SABER/recipe/scripts/cell_quality_utils.py", line 107, in summarize_cell_quality_counts
    quality_df.drop_duplicates(dup_cols)

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/frame.py", line 9843, in merge
    return merge(

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 148, in merge
    op = _MergeOperation(

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 737, in __init__
    ) = self._get_merge_keys()

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1221, in _get_merge_keys
    left_keys.append(left._get_label_or_level_values(lk))

  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.8/site-packages/pandas/core/generic.py", line 1778, in _get_label_or_level_values
    raise KeyError(key)

Both failures are downstream of a .value_counts() operation on a df, whose output naming was one of the breaking changes in pandas 2: the column coming from such an operation is now always named "count" (or "proportion" when normalize=True) instead of taking the name of the original column. There are currently 3 functions using value_counts.
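One possible way to make such a call independent of the pandas version is to name both output columns explicitly rather than relying on the defaults. This is a minimal sketch, not the recipe's actual code: the helper `value_counts_frame` and the column names `Parent_Cells` and `spot_count` are hypothetical and only illustrate the pattern.

```python
import pandas as pd


def value_counts_frame(df, col, count_col="count"):
    """Count occurrences of each value in `col`, returning a two-column frame.

    Hypothetical helper: the output columns are named explicitly, so the
    result looks the same under pandas 1.x and pandas 2.x.
    """
    counts = df[col].value_counts()
    # pandas 1.x leaves the index unnamed and names the Series after `col`;
    # pandas 2.x names the index after `col` and the Series "count".
    # Setting both names here removes the difference.
    return counts.rename_axis(col).reset_index(name=count_col)


# Illustrative data only; "Parent_Cells" and "spot_count" are assumed names.
example_df = pd.DataFrame({"Parent_Cells": [1, 1, 2]})
spot_counts = value_counts_frame(example_df, "Parent_Cells", "spot_count")
```

Because the key column keeps its original name, a later merge on that column should find the key it expects on either pandas version.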

Very much hope those are the only changes that need to be made, but we should recommend pandas 1.5.3 until someone goes through and actually successfully runs the recipe on pandas >= 2.
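Until then, a small guard at the top of a script could at least flag an untested pandas version. This is a sketch under assumptions: `pandas_major` and `warn_if_untested_pandas` are hypothetical helpers that do not exist in the recipe, and the warning text is only a suggestion.

```python
import warnings

import pandas as pd


def pandas_major(version: str) -> int:
    """Return the major version number from a string like '1.5.3'."""
    return int(version.split(".")[0])


def warn_if_untested_pandas(version: str = pd.__version__) -> bool:
    """Warn (and return True) when running on pandas 2 or later."""
    if pandas_major(version) >= 2:
        warnings.warn(
            f"pandas {version} detected; the recipe has only been run "
            "successfully with pandas 1.5.3."
        )
        return True
    return False
```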