broadinstitute / grit-benchmark

Benchmarking a metric used to evaluate perturbation strength
BSD 3-Clause "New" or "Revised" License

Grit should handle multiple groupings #12

Open gwaybio opened 3 years ago

gwaybio commented 3 years ago

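The error below is raised because grit is calculated for each perturbation independently: the assertion in get_grit_entry fails when the rows for a single perturbation contain more than one value in the group column. cytominer-eval should be aware of the "group_id" structure and support multiple groups (currently only a single group is allowed).

Until then, with only the single-group option available, we need to calculate grit for each group independently.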
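As a rough illustration of calculating a metric for each group independently, the sketch below splits the profiles by a group column, runs a caller-supplied function on each subset, and stacks the results. Everything here is hypothetical: `evaluate_per_group`, `toy_metric`, and the column names `Metadata_gene`, `Metadata_pert`, and `feature_x` are made up for the example, and `toy_metric` is only a stand-in for a real call into cytominer-eval.

```python
import pandas as pd


def evaluate_per_group(profiles, group_col, evaluate_fn):
    """Run evaluate_fn on each group's rows separately and stack the results.

    Hypothetical helper, not part of cytominer-eval.
    """
    results = []
    for group_value, group_df in profiles.groupby(group_col, sort=True):
        # Each call only ever sees rows from a single group.
        result = evaluate_fn(group_df.reset_index(drop=True))
        # Record which group the result rows came from.
        results.append(result.assign(**{group_col: group_value}))
    return pd.concat(results, ignore_index=True)


def toy_metric(group_df):
    """Stand-in metric (assumption): mean of one feature per perturbation."""
    return group_df.groupby("Metadata_pert", as_index=False)["feature_x"].mean()


# Made-up profiles: two groups (genes), two perturbations each.
profiles = pd.DataFrame(
    {
        "Metadata_gene": ["A", "A", "B", "B"],
        "Metadata_pert": ["A-1", "A-2", "B-1", "B-2"],
        "feature_x": [0.1, 0.2, 0.9, 1.0],
    }
)

per_group_df = evaluate_per_group(profiles, "Metadata_gene", toy_metric)
print(per_group_df)
```

In a real run the per-group function would presumably also need the control profiles in every subset, since grit is computed relative to controls; that part is left out of this sketch.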


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-61c106f993e9> in <module>
      6 }
      7 
----> 8 grit_results_df = evaluate(
      9     profiles=df,
     10     features=features,

/usr/local/lib/python3.9/site-packages/cytominer_eval/evaluate.py in evaluate(profiles, features, meta_features, replicate_groups, operation, similarity_metric, percent_strong_quantile, precision_recall_k, grit_control_perts, mp_value_params)
     51         )
     52     elif operation == "grit":
---> 53         metric_result = grit(
     54             similarity_melted_df=similarity_melted_df,
     55             control_perts=grit_control_perts,

/usr/local/lib/python3.9/site-packages/cytominer_eval/operations/grit.py in grit(similarity_melted_df, control_perts, replicate_id, group_id)
     59     # Calculate grit for each perturbation
     60     grit_df = (
---> 61         similarity_melted_df.groupby(replicate_col_name)
     62         .apply(lambda x: calculate_grit(x, control_perts, column_id_info))
     63         .reset_index(drop=True)

/usr/local/lib/python3.9/site-packages/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    892         with option_context("mode.chained_assignment", None):
    893             try:
--> 894                 result = self._python_apply_general(f, self._selected_obj)
    895             except TypeError:
    896                 # gh-20949

/usr/local/lib/python3.9/site-packages/pandas/core/groupby/groupby.py in _python_apply_general(self, f, data)
    926             data after applying f
    927         """
--> 928         keys, values, mutated = self.grouper.apply(f, data, self.axis)
    929 
    930         return self._wrap_applied_output(

/usr/local/lib/python3.9/site-packages/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    236             # group might be modified
    237             group_axes = group.axes
--> 238             res = f(group)
    239             if not _is_indexed_like(res, group_axes, axis):
    240                 mutated = True

/usr/local/lib/python3.9/site-packages/cytominer_eval/operations/grit.py in <lambda>(x)
     60     grit_df = (
     61         similarity_melted_df.groupby(replicate_col_name)
---> 62         .apply(lambda x: calculate_grit(x, control_perts, column_id_info))
     63         .reset_index(drop=True)
     64     )

/usr/local/lib/python3.9/site-packages/cytominer_eval/operations/util.py in calculate_grit(replicate_group_df, control_perts, column_id_info)
     94     Usage: Designed to be called within a pandas.DataFrame().groupby().apply()
     95     """
---> 96     group_entry = get_grit_entry(replicate_group_df, column_id_info["group"]["id"])
     97     pert = get_grit_entry(replicate_group_df, column_id_info["replicate"]["id"])
     98 

/usr/local/lib/python3.9/site-packages/cytominer_eval/operations/util.py in get_grit_entry(df, col)
    135 def get_grit_entry(df: pd.DataFrame, col: str) -> str:
    136     entries = df.loc[:, col]
--> 137     assert (
    138         len(entries.unique()) == 1
    139     ), "grit is calculated for each perturbation independently"

AssertionError: grit is calculated for each perturbation independently
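
The failing check can be reproduced outside cytominer-eval with a small stand-in. The sketch below is modeled on the lines of get_grit_entry visible in the traceback; the function name `get_grit_entry_sketch`, its return statement (not shown in the traceback), and the toy column names are assumptions made for illustration.

```python
import pandas as pd


def get_grit_entry_sketch(df, col):
    """Stand-in for the check shown in the traceback (not the library code)."""
    entries = df.loc[:, col]
    assert (
        len(entries.unique()) == 1
    ), "grit is calculated for each perturbation independently"
    # Assumption: the real function's return value is not visible in the traceback.
    return str(entries.unique()[0])


# One perturbation whose rows all share a single group value: the check passes.
single_group = pd.DataFrame({"pert": ["p1", "p1"], "group": ["A", "A"]})
single_entry = get_grit_entry_sketch(single_group, "group")

# The same perturbation appearing under two group values: the check fails.
mixed_group = pd.DataFrame({"pert": ["p1", "p1"], "group": ["A", "B"]})
try:
    get_grit_entry_sketch(mixed_group, "group")
    error_message = None
except AssertionError as err:
    error_message = str(err)

print(single_entry)
print(error_message)
```

This matches the situation in the traceback, where the group column passed to get_grit_entry holds more than one unique value for the rows of one perturbation.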