bbartling / open-fdd

Fault Detection Diagnostics (FDD) for HVAC datasets
MIT License

pandas.eval() for massive datasets? #20

Closed: bbartling closed this issue 1 month ago

bbartling commented 9 months ago

Current method with apply in pandas:

def apply(self, df: pd.DataFrame) -> pd.DataFrame:
    # Existing checks
    df['static_check_'] = (
        df[self.duct_static_col] < df[self.duct_static_setpoint_col] - self.duct_static_inches_err_thres)
    df['fan_check_'] = (
        df[self.supply_vfd_speed_col] >= self.vfd_speed_percent_max - self.vfd_speed_percent_err_thres)

    # Combined condition check
    df["combined_check"] = df['static_check_'] & df['fan_check_']

    # Rolling sum to count consecutive trues
    rolling_sum = df["combined_check"].rolling(window=5).sum()
    # Set flag to 1 if rolling sum equals the window size (5)
    df["fc1_flag"] = (rolling_sum == 5).astype(int)

    return df
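
For reference, the rolling-sum trick only raises fc1_flag once the combined condition has held for the full window. A tiny illustrative sketch with synthetic values (not project data):

import pandas as pd

# The flag goes to 1 only after five consecutive True samples.
combined = pd.Series([True, True, True, True, True, False, True, True])
rolling_sum = combined.rolling(window=5).sum()
flag = (rolling_sum == 5).astype(int)
print(flag.tolist())  # [0, 0, 0, 0, 1, 0, 0, 0]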

Use eval?

def apply_with_eval(self, df: pd.DataFrame) -> pd.DataFrame:
    # Use eval for the simple comparison operations.
    # The attributes like self.duct_static_col hold column *names*, not data,
    # so they are interpolated into the expression string; scalar thresholds
    # are passed with the @ local-variable syntax.
    df.eval(
        f"static_check_ = {self.duct_static_col} < "
        f"({self.duct_static_setpoint_col} - @self.duct_static_inches_err_thres)",
        inplace=True,
    )
    df.eval(
        f"fan_check_ = {self.supply_vfd_speed_col} >= "
        "(@self.vfd_speed_percent_max - @self.vfd_speed_percent_err_thres)",
        inplace=True,
    )

    # Combined condition check (bitwise AND)
    df["combined_check"] = df["static_check_"] & df["fan_check_"]

    # Rolling sum to count consecutive trues (this part remains the same)
    rolling_sum = df["combined_check"].rolling(window=5).sum()
    # Set flag to 1 if the rolling sum equals the window size (5)
    df["fc1_flag"] = (rolling_sum == 5).astype(int)

    return df
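
Taking eval a step further, the two comparisons and the AND could be fused into one expression so the whole boolean rule is evaluated in a single pass, which is where numexpr-backed eval tends to help most on large frames. This is only a sketch using the same attribute names as above, not something the library does today:

def apply_with_single_eval(self, df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical variant: one eval() call for the entire boolean rule.
    # Column names stored on self are interpolated into the expression and
    # assumed to be valid Python identifiers; scalar thresholds are passed
    # with the @ local-variable syntax supported by DataFrame.eval().
    expr = (
        f"combined_check = ({self.duct_static_col} < "
        f"({self.duct_static_setpoint_col} - @self.duct_static_inches_err_thres)) "
        f"& ({self.supply_vfd_speed_col} >= "
        f"(@self.vfd_speed_percent_max - @self.vfd_speed_percent_err_thres))"
    )
    df.eval(expr, inplace=True)

    # The rolling window logic stays in plain pandas; eval() has no rolling support.
    rolling_sum = df["combined_check"].rolling(window=5).sum()
    df["fc1_flag"] = (rolling_sum == 5).astype(int)
    return df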

Any insights appreciated. It's sort of interesting to see what ChatGPT says in a conversation about this:

  • Continue using the current approach with standard pandas operations, especially for the more complex parts like the rolling window operation.
  • If performance becomes an issue, consider using eval() for the simpler comparison operations, but benchmark on your specific dataset to confirm it is actually faster (see the sketch after this list).
  • Balance readability and maintainability against performance, and choose whichever best fits the project's requirements; eval() can help in certain cases, but only benchmarking your own data will show whether it does so without hurting readability.
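
On the benchmarking point, a minimal sketch with a synthetic one-million-row frame (the column names and thresholds here are placeholders, not the project's real point names):

import time

import numpy as np
import pandas as pd

# Synthetic trend log standing in for a real AHU dataset.
n = 1_000_000
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "duct_static": rng.uniform(0.0, 2.0, n),
        "duct_static_setpoint": np.full(n, 1.0),
        "supply_vfd_speed": rng.uniform(0.0, 100.0, n),
    }
)

err_thres, vfd_max, vfd_err = 0.1, 99.0, 5.0

t0 = time.perf_counter()
plain = (df["duct_static"] < (df["duct_static_setpoint"] - err_thres)) & (
    df["supply_vfd_speed"] >= (vfd_max - vfd_err)
)
t1 = time.perf_counter()
evaled = df.eval(
    "(duct_static < (duct_static_setpoint - @err_thres)) "
    "& (supply_vfd_speed >= (@vfd_max - @vfd_err))"
)
t2 = time.perf_counter()

print(f"plain pandas: {t1 - t0:.4f}s  eval: {t2 - t1:.4f}s")
assert plain.equals(evaled)  # both approaches produce the same mask

Note that eval() only uses the numexpr engine when numexpr is installed; otherwise it falls back to the slower Python engine, so the comparison is only meaningful with numexpr present.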