Giskard Scan crashes when tested on a large number of features

Issue Type

Bug

Source

source

Giskard Library Version

2.14.0

Giskard Hub Version

2.14.0

OS Platform and Distribution

Linux Ubuntu 20.04

Python version

3.9

Installed python packages

numpy==1.23.5
pandas==2.0.3
pyarrow==16.0.0
openpyxl==3.1.2
scikit-learn==1.3.1
xgboost==1.7.6
featurewiz==0.3.2

Current Behaviour?

I ran the scan on a test dataset of 100 samples and ~3700 features, and an OOM error occurred.
I used a pipeline with data transformers, featurewiz feature selection, and an XGBoost model.
I have run the library on 2 other use cases with fewer than 100 features, and it ran smoothly without any issue, so I suspect the problem is related to the large number of features.

The machine has 48 GB of RAM.
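
As a rough sanity check on that suspicion, the sketch below builds a synthetic frame of the reported shape and measures its raw size. It assumes all-numeric float64 features and uses hypothetical column names (`f0`, `f1`, ...); the real data also contains object columns, which take more space. Under these assumptions the raw data is only a few MiB, which would suggest the memory is consumed by work done during the scan rather than by the dataset itself.

```python
import numpy as np
import pandas as pd

# Assumed shape from the report: 100 samples, ~3700 features (all float64 here).
n_samples, n_features = 100, 3700
rng = np.random.default_rng(0)

# Hypothetical column names f0..f3699; the real data also has object columns.
df = pd.DataFrame(
    rng.normal(size=(n_samples, n_features)),
    columns=[f"f{i}" for i in range(n_features)],
)
df["Target"] = rng.integers(0, 2, size=n_samples)

# Sum the per-column memory footprint (index excluded) and report it in MiB.
raw_bytes = int(df.memory_usage(index=False, deep=True).sum())
print(f"shape={df.shape}, raw size={raw_bytes / 2**20:.2f} MiB")
```

A synthetic frame like this could also stand in for the private data when trying to reproduce the problem, provided a model is trained on the same columns.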

Standalone code OR list down the steps to reproduce the issue

import pandas as pd
import numpy as np
from giskard import Dataset, Model, scan

# Class to create the model and Dataset
class VulnerabilityDetection:
    def __init__(self, df: pd.DataFrame, model_instance):
        self.model_instance = model_instance
        self.df = df

    def gisk_dataset(self):
        CATEGORICAL_COLUMNS = list(self.df.select_dtypes(include="object").columns)
        giskard_dataset = Dataset(
                df=self.df,
                target="Target",
                name="",
                cat_columns=CATEGORICAL_COLUMNS,
                )
        return giskard_dataset

    def gisk_model(self):
        model_inst = self.model_instance

        def prediction_function(df: pd.DataFrame) -> np.ndarray:
            return model_inst.predict_proba(df)

        giskard_model = Model(
            model=prediction_function,
            model_type="classification",
            name="Vulnerability Detection Model",
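            classification_labels=list(model_inst.classes_),
            # Exclude the target so only the model's input columns are listed as features
            feature_names=[c for c in self.df.columns if c != "Target"]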
        )
        return giskard_model

# Execution
import pickle

df = pd.read_csv("MyData")
with open("XGBoost_pipeline.pkl", "rb") as file:
    xg_pipeline = pickle.load(file)
vd = VulnerabilityDetection(df, xg_pipeline)
gisk_dataset = vd.gisk_dataset()
gisk_model = vd.gisk_model()

# Run the scan; this is the step where the OOM error occurs
results = scan(gisk_model, gisk_dataset)

Relevant log output

The VS Code notebook itself crashed with an OOM error.
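
Since the crash leaves no log, one way to get comparable numbers is to record peak memory on runs that do complete, such as the two smaller use cases. The sketch below uses the standard-library `tracemalloc` module; the `track_peak_memory` helper and the `report` dict are hypothetical names introduced for illustration. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory allocated natively (for example inside XGBoost) may not be counted.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_peak_memory(label, report):
    """Store the peak traced allocation (in MiB) of the enclosed block in report[label]."""
    tracemalloc.start()
    try:
        yield
    finally:
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[label] = peak / 2**20


report = {}
with track_peak_memory("demo", report):
    # Stand-in workload; in practice this would wrap the scan call.
    buffer = bytearray(10 * 2**20)
print(report)
```

Wrapping `scan(gisk_model, gisk_dataset)` this way on the smaller datasets would show how peak memory changes with the number of features.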

Giskard-AI / giskard

Giskard Scan crashes when tested on a large number of features #1974
