QData / TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP.
https://textattack.readthedocs.io/en/master/
MIT License

Textattack for cuml models not utilising much GPU resources #790

Open farwashah6 opened 6 months ago

farwashah6 commented 6 months ago

Hi. I am new to using the GPU. I have used the TextAttack library before for one of my projects with scikit-learn and Keras models. For that I created custom ModelWrappers matching my models and they worked fine. Now, since my data is different and very large, I want to run the same (scikit-learn) models on the GPU.

My understanding is that scikit-learn models do not run on the GPU and that I have to use cuML instead. But when I use cuML and pass the cuML model to the CustomModelWrapper I created earlier, it fails with the error len() of unsized object and stops execution.

Additional info: for vectorization of my data I am using cuML's CountVectorizer, which is the cause of this error. When I instead use scikit-learn's CountVectorizer, the attack runs but doesn't use much GPU (of course). Please help me with this.
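
For reference, a minimal sketch of how cuML's CountVectorizer is usually driven (the toy documents below are illustrative, not from this issue): it expects a cuDF Series of GPU-resident strings and returns a sparse matrix that lives on the GPU.

import cudf
from cuml.feature_extraction.text import CountVectorizer

# Illustrative toy corpus; cuML's vectorizer works on a cuDF Series, not a Python list.
docs = cudf.Series(["the quick brown fox", "jumps over the lazy dog"])
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(docs)  # sparse matrix on the GPU (cupyx CSR)
print(features.shape)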

I am attaching my ModelWrapper here:

import pandas as pd
import textattack as ta


class CustomModelWrapper(ta.models.wrappers.ModelWrapper):
    """Wraps a fitted estimator and its vectorizer so TextAttack can query it."""

    def __init__(self, model, vectorizer):
        super().__init__()
        self.model = model
        self.vectorizer = vectorizer

    def __call__(self, text_input_list, batch=None):
        # Vectorize the raw input strings, then return class probabilities.
        x_transform = self.vectorizer.transform(pd.Series(text_input_list)).astype(float)
        prediction = self.model.predict_proba(x_transform)
        return prediction
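
One possible adaptation (a sketch under assumptions, not a confirmed fix from this thread): feed cuML a cuDF Series and convert the GPU result back to a NumPy array before returning it, so that downstream code which calls len() on the output gets a sized object. The class name CumlModelWrapper and the conversion logic below are illustrative.

import cudf
import cupy as cp
import numpy as np
import textattack as ta


class CumlModelWrapper(ta.models.wrappers.ModelWrapper):
    """Illustrative wrapper for a cuML vectorizer + classifier pair."""

    def __init__(self, model, vectorizer):
        super().__init__()
        self.model = model
        self.vectorizer = vectorizer

    def __call__(self, text_input_list, batch=None):
        # cuML's CountVectorizer expects GPU-resident strings (a cuDF Series).
        texts = cudf.Series(list(text_input_list))
        # Assumes the estimator accepts the vectorizer's sparse GPU matrix;
        # otherwise densify with .toarray() first (at a memory cost).
        features = self.vectorizer.transform(texts).astype(cp.float32)
        probs = self.model.predict_proba(features)
        # Move predictions back to host memory; depending on cuML's output-type
        # setting this may be a CuPy array or a cuDF object.
        if isinstance(probs, cp.ndarray):
            probs = cp.asnumpy(probs)
        elif hasattr(probs, "to_numpy"):
            probs = probs.to_numpy()
        return np.atleast_2d(np.asarray(probs))
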
RealPolitiX commented 3 months ago

@farwashah6, I'm curious how big your model is and how much GPU resource it should occupy. I've encountered what is perhaps a similar issue with HuggingFaceModelWrapper, but found a temporary workaround by commenting out a few lines in the TextAttack code. See #798

farwashah6 commented 2 months ago

Thanks for the suggestion. The models I am trying to use are SVM, Random Forest, and Linear Regression, so not big at all. The problem is that they are scikit-learn models, which are not GPU-accelerated, and when I use the cuML library instead I encounter this problem.
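
For context, cuML ships GPU implementations of these estimators behind a scikit-learn-like API (for probability outputs one would typically reach for logistic regression rather than linear regression). A rough sketch on toy dense data, using standard cuML class names but with behavior not verified in this thread:

import cupy as cp
from cuml.ensemble import RandomForestClassifier
from cuml.linear_model import LogisticRegression
from cuml.svm import SVC

# Toy dense GPU features and binary labels; real text features would come from a vectorizer.
X = cp.random.rand(200, 50).astype(cp.float32)
y = cp.random.randint(0, 2, size=200).astype(cp.int32)

for clf in (LogisticRegression(), SVC(probability=True), RandomForestClassifier()):
    clf.fit(X, y)
    probs = clf.predict_proba(X[:5])  # stays on the GPU (a CuPy array by default)
    print(type(clf).__name__, probs.shape)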

beckernick commented 2 months ago

I work on RAPIDS at NVIDIA and came across this issue due to the cuML reference.

I've seen this error come up in scenarios where a downstream function expects a data structure that supports the len() operator but instead gets a scalar or ndarray equivalent (e.g., np.ndarray(1) or cupy.ndarray(1)).

If you're running into scenarios in which a cuML model's output or behavior isn't consistent with scikit-learn (particularly if you're passing in CPU-based inputs), could you file an issue on cuML?

If you can provide a minimal, reproducible example of your error we may be able to help triage it.
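
For illustration (not code from this thread), the same TypeError can be reproduced with a zero-dimensional NumPy array, and wrapping the value with np.atleast_1d is one common way to make downstream length checks robust to it:

import numpy as np

score = np.array(0.93)         # zero-dimensional "scalar" array
try:
    len(score)
except TypeError as err:
    print(err)                 # -> len() of unsized object

fixed = np.atleast_1d(score)   # shape (1,), so len(fixed) == 1
print(len(fixed))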

farwashah6 commented 2 months ago

Thank you for the suggestion. Yes, I have also posted the issue on cuML and attached sample code.
