Really like your package, thanks a lot for the clean implementation!
I'm trying to get attributions for each text in a large corpus (10k+ texts) on a Google Colab GPU. The speed I'm used to on Colab (T4 GPU) is maybe several dozen texts per second during inference (batch size 16-32) and a few batches per second during training (e.g. batch size 32). For example, when I train a deberta-xsmall model I get `'train_steps_per_second': 6.121` with a batch size of 32 per step.
I don't have much experience with attribution methods, but I'm surprised that the attribution seems extremely slow, even on a GPU. Based on https://github.com/cdpierse/transformers-interpret/issues/60 I have verified with `cls_explainer.device` that the explainer is indeed running on the GPU.
Despite being on a GPU, the code below runs at only around 2.6 seconds per iteration, where one iteration is a single text truncated to 120 tokens. This is with deberta-xsmall, so a relatively small model.
My question: is it to be expected that a T4 GPU takes 2.6 seconds per text?
If not, do you see something in the code below that I'm doing wrong? (I imagine I could increase speed by increasing `internal_batch_size`, but when I tried that I also ran into surprisingly many CUDA out-of-memory errors; see the sketch after the code below.)
```python
from tqdm.notebook import tqdm

from transformers_interpret import SequenceClassificationExplainer

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
print(cls_explainer.device)  # confirms the explainer is on cuda

word_attributions_lst = []
for row in tqdm(df_test.iterrows(), total=len(df_test)):
    # calculate word attributions per text
    word_attributions = cls_explainer(
        row[1]["text_prepared"], internal_batch_size=1, n_steps=30  # default: n_steps=50
    )
    # add true and predicted label to each (word, attribution) tuple
    word_attributions_w_labels = [
        attribution_tuple + (row[1]["label_text"], cls_explainer.predicted_class_name)
        for attribution_tuple in word_attributions
    ]
    word_attributions_lst.append(word_attributions_w_labels)
```
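For reference, this is roughly the variation I experimented with when increasing `internal_batch_size` (the value 16 is just an arbitrary number I picked, not anything from the docs); it's where the CUDA out-of-memory errors started showing up. Everything else is the same as the loop above, only the explainer call changes:

```python
# same setup and loop as above, only the explainer call changes:
# let Captum batch the interpolation steps internally instead of running them one by one
word_attributions = cls_explainer(
    row[1]["text_prepared"],
    internal_batch_size=16,  # arbitrary value I tried; larger values gave me CUDA OOM errors
    n_steps=30,
)
```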
Small update: I saw in the Captum docs that they often call model.eval() and model.zero_grad() before attribution. I tried that as well, but it didn't really help either.
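In case it helps, this is roughly what I added before creating the explainer (no noticeable speed difference for me):

```python
# what I tried after reading the Captum docs: put the model in eval mode and
# clear any accumulated gradients before building the explainer
model.eval()
model.zero_grad()

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
```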