Really like your package, thanks a lot for the clean implementation!
I'm trying to get attributions for each text in a large corpus (10k+ texts) on a Google Colab GPU. The speed I'm used to on Colab (T4 GPU) is maybe several dozen texts per second during inference (batch size 16-32) and a few batches per second during training (e.g. batch size 32). For example, when I train a deberta-xsmall model I get `'train_steps_per_second': 6.121` with a batch size of 32 per step.
I don't have much experience with attribution methods, but I'm surprised that the attribution seems extremely slow, even on a GPU. Based on https://github.com/cdpierse/transformers-interpret/issues/60 I have verified with `cls_explainer.device` that the explainer is indeed running on the GPU.
Despite being on a GPU, the code below runs at only around 2.6 seconds per iteration, where one iteration is a single text truncated to 120 tokens. This is with deberta-xsmall, so a relatively small model.
My question: is it to be expected that a T4 GPU takes 2.6 seconds per text?
If not, do you see something in the code below that I'm doing wrong? (I imagine I could increase speed by increasing `internal_batch_size`, but when I tried that I also ran into surprisingly many CUDA out-of-memory errors; see the sketch after the code below.)
```python
from tqdm.notebook import tqdm

from transformers_interpret import SequenceClassificationExplainer

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
print(cls_explainer.device)  # confirms the explainer is on cuda

word_attributions_lst = []
for row in tqdm(df_test.iterrows(), total=len(df_test)):
    # calculate word attributions per text
    word_attributions = cls_explainer(
        row[1]["text_prepared"], internal_batch_size=1, n_steps=30  # default: n_steps=50
    )
    # add true and predicted label to each (word, attribution) tuple
    word_attributions_w_labels = [
        attribution_tuple + (row[1]["label_text"], cls_explainer.predicted_class_name)
        for attribution_tuple in word_attributions
    ]
    word_attributions_lst.append(word_attributions_w_labels)
```
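For reference, this is roughly the variation I experimented with when increasing `internal_batch_size` (the value 16 is just an arbitrary number I picked, not anything from the docs); it's where the CUDA out-of-memory errors started showing up. Everything else is the same as the loop above, only the explainer call changes:

```python
# same setup and loop as above, only the explainer call changes:
# let Captum batch the interpolation steps internally instead of running them one by one
word_attributions = cls_explainer(
    row[1]["text_prepared"],
    internal_batch_size=16,  # arbitrary value I tried; larger values gave me CUDA OOM errors
    n_steps=30,
)
```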
Small update: I saw in the Captum docs that they often call model.eval() and model.zero_grad() before attribution. I tried that as well, but it didn't really help either.
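In case it helps, this is roughly what I added before creating the explainer (no noticeable speed difference for me):

```python
# what I tried after reading the Captum docs: put the model in eval mode and
# clear any accumulated gradients before building the explainer
model.eval()
model.zero_grad()

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
```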