I keep hitting the same CUDA error when running this part of the code while computing the similarity on llama2-7b:
```python
for batch in tqdm(dataloader, desc="Processing batches"):
    inputs = tokenizer(batch, return_tensors="pt", padding="longest",
                       max_length=max_length, truncation=True).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    attention_mask = inputs["attention_mask"]
    hidden_states = outputs.hidden_states
    last_non_padded_hidden_states = utils.get_last_non_padded_tokens(hidden_states, attention_mask)
    # Remove the first element to account for the input (embedding) layer not being
    # considered a model hidden layer. This adjustment is necessary for analyses
    # focusing on the model's internal transformations.
    last_non_padded_hidden_states = last_non_padded_hidden_states[1:]
    # Ensure that the length of last_non_padded_hidden_states matches the number
    # of model hidden layers.
    assert len(last_non_padded_hidden_states) == model.config.num_hidden_layers, \
        "Length of last_non_padded_hidden_states does not match expected number of hidden layers."
    # Compute distances and append to all_distances
    distances = utils.compute_block_distances(last_non_padded_hidden_states, layers_to_skip)
    for i, distance in enumerate(distances):
        all_distances[i].append(distance)
```
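For context, this is roughly what I understand `utils.get_last_non_padded_tokens` to do: for each layer's hidden states it gathers, per sequence, the hidden vector of the last non-padded token. A minimal sketch with plain lists (an assumption about the helper's behavior, not the repo's actual torch implementation; shapes assumed as `hidden_states` = tuple of `[batch][seq][dim]`, `attention_mask` = `[batch][seq]`):

```python
# Sketch of utils.get_last_non_padded_tokens using plain lists
# (assumption: the real helper does the same gather on torch tensors).
def get_last_non_padded_tokens(hidden_states, attention_mask):
    """For each layer, pick the hidden vector of the last non-padded token."""
    per_layer = []
    for layer in hidden_states:                       # layer: [batch][seq][dim]
        last_tokens = []
        for row, mask in zip(layer, attention_mask):  # row: [seq][dim], mask: [seq]
            last_idx = sum(mask) - 1                  # index of the last real token
            last_tokens.append(row[last_idx])
        per_layer.append(last_tokens)
    return per_layer
```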
I tested on torch1.13+cu116 and on torch2.10+cu122, but both setups ran into a CUDA error.

With torch2.10+cu122:
```
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
```
With torch1.13+cu116:
```
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [107,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
```
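Since the torch1.13 trace is a device-side index-out-of-bounds assertion (and rerunning with `CUDA_LAUNCH_BLOCKING=1` should make the failing op show up at the right line, since CUDA errors are reported asynchronously), one thing worth checking is whether any token id produced by the tokenizer falls outside the embedding table, e.g. if a new pad token was added without resizing the embeddings. A minimal sketch of that check (the `vocab_size` value and the example ids here are made up; in the real script the ids come from `inputs["input_ids"]` and the table size from `model.config.vocab_size`):

```python
# Hypothetical bounds check: every input id must index into the embedding table.
# In the real script: vocab_size = model.config.vocab_size and
# input_ids = inputs["input_ids"].tolist().
vocab_size = 32000              # llama2-7b's default vocab size (assumption)
input_ids = [[1, 306, 32000]]   # made-up batch; 32000 would be out of bounds
out_of_bounds = [t for row in input_ids for t in row if not (0 <= t < vocab_size)]
print(out_of_bounds)  # any ids listed here would trigger the device-side assert
```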