Open sogm1 opened 6 months ago
Hello!
Are you using a custom training loop or something? If you added extra layers, then the default training probably will not work anymore: it expects the model to output e.g. "sentence_embedding", while your classification model probably outputs classes instead. So, if you're using a custom training loop, then you can indeed tokenize your text and pass it to the `forward` or `__call__` methods of the model (they're identical in torch).
Something along the lines of:
```python
for batch in dataloader:
    # maybe tokenize the batch if that's not done already
    # maybe move the batch to the right device if that's not done already
    # 'batch' is a dictionary of "input_ids" and "attention_mask" keys
    output = model(batch)
    loss = loss_fn(output)
    loss.backward()
    # maybe optimizer step()
    # maybe scheduler step()
```
So, `model.encode(...)` is the interface for most users, and `model(...)` or `model.forward(...)` is how the model is actually accessed.
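In case it helps, here is a minimal sketch of getting the embedding for already-tokenized input via `forward` (the model name and texts are just placeholders):

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
texts = ["first example sentence", "second example sentence"]  # placeholder data

# model.tokenize(...) returns a dict with "input_ids" and "attention_mask" tensors
features = model.tokenize(texts)
features = {key: value.to(next(model.parameters()).device) for key, value in features.items()}

with torch.no_grad():
    # forward returns a dict; "sentence_embedding" holds the pooled embeddings
    output = model(features)
embeddings = output["sentence_embedding"]  # shape: (num_texts, embedding_dim)
```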
Also, this is a bit unrelated, but I've had pretty good luck with just training a Sentence Transformer model without any extra layers, and then training a LogisticRegression on top of it, by using roughly:
```python
from sklearn.linear_model import LogisticRegression

# assume that we have `texts` and `labels`
X = model.encode(texts)
classifier = LogisticRegression().fit(X, labels)
```
That could also be worth a shot.
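For inference with that setup, predicting on new texts would just be something along the lines of (`new_texts` being a placeholder list of strings):

```python
# embed the unseen texts with the Sentence Transformer, then classify
predictions = classifier.predict(model.encode(new_texts))
```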
@tomaarsen
Hello, thank you for your reply.
I have one more question: can my fine-tuned model be trained with the training loop below?
My custom classification model is this:
```python
import torch
import torch.nn as nn

# setting
num_classes = select_top10_df["points_category"].nunique()

# model
class CustomBertModel(nn.Module):
    def __init__(self, model, num_classes):
        super(CustomBertModel, self).__init__()
        self.encoder = model
        # fine-tuning config: classification head on top of the sentence embedding
        self.dropout = nn.Dropout(0.5)
        self.dense1 = nn.Linear(model[1].word_embedding_dimension, 768)
        self.tanh = nn.Tanh()
        self.dense2 = nn.Linear(768, num_classes)
        # keep the encoder trainable (set requires_grad = False here to freeze it instead)
        for param in self.encoder.parameters():
            param.requires_grad = True

    def forward(self, batch):
        output = self.encoder(batch)["sentence_embedding"]
        # do not wrap the embedding in torch.tensor(); that would detach it from the graph
        output = self.dropout(output)
        logits = self.dense1(output)
        logits = self.tanh(logits)
        logits = self.dropout(logits)
        logits = self.dense2(logits)
        return logits


model3 = CustomBertModel(fine_tuned_model, num_classes)  # fine_tuned_model: the model trained with Sentence Transformers
model3
```
The result:
```
CustomBertModel(
  (encoder): SentenceTransformer(
    (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  )
  (dropout): Dropout(p=0.5, inplace=False)
  (dense1): Linear(in_features=768, out_features=768, bias=True)
  (tanh): Tanh()
  (dense2): Linear(in_features=768, out_features=5, bias=True)
)
```
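Before training, a quick sanity check of the wrapper might look like this (just a sketch; `sample_texts` is a placeholder list of strings):

```python
# tokenize a few texts with the wrapped SentenceTransformer and run one forward pass
sample_texts = ["example sentence one", "example sentence two"]  # placeholder data
features = fine_tuned_model.tokenize(sample_texts)

with torch.no_grad():
    logits = model3(features)
print(logits.shape)  # expected: (len(sample_texts), num_classes), here (2, 5)
```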
And then my training loop:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import datetime

# Config
num_epochs = 15
optimizer = optim.Adam(model3.parameters(), lr=1e-5)  # optimize model3, not the bare encoder
criterion = nn.CrossEntropyLoss()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model3.to(device)

# early-stopping settings (example values)
best_train_loss = float("inf")
patience = 3
counter = 0

for epoch in range(num_epochs):
    model3.train()  # training mode
    total_loss = 0.0
    total_correct = 0

    for batch in train_dataloader:
        inputs = {
            "input_ids": batch["input_ids"].to(device),
            "attention_mask": batch["attention_mask"].to(device),
        }
        labels = batch["labels"].to(device)

        optimizer.zero_grad()
        outputs = model3(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()

    avg_loss = total_loss / len(train_dataloader)
    avg_acc = total_correct / len(train_dataset)
    print(f"Epoch [{epoch + 1}/{num_epochs}] - Loss: {avg_loss:.4f}, Accuracy: {avg_acc * 100:.2f}%")

    if avg_loss < best_train_loss:
        best_train_loss = avg_loss
        counter = 0
        # Save the best model
        torch.save(model3.state_dict(), 'my_directory')
    else:
        counter += 1
        # Check if early stopping criteria are met
        if counter >= patience:
            print(f"Early stopping after {epoch + 1} epochs without improvement.")
            break
```
But I got an error message 😥:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-69-168d05b7ae23> in <cell line: 16>()
     30         outputs = model3(inputs)
     31         loss = criterion(outputs, labels)
---> 32         loss.backward()
     33         optimizer.step()
     34

1 frames
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    490             inputs=inputs,
    491         )
--> 492         torch.autograd.backward(
    493             self, gradient, retain_graph, create_graph, inputs=inputs
    494         )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    249     # some Python versions print out the first line of a multi-line function
    250     # calls in the traceback and some print out the last line
--> 251     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252         tensors,
    253         grad_tensors_,

RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
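For what it's worth, a device-side assert from `nn.CrossEntropyLoss` on CUDA is very often caused by label values outside the range `[0, num_classes - 1]`; a quick check (a sketch, assuming the labels come out of `train_dataloader` as in the loop above) would be:

```python
# collect all labels and verify they fall inside [0, num_classes - 1]
all_labels = torch.cat([batch["labels"] for batch in train_dataloader])
print(all_labels.min().item(), all_labels.max().item(), "vs num_classes =", num_classes)
assert 0 <= all_labels.min().item() and all_labels.max().item() < num_classes
```

Running the loop once on CPU (`device = torch.device("cpu")`) also tends to replace the assert with a more readable error message.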
First, thank you so much for sentence-transformers.
How can I get an embedding vector when the input is already tokenized?
I guess sentence-transformers can do `.encode(original text)`, but I want to know whether there is a way like `.encode(token_ids)` or `.encode(token_ids, attention_masks)`.
This is my background below.
Regards