ccdv-ai / convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing

problem with xlm_roberta #9

Open puppetm4st3r opened 7 months ago

puppetm4st3r commented 7 months ago

Hi, I'm trying to convert this model:

from lsg_converter import LSGConverter
converter = LSGConverter(max_sequence_length=4096)
model, tokenizer = converter.convert_from_pretrained(model_name_or_path="T-Systems-onsite/cross-en-es-roberta-sentence-transformer")
print(type(model))

and it seems to be converted OK:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Some weights of LSGXLMRobertaModel were not initialized from the model checkpoint at T-Systems-onsite/cross-en-es-roberta-sentence-transformer and are newly initialized: ['embeddings.global_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
<class 'lsg_converter.xlm_roberta.modeling_lsg_xlm_roberta.LSGXLMRobertaModel'>
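
For context, the embedding code follows the usual mean-pooling recipe. The snippet below is my reconstruction from the traceback further down, so the mean_pooling helper, the long_texts placeholder and the tokenizer arguments (padding/truncation/max_length) are assumptions rather than the exact notebook code:

import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padded positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

long_texts = ["<a document well over 514 tokens>", "<another long document>"]
encoded_input = tokenizer(long_texts, padding=True, truncation=True, max_length=4096, return_tensors="pt")

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])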

but when I use the model with a long text (it's an embedding model), I get:

{
    "name": "RuntimeError",
    "message": "The expanded size of the tensor (1193) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [2, 1193].  Tensor sizes: [1, 514]",
    "stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/dario/src/lsg_embeddings.ipynb Cell 3 line 3
     29 # Compute token embeddings
     30 with torch.no_grad():
---> 31     model_output = model(**encoded_input)
     33 # Perform pooling. In this case, max pooling.
     34 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/.local/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py:801, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    799 if hasattr(self.embeddings, \"token_type_ids\"):
    800     buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
--> 801     buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
    802     token_type_ids = buffered_token_type_ids_expanded
    803 else:

RuntimeError: The expanded size of the tensor (1193) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [2, 1193].  Tensor sizes: [1, 514]"

With BERT models it works like a charm.
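
From the traceback, the expand that fails is on the buffered token_type_ids tensor inside RobertaModel.forward, which seems to keep its original 514-position length after conversion. A possible stopgap (an untested sketch, reusing encoded_input from the snippet above) is to pass token_type_ids explicitly so that code path is skipped:

import torch

# Untested workaround sketch: supplying token_type_ids explicitly means RobertaModel.forward
# never expands its buffered (514-long) token_type_ids tensor, which is the line raising above.
encoded_input["token_type_ids"] = torch.zeros_like(encoded_input["input_ids"])
with torch.no_grad():
    model_output = model(**encoded_input)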

ccdv-ai commented 7 months ago

Hi @puppetm4st3r, this should be fixed with the latest release: pip install lsg-converter --upgrade
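
A quick way to check after upgrading (a sketch reusing the names from the report above):

# After `pip install lsg-converter --upgrade`, re-convert and re-run the same long batch.
converter = LSGConverter(max_sequence_length=4096)
model, tokenizer = converter.convert_from_pretrained(model_name_or_path="T-Systems-onsite/cross-en-es-roberta-sentence-transformer")
with torch.no_grad():
    out = model(**encoded_input)
print(out.last_hidden_state.shape)  # expected: (batch, seq_len, hidden_size) with seq_len > 514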

puppetm4st3r commented 7 months ago

Thanks! Will try it!