-
I have been using [bert.cpp](https://github.com/skeskinen/bert.cpp) for some time and I must admit the cosine similarity results are quite good. How difficult would it be to integrate the code into ll…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues and did not find a match.
### Who can help?
_No response_
### What are you working on?
I’m using a Spark NLP NER…
-
Many thanks to the DAMO Academy DuGuang team for this work. GeoLayoutLM is an excellent model, but it uses the BERT-base tokenizer. Is there a pretrained model with Chinese support, or will one be released in the future?
@wdp-007 @alibaba-oss @congyao
Thank you very much!
-
### Link to the documentation pages (if available)
_No response_
### How could the documentation be improved?
Hi,
could anybody help with loading the xlm-roberta-tokenizer offline and applying it on d…
-
Hi,
I have a problem with encoding using the XLM-RoBERTa SentencePiece tokenizer. Why is each Hugging Face token ID 1 greater than the corresponding Google SentencePiece ID?
Example
```
## Hugging …
```
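For context, this off-by-one is a known artifact of how the Hugging Face port remaps the SentencePiece vocabulary: fairseq (and the `transformers` implementation that follows it) pins `<s>`, `<pad>`, `</s>`, `<unk>` at the front of the vocabulary and shifts every regular piece ID by a fixed offset of 1. A minimal sketch of that remapping follows; the function names are illustrative, not the library's API:

```python
# Sketch of the XLM-RoBERTa ID remapping used by the Hugging Face port.
# Illustrative only; the real logic lives in tokenization_xlm_roberta.py.

# fairseq reserves these IDs at the front of the vocabulary:
FAIRSEQ_SPECIALS = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3}
FAIRSEQ_OFFSET = 1  # every regular SentencePiece piece ID is shifted by this


def spm_to_hf(spm_id: int) -> int:
    """Map a raw SentencePiece piece ID to the Hugging Face token ID."""
    return spm_id + FAIRSEQ_OFFSET


def hf_to_spm(hf_id: int) -> int:
    """Inverse mapping, valid for regular (non-special) tokens."""
    return hf_id - FAIRSEQ_OFFSET


# A piece with raw SentencePiece ID 9 shows up as ID 10 in Hugging Face.
assert spm_to_hf(9) == 10
```

So the two encodings describe the same pieces; the Hugging Face IDs are simply the SentencePiece IDs plus the fairseq offset.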
-
### Model/Pipeline/Scheduler description
Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion, while introducing some new ideas.
As text and image encoder it uses the CLIP model a…
-
All model tokenizers follow the same pattern except for XlmRoberta, and I think this implementation may cause problems.
I don't know if it was implemented like that on purpose, but if not, I would l…
-
In https://github.com/keras-team/keras-nlp/pull/653 we added a masked language modeling task for RoBERTa. We can make a similar change for the `XLMRoberta` model.
* [ ] Update `XLMRobertaTokenizer`…
-
## ❓ Questions & Help
Hi,
How could I extend the vocabulary of the pre-trained models, e.g. by adding new tokens to the lookup table?
Any examples demonstrating this?
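In the Hugging Face `transformers` library the usual recipe is `tokenizer.add_tokens([...])` followed by `model.resize_token_embeddings(len(tokenizer))`, which keeps the pretrained rows and appends freshly initialised ones. The sketch below simulates the underlying idea (extend the lookup table, then grow the embedding matrix in step) in plain Python; all names and sizes are illustrative, not a real checkpoint:

```python
import random

# Toy lookup table and embedding matrix standing in for a pretrained model.
vocab = {"hello": 0, "world": 1, "<unk>": 2}
dim = 4
embeddings = [[0.1 * i] * dim for i in range(len(vocab))]


def add_tokens(new_tokens):
    """Append unseen tokens to the lookup table and grow the embedding
    matrix with newly initialised rows; pretrained rows are untouched."""
    added = 0
    for tok in new_tokens:
        if tok in vocab:
            continue  # already known, nothing to do
        vocab[tok] = len(vocab)
        embeddings.append([random.gauss(0.0, 0.02) for _ in range(dim)])
        added += 1
    return added


added = add_tokens(["hello", "covid", "mrna"])
assert added == 2                     # "hello" was already present
assert len(embeddings) == len(vocab)  # matrix stays in sync with the table
```

The newly appended rows carry no pretrained signal, so after resizing you would normally fine-tune the model so the new embeddings pick up useful values.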
-
@abheesht17 @mattdangerw I think this needs to be fixed