-
It's becoming the norm to have prompt prefixes for text embedding models. I think we should add this to the [hf-embedder](https://docs.vespa.ai/en/reference/embedding-reference.html#huggingface-embedd…
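To make the request concrete, here is a minimal sketch of what a prompt prefix does: a short, model-specific string is prepended to the input text before it is tokenized and embedded, usually with different strings for queries and documents. The prefix strings and the `add_prefix` helper below are hypothetical illustrations, not an existing hf-embedder option; the actual strings would come from the model card.

```python
# Hypothetical prefixes for illustration; real values depend on the model.
PREFIXES = {"query": "query: ", "document": "passage: "}


def add_prefix(text: str, kind: str) -> str:
    """Prepend the prompt prefix for `kind` before the text is embedded."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown input kind: {kind!r}")
    return PREFIXES[kind] + text


print(add_prefix("how do prompt prefixes work?", "query"))
```

Keeping the prefix as configuration rather than baking it into the text means the same document field could be embedded with different models without rewriting the data.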
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I want to use the following:
LLM: Llama 2 7B chat
Embed Model: sentence-transformers/a…
-
`word_tokenize` keeps the opening single quote and doesn't pad it with a space; this is to make sure that the clitics get tokenized as `'ll`, `'ve`, etc.
The original treebank tokenizer has the sam…
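As a rough illustration of the clitic behavior described above, the sketch below splits common English clitics off the preceding word while keeping the apostrophe attached to the clitic. The pattern and the `split_clitics` helper are assumptions made for this example; they are not NLTK's actual rule set and do not cover the opening-quote handling.

```python
import re

# Illustrative pattern only (not NLTK's rules): match a clitic that
# directly follows a word character, keeping the apostrophe with it.
CLITIC = re.compile(r"(?i)(?<=\w)('ll|'ve|'re|'s|'m|'d|n't)\b")


def split_clitics(text: str) -> list[str]:
    # Insert a space before each clitic, then split on whitespace.
    return CLITIC.sub(r" \1", text).split()


print(split_clitics("They'll say we've won"))
# ['They', "'ll", 'say', 'we', "'ve", 'won']
```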
-
Among open issues, we have (not an exhaustive list):
- #135 complains about the sentence tokenizer
- #1210, #948 complain about word tokenizer behavior
- #78 asks for the tokenizer to provide offsets …
-
I’m getting an error while trying the Hindi language: File "C:\Users\contact\Desktop\xtts-webui-main\venv\lib\site-packages\TTS\tts\models\xtts.py", line 526, in inference
text = split_sentence(text, la…
-
### Bug Description
https://github.com/run-llama/llama_index/blob/162f5a0523f5a4de33f8cc056ec2b24713d2ee9e/llama-index-integrations/embeddings/llama-index-embeddings-huggingface-optimum/llama_index/e…
-
Hi All,
I am trying to get some very basic tokenization to work. I think I am not using the API properly because the method `Tokenize` is throwing System.NullReferenceException. Any suggestions?
…
-
Hello again
I'd like to turn all words of a sentence into singular.
For example `my dog has lots of flees` should become `[ 'my', 'dog', 'has', 'lots', 'of', 'flee' ]`
Here is the code:
``` js
va…
```
-
Hello everyone, below is my code for fine-tuning XTTS for a new language. It works well in my case with over 100 hours of audio.
https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Lang…
-
```
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('clip-ViT-L-14')
my_tok = model.tokenizer
```
results in
`AttributeError: 'SentenceTransformer' …