-
Hi all from Ollama!
First off: Great work with Ollama, keep up the good work!
What I am missing, though, are models in different languages (Dutch for me personally). Is it possible to add multiling…
-
**Describe the bug**
I trained a custom Stanza tokenizer and MWT expander on UD_English-GUM. When using the tokenizer & MWT for inference, the tokenizer changed the surface form of the word. For example, the …
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I am implementing a sentence splitter for texts. If the text contains hyperlinks, it beh…
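The question is cut off, but a common failure here is that periods inside URLs are mistaken for sentence boundaries. A minimal sketch of one workaround (function and placeholder names are my own, and the split rule is deliberately naive): mask URLs before splitting, then restore them.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation while keeping URLs intact."""
    # Replace each URL with an index placeholder containing no '.'
    urls = URL_RE.findall(text)
    for i, url in enumerate(urls):
        text = text.replace(url, f"<URL{i}>", 1)

    # Naive rule: a sentence ends at . ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())

    # Restore the original URLs in each sentence.
    restored = []
    for part in parts:
        for i, url in enumerate(urls):
            part = part.replace(f"<URL{i}>", url)
        restored.append(part)
    return restored
```

A URL immediately followed by sentence-ending punctuation would need extra trimming; this sketch only covers URLs in the middle of a sentence.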
-
Hello, while trying to run train.py in src, I got this error:
/root/miniconda3/envs/radfm/lib/python3.9/site-packages/spacy/language.py:2195: FutureWarning: Possible set union at position 6328…
-
### Issue Description
Since May 14th, an extra character (Ġ) has been shown before every word in a sentence when using shap.plots.text(shap_values)
**Code snippet:**
_pred = transformers.pipeline(
…
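For context: Ġ is not a corrupted letter but the marker that GPT-2-style byte-level BPE tokenizers use to encode a leading space, so it appears whenever raw token strings are displayed without decoding. A minimal sketch of stripping it for display (the helper name is mine):

```python
def clean_bpe_token(token: str) -> str:
    """Byte-level BPE (GPT-2 style) marks a leading space with 'Ġ'
    and a newline with 'Ċ'; map them back for display."""
    return token.replace("Ġ", " ").replace("Ċ", "\n")

# Raw tokens as a byte-level BPE tokenizer would emit them:
tokens = ["Hello", "Ġworld", "Ġ!"]
text = "".join(clean_bpe_token(t) for t in tokens)  # "Hello world !"
```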
-
Hi
I was trying out the sentence transformer (all_minilm) model. The sentence embeddings are of great quality. I wanted to use the token embeddings too, but the token embeddings do not contain that much …
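A possibly relevant detail (a sketch of the idea, not the library's exact code): MiniLM-style sentence embeddings are typically produced by mean-pooling the contextual token embeddings over non-padding positions, so individual token vectors are hidden states rather than standalone word vectors.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions -- the pooling
    step that turns per-token vectors into one sentence vector.

    token_embeddings: (seq_len, dim) array of contextual token vectors.
    attention_mask:   (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    counts = mask.sum()
    return summed / np.clip(counts, 1e-9, None)                    # avoid /0
```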
-
### Question
I am working with a Named Entity Recognition (NER) dataset in offset format, where each label is defined by its start_index, end_index, and entity_type. My code converts each label fr…
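The question is truncated, but since it describes converting offset labels, here is a minimal sketch of one common offset-to-BIO conversion (function name, tuple shapes, and overlap rule are my own assumptions, not from the post):

```python
def offsets_to_bio(tokens, spans):
    """Convert (start_index, end_index, entity_type) character spans
    to per-token BIO tags.

    tokens: list of (text, start, end) triples from a tokenizer.
    spans:  list of (start_index, end_index, entity_type), end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        first = True
        for i, (_, t_start, t_end) in enumerate(tokens):
            # A token gets the label if it overlaps the span at all.
            if t_start < end and t_end > start:
                tags[i] = ("B-" if first else "I-") + etype
                first = False
    return tags
```

Usage: for `tokens = [("Barack", 0, 6), ("Obama", 7, 12), ("spoke", 13, 18)]` and `spans = [(0, 12, "PER")]` this yields `["B-PER", "I-PER", "O"]`.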
-
I am a beginner. Here is my code; how can I modify it to do batch inference?
---
def load_model():
    model_id = 'llama3/Meta-Llama-3-70B-Instruct'
    pipeline = transformers.pipeline(
        "t…
-
The FTS tokenizer API has the concept of "colocated" tokens where multiple tokens can occupy the same position in a sentence. The main use of this functionality is to implement synonyms (See [Sec 7.1.…
-
### 🐛 Bug description
1. After fine-tuning, no pytorch_model.bin file is saved under finetuned-model/model; the directory contains only:
config.json
model.safetensors
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt
Is this normal?
2. When using the fine-tuned model to generate …