-
Hi all from Ollama!
First off: Great work with Ollama, keep up the good work!
What I am missing, though, are models in different languages (Dutch for me personally). Is it possible to add multiling…
-
**Describe the bug**
I trained a custom Stanza tokenizer and MWT expander on UD_English-GUM. When using the tokenizer & MWT for inference, the tokenizer changed the surface form of the word. For example, the …
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I am implementing a sentence splitter for texts. If the text contains hyperlinks, it beh…
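The question is cut off, but a common failure here is that periods inside URLs are mistaken for sentence boundaries. A minimal sketch of one workaround (function and placeholder names are my own, and the split rule is deliberately naive): mask URLs before splitting, then restore them.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation while keeping URLs intact."""
    # Replace each URL with an index placeholder containing no '.'
    urls = URL_RE.findall(text)
    for i, url in enumerate(urls):
        text = text.replace(url, f"<URL{i}>", 1)

    # Naive rule: a sentence ends at . ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())

    # Restore the original URLs in each sentence.
    restored = []
    for part in parts:
        for i, url in enumerate(urls):
            part = part.replace(f"<URL{i}>", url)
        restored.append(part)
    return restored
```

A URL immediately followed by sentence-ending punctuation would need extra trimming; this sketch only covers URLs in the middle of a sentence.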
-
Hello, while trying to run train.py in src, I got this error:
/root/miniconda3/envs/radfm/lib/python3.9/site-packages/spacy/language.py:2195: FutureWarning: Possible set union at position 6328…
-
### Issue Description
Since May 14th, an extra character (Ġ) has been shown before every word in a sentence when using shap.plots.text(shap_values)
**Code snippet:**
_pred = transformers.pipeline(
…
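For context: Ġ is not a corrupted letter but the marker that GPT-2-style byte-level BPE tokenizers use to encode a leading space, so it appears whenever raw token strings are displayed without decoding. A minimal sketch of stripping it for display (the helper name is mine):

```python
def clean_bpe_token(token: str) -> str:
    """Byte-level BPE (GPT-2 style) marks a leading space with 'Ġ'
    and a newline with 'Ċ'; map them back for display."""
    return token.replace("Ġ", " ").replace("Ċ", "\n")

# Raw tokens as a byte-level BPE tokenizer would emit them:
tokens = ["Hello", "Ġworld", "Ġ!"]
text = "".join(clean_bpe_token(t) for t in tokens)  # "Hello world !"
```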
-
Hi
I was trying out the sentence transformer (all_minilm) model. The sentence embeddings are of great quality. I wanted to use the token embeddings too, but the token embeddings do not contain that much …
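A possibly relevant detail (a sketch of the idea, not the library's exact code): MiniLM-style sentence embeddings are typically produced by mean-pooling the contextual token embeddings over non-padding positions, so individual token vectors are hidden states rather than standalone word vectors.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions -- the pooling
    step that turns per-token vectors into one sentence vector.

    token_embeddings: (seq_len, dim) array of contextual token vectors.
    attention_mask:   (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    counts = mask.sum()
    return summed / np.clip(counts, 1e-9, None)                    # avoid /0
```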
-
### Question
I am working with a Named Entity Recognition (NER) dataset in offset format, where each label is defined by its start_index, end_index, and entity_type. My code converts each label fr…
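The question is truncated, but since it describes converting offset labels, here is a minimal sketch of one common offset-to-BIO conversion (function name, tuple shapes, and overlap rule are my own assumptions, not from the post):

```python
def offsets_to_bio(tokens, spans):
    """Convert (start_index, end_index, entity_type) character spans
    to per-token BIO tags.

    tokens: list of (text, start, end) triples from a tokenizer.
    spans:  list of (start_index, end_index, entity_type), end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        first = True
        for i, (_, t_start, t_end) in enumerate(tokens):
            # A token gets the label if it overlaps the span at all.
            if t_start < end and t_end > start:
                tags[i] = ("B-" if first else "I-") + etype
                first = False
    return tags
```

Usage: for `tokens = [("Barack", 0, 6), ("Obama", 7, 12), ("spoke", 13, 18)]` and `spans = [(0, 12, "PER")]` this yields `["B-PER", "I-PER", "O"]`.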
-
I am a beginner. Here is my code; how can I modify it to do batch inference?
---
def load_model():
    model_id = 'llama3/Meta-Llama-3-70B-Instruct'
    pipeline = transformers.pipeline(
        "t…
-
The FTS tokenizer API has the concept of "colocated" tokens where multiple tokens can occupy the same position in a sentence. The main use of this functionality is to implement synonyms (See [Sec 7.1.…
-
### 🐛 Bug description
1. After fine-tuning, no pytorch_model.bin file is saved under finetuned-model/model; the directory contains only:
config.json
model.safetensors
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt
Is this normal?
2. When using the fine-tuned model to generate …