-
### System Info
- `transformers` version: 4.45.2
- Platform: Linux-5.4.0-193-generic-x86_64-with-glibc2.31
- Python version: 3.12.7
- Huggingface_hub version: 0.25.2
- Safetensors version: 0.4.5
…
-
Since OpenSearch 2.13, the [**fixed token length algorithm**](https://opensearch.org/docs/latest/ingest-pipelines/processors/text-chunking/#fixed-token-length-algorithm) has been available in the text chunking proc…
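A minimal sketch of registering such a pipeline with opensearch-py is below; the pipeline id, field names, and parameter values are illustrative assumptions, not taken from the excerpt.

```python
# Sketch: ingest pipeline using the text_chunking processor with the
# fixed token length algorithm (field names and values are assumptions).
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.ingest.put_pipeline(
    id="text-chunking-pipeline",
    body={
        "description": "Chunk long text into fixed-token-length passages",
        "processors": [
            {
                "text_chunking": {
                    "algorithm": {
                        "fixed_token_length": {
                            "token_limit": 384,    # max tokens per chunk
                            "overlap_rate": 0.2,   # overlap between consecutive chunks
                            "tokenizer": "standard",
                        }
                    },
                    "field_map": {"body": "body_chunks"},
                }
            }
        ],
    },
)
```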
-
### System Info
latest transformers
### Who can help?
@ArthurZucker
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Tasks
- [ ] An officiall…
-
If I send in "17 júní" the tokenizer returns "17. júní", even though I use tokenize() (and not split_into_sentences()) and use the txt property (which should contain the original source text for the toke…
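A minimal repro sketch, assuming the report refers to the `tokenize()` API of the mideind Tokenizer package (the exact snippet from the report is truncated above):

```python
# Sketch: print the txt property of each token produced for "17 júní".
from tokenizer import tokenize

tokens = list(tokenize("17 júní"))
print([tok.txt for tok in tokens if tok.txt])
# The report says a period is introduced ("17. júní"),
# even though txt should hold the original source text.
```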
-
Hi.
First of all, thank you for making such a model available to us.
I am trying to get vector embeddings for the abstracts of some PubMed articles, but somehow I couldn't get the sentence embe…
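In case it helps, a minimal sketch of mean-pooling the last hidden states into one vector per abstract; the checkpoint name is a placeholder assumption, not the model from this report:

```python
# Sketch: masked mean pooling over the last hidden states of an encoder.
# The checkpoint is a placeholder, not the model from the report.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # swap in the biomedical checkpoint you use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

abstract = "We study the effect of ..."  # one PubMed abstract
inputs = tokenizer(abstract, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state           # (1, seq_len, hidden)

mask = inputs["attention_mask"].unsqueeze(-1)             # (1, seq_len, 1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pooling
print(embedding.shape)                                     # e.g. torch.Size([1, 768])
```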
-
Hi,
I was trying to create a custom tokenizer for a different language that is not covered by the Llama 3.2 tokenizer.
I could not find exactly which tokenizer from HF would be an exact altern…
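One possible direction (a sketch, assuming a fast tokenizer and an in-memory corpus; the repo id and vocabulary size are assumptions) is to retrain the existing Llama 3.2 tokenizer on the new language with `train_new_from_iterator`:

```python
# Sketch: retrain the Llama 3.2 tokenizer's subword model on a new-language corpus.
# The corpus and vocab size are illustrative assumptions.
from transformers import AutoTokenizer

old_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

corpus = ["First sentence in the new language.", "Second sentence ..."]

def batch_iterator(batch_size=1000):
    for i in range(0, len(corpus), batch_size):
        yield corpus[i : i + batch_size]

new_tokenizer = old_tokenizer.train_new_from_iterator(batch_iterator(), vocab_size=32000)
new_tokenizer.save_pretrained("my-new-language-tokenizer")
```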
-
I am trying to do a full fine-tune of Llama 3.2-1B to "teach" it another language (via continuous pretraining).
The idea is to have a model which, given a prompt in a language, continues the sentences in…
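For reference, a minimal sketch of continued pretraining as plain causal language modeling with the `Trainer`; the dataset path, hyperparameters, and repo id are illustrative assumptions:

```python
# Sketch of continued (full) pretraining of Llama 3.2-1B on raw text.
# Dataset file and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

raw = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-continued", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=1e-5, bf16=True, logging_steps=50),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```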
-
punkt is loaded as a pickle file, which is not secure (CVE-2024-39705), so you have to use punkt_tab now.
This breaks `_get_sentence_tokenizer`.
In order to use the Tokeniser class I had to overrid…
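A sketch of the punkt_tab side of the workaround, assuming NLTK 3.9+; the exact override of `_get_sentence_tokenizer` depends on the library's Tokeniser class and is not shown here:

```python
# Sketch of the punkt_tab-based replacement (assumes NLTK >= 3.9).
import nltk

nltk.download("punkt_tab")  # table-based data, replaces the pickled "punkt" models

# sent_tokenize now loads punkt_tab instead of the insecure pickle:
sentences = nltk.sent_tokenize("First sentence. Second sentence.")
print(sentences)

# If the library caches a tokenizer object, an override of _get_sentence_tokenizer
# could return nltk.tokenize.PunktTokenizer("english") instead of loading the pickle
# (PunktTokenizer is the punkt_tab loader added in recent NLTK releases).
```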
-
Hello, thank you for the library.
I've written a free program for learning languages called Lute (https://github.com/LuteOrg/lute-v3), and it would be nice to add Thai support. This library looks …
-
```python
from unsloth import FastLanguageModel
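
# Load a 4-bit quantized Mistral 7B base model and its tokenizer via Unsloth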
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/mistral-7b-bnb-4bit",
max_seq_length=2048,
    load_in_4bit=True,
)
```