sentence-tokenizer Search Results

1000+ results for sentence-tokenizer

tensorflow/tensor2tensor #567

there may have some problem about tokenizer.py

I ran into a problem when using tensor2tensor to train a translation model and decode some sentences. The error is 'IndexError: string index out of range', so I debugged the failing sentence and found it genera…

drockser updated 6 years ago
2
nltk/nltk #3253

Corpora used to train Punkt Segmenter in German

Hi, thank you very much for your amazing work. I used NLTK to segment a German text. I see that this language is available and the sentence tokenizer gives quite good results with the default traini…

AugustinErnoult updated 5 months ago
1
meta-llama/llama #1088

The response from meta-llama/Llama-2-7b-chat-hf ends with in…

I loaded meta-llama/Llama-2-7b-chat-hf into GPU, and tried to get response to a question. Here is the key part of the code: ``` def load_model(model_name, bnb_config): n_gpus = torch.cuda.de…

YanjingRen updated 3 months ago
2
MaartenGr/KeyBERT #183

KeyLLM keyword extraction issue

KeyLLM seems to be extracting keywords which are not even present in the document used. I am following the steps mentioned in this article - https://towardsdatascience.com/introducing-keyllm-keyword-e…

ksachdeva11 updated 11 months ago
6
diasks2/pragmatic_tokenizer #19

Should all TLDs be whitelisted?

Here is the current list: http://data.iana.org/TLD/tlds-alpha-by-domain.txt This will allow us to successfully pass the following spec: ``` ruby it 'knows what is not a domain 1' do skip "NOT IMPL…

diasks2 updated 8 years ago
1
UKPLab/sentence-transformers #824

Semantic search on finetuned LM

Hey! First of all, thank you for the awesome work you are doing. Would be grateful if you can help me out with the following situation: I have an unlabelled dataset which is domain specific and I w…

nithya-AK updated 3 years ago
5
edponce/FACET #1

Apply optimizations all across the board.

There are places for improving runtime performance: * Use local variables as proxy for class variables * Tokenizers split vs regex vs merged sentencize/tokenizer * Compile regexes * Async processi…

edponce updated 4 years ago
2
nltk/nltk #1740

Errors in the Norwegian tokenizer

Hi! While working on my master's in language technology I discovered some errors in the Norwegian tokenizers: nltk.word_tokenize("Hello NLTK", "norwegian") nltk.sent_tokenize("My name. Is bob.", …

Bsmil3y updated 7 years ago
4
huggingface/datatrove #270

Filter very slow

I am using 4xH100, 100 CPU cores, and 1000 RAM to filter 1TB of Japanese data. Although the GPU is at 50% utilization and the CPU is running at 100%, only 3MB of data is processed per minute. I suspect that the…

hiennm15 updated 2 weeks ago
6
OpenPecha/Requests #193

[RFC0053] Normalising the particle issues in TMs #185

## Work Planning Details ## Table of Contents - [Housekeeping](#housekeeping) - [Named Concepts](#named-concepts) - [Summary](#summary) - [Reference-Level Explanation](#reference-level…

tenzin3 updated 1 year ago
2
