korean-tokenizer Search Results

371 results for korean-tokenizer

BookStackApp/BookStack #778

Chinese search cannot find words in the middle of a sentence…

### For Bug Reports * BookStack Version: v0.20.0 When the word I'm looking for is the first word, or there's a space in front of it, it's ok. ![i01](https://user-images.githubusercontent.com/30…

jasoncheng7115 updated 1 month ago
22
MaartenGr/BERTopic #1502

model.visualize_hierarchical_documents now shows whole topic…

![image](https://github.com/MaartenGr/BERTopic/assets/41793074/d0e64ec0-8d86-4f7c-acb6-b4e2e057c473) ![image](https://github.com/MaartenGr/BERTopic/assets/41793074/1440a9f9-9cfc-448a-82a0-871ae680530…

smbslt3 updated 11 months ago
15
idio/wiki2vec #13

Support request: tokenization

It would be great to add the org.apache.lucene.analysis for smarter tokenization for all languages. In this way, processing other languages such as Chinese is more sensible with your library.

nick-magnini updated 6 years ago
19
YutaroOgawa/pytorch_advanced #199

[UNK] id mismatching in "8-4_bert_IMDb.ipynb"

``` from utils.bert import BertTokenizer, load_vocab vocab_bert, ids_to_tokens_bert = load_vocab(vocab_file="./vocab/bert-base-uncased-vocab.txt") TEXT.build_vocab(train_ds, min_freq=1) TEXT.v…

hccho2 updated 2 years ago
5
ggerganov/llama.cpp #8490

Bug: gemma2 perplexity pending forever

### What happened? I am attempting to measure the perplexity of the gemma-2-9b-it-Q4_K_M.gguf model using llama.cpp. However, I encounter an issue where the process gets stuck at the "tokenizing th…

StatPan updated 2 weeks ago
1
elastic/elasticsearch #46365

Korean tokenizer (Nori) doesn't split digits and letters

## Describe the feature **Elasticsearch version** (`bin/elasticsearch --version`): 6.7.2 **Plugins installed**: [] - nori **JVM version** (`java -version`): jvm 1.8 **OS versi…

drakejin updated 2 months ago
7
apache/lucene #9860

Decouple Kuromoji's morphological analyser and its dictionar…

I was inspired by this mailing-list thread. As many Japanese users already know, the default built-in dictionary bundled with Kuromoji (MeCab IPADIC) is a bit old and has not been maintained for many years. While i…

asfimport updated 2 years ago
33
irthomasthomas/undecidability #662

StarCoder2 and The Stack v2 from BigCode

- [ ] [blog/starcoder2.md at main · huggingface/blog](https://github.com/huggingface/blog/blob/main/starcoder2.md?plain=1) # blog/starcoder2.md at main · huggingface/blog --- ## StarCoder…

irthomasthomas updated 6 months ago
1
tarasglek/chatcraft.org #610

Add cohere free model to free endpoint, set it to default

They offer free models for non-prod usage. this is a 104B, way better than other free models ```bash curl --request POST \ --url https://api.cohere.ai/v1/chat \ --header 'accept: application…

tarasglek updated 4 months ago
2
huggingface/speech-to-speech #33

Latest macOS MPS support fails after fresh git clone

Hi @andimarafioti I just cloned the repo (0c53fda7dd5682155186a03386434a1c1cf50212) and did a `pip install`. Then I ran `python s2s_pipeline.py --local_mac_optimal_settings` and got this er…

ChristianWeyer updated 3 weeks ago
17
