-
**Elasticsearch version**: `7.13.3` (tested on `7.15.1` too)
**Plugins installed**: [`repository-s3`, `analysis-nori`]
**JVM version** (`java -version`): `Eclipse Adoptium/OpenJDK 64-Bit Server…
-
I want to change the tokenizer so that it can handle Korean.
I would appreciate it if you could change `LLM_PATH` and additionally let me know which parts of the code should be modified.
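Since `analysis-nori` is already listed among the installed plugins above, one route is to define a custom analyzer backed by `nori_tokenizer` in the index settings. A minimal sketch of such settings (the index, analyzer, and field names here are illustrative, not from the original report):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "korean_analyzer": {
          "type": "custom",
          "tokenizer": "nori_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": { "type": "text", "analyzer": "korean_analyzer" }
    }
  }
}
```

Applying this body with `PUT <index-name>` at index-creation time makes the field `content` use the Korean morphological tokenizer instead of the default `standard` analyzer.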
-
Out of the box, Tantivy only supports Latin-script languages. We could add some extra tokenizers:
Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/ca…
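For reference, wiring one of these crates into Tantivy follows the library's named-tokenizer pattern: register the tokenizer under a name, then refer to that name from the schema. A sketch assuming the `tantivy` and `tantivy-jieba` crates as dependencies (field and tokenizer names here are illustrative):

```rust
use tantivy::schema::{Schema, TextFieldIndexing, TextOptions};
use tantivy::Index;

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    // Refer to the tokenizer by the name it will be registered under.
    let text_indexing = TextFieldIndexing::default().set_tokenizer("jieba");
    let text_options = TextOptions::default().set_indexing_options(text_indexing);
    schema_builder.add_text_field("body", text_options);
    let schema = schema_builder.build();

    let index = Index::create_in_ram(schema);
    // Register the Chinese tokenizer under the name used in the schema.
    index.tokenizers().register("jieba", tantivy_jieba::JiebaTokenizer {});
    Ok(())
}
```

The same registration pattern would apply to a Japanese or Korean tokenizer; only the crate providing the `Tokenizer` implementation changes.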
-
Hello,
Based on your code, I added Korean tokens (using a Korean emotional dataset) to the tokenizer and fine-tuned the model with the LibriTTS-R dataset. The Korean dataset is slightly less than 3…
-
Copilot suggested this repository while I was adding additional tokens (James) to my tokenizer.
Here's my two cents:
I'm afraid to say that this is basically character-level encoding, or the same as on…
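The "basically character-level encoding" point has a concrete basis: precomposed Hangul syllables are algorithmic combinations of jamo, so a vocabulary that merely appends individual Korean characters degenerates to one unit per syllable rather than learning subword structure. A self-contained, dependency-free sketch of that algorithmic composition (the function name is illustrative):

```python
def decompose(syllable: str) -> tuple[str, str, str]:
    """Split a precomposed Hangul syllable (U+AC00..U+D7A3) into
    its (lead consonant, vowel, tail consonant) jamo, using the
    arithmetic layout defined by the Unicode standard."""
    LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
    VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
    TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
    code = ord(syllable) - 0xAC00
    lead, rest = divmod(code, 21 * 28)   # 21 vowels x 28 tail slots
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]

# e.g. decompose("한") yields ("ㅎ", "ㅏ", "ㄴ")
```

Because the 11,172 possible syllables are generated from only 19 + 21 + 28 jamo, a tokenizer that treats each syllable as an opaque added token captures none of this shared structure.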
-
Currently, the tokenizer is hard-coded to the default. It would be better to include a configurable tokenizer for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segm…
-
I used this code and trained with Korean ko-snil data.
adapter_config.json, adapter_model.safetensors, special_tokens_map.json, tokenizer_config.json, tokenizer.json, tokenizer.model
5 files wer…
-
I am very interested in this project. I think it's an interesting project that can create TTS with a 10-second voice sample. I also think it's good that it supports multiple languages. However, there is a p…
-
The Korean language has specific characteristics. When developing a search service with Lucene & Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solved these problems with…
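One of the characteristics alluded to above is that Korean is agglutinative: the same noun surfaces with different particles (josa) attached, so a plain whitespace tokenizer indexes "학교에", "학교를", and "학교가" as three distinct terms and a query for "학교" matches none of them. A deliberately naive sketch of particle stripping, purely to illustrate the problem a real morphological analyzer solves (the particle list and function names are illustrative, not a real analyzer):

```python
# A tiny, incomplete sample of common Korean particles (josa).
PARTICLES = ("에서", "에게", "은", "는", "이", "가", "을", "를", "에", "의", "도")

def strip_particle(token: str) -> str:
    """Strip one trailing particle, longest match first.
    Naive: a real analyzer needs dictionaries and POS tagging,
    since many nouns legitimately end in these characters."""
    for p in sorted(PARTICLES, key=len, reverse=True):
        if token.endswith(p) and len(token) > len(p):
            return token[: -len(p)]
    return token

def analyze(text: str) -> list[str]:
    """Whitespace-split, then normalize each token."""
    return [strip_particle(tok) for tok in text.split()]
```

Even this toy version folds "학교에" and "학교를" onto the same index term "학교"; the false positives it would also produce (nouns that merely end in a particle character) are exactly why dictionary-based analyzers exist.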
-
It appears that BART, at least, is pretty language-agnostic. The English-specific parts of NeuroNER (afaict) are the recommended `glove.6B.100d` word vectors, and all of the spaCy-related tokenizing co…