-
I was inspired by this mailing-list thread.
As many Japanese users already know, the default built-in dictionary bundled with Kuromoji (MeCab IPADIC) is somewhat old and has not been maintained for many years. While i…
-
Error When Index Setting "Synonym Filter" with "Korean (nori) Analysis"
**Elasticsearch version** (`bin/elasticsearch --version`): 6.5.3
**Plugins installed**: [ analysis-nori ]
**JVM v…
-
- [ ] [I finally got perfect labels (classification task) via prompting : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1amvfua/i_finally_got_perfect_labels_classification_task/)
# TIT…
-
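Notes from reading the book [엘라스틱서치 실무 가이드 (Elasticsearch Practical Guide)](https://www.yes24.com/Product/Goods/71893929)
## Chapter shortcuts
[Chapter 1 - Understanding Search Systems](https://github.com/KimDoubleB/LAB/issues/2#issuecomment-1735292654)
[Chapter 2 - Exploring Elasticsearch](htt…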
-
Hello,
I'm currently working on text processing that involves filtering (like gopher) in various languages. However, the default word_tokenization in datatrove filters is currently based on English, as shown…
-
- [ ] [tabby/README.md at main · TabbyML/tabby](https://github.com/TabbyML/tabby/blob/main/README.md?plain=1)
# tabby/README.md at main · TabbyML/tabby
# 🐾 Tabby
[![latest release](https://shield…
-
This model's maximum context length is 4097 tokens. However, your messages resulted in 4112 tokens (3992 in the messages, 120 in the functions). Please reduce the length of the messages or functions.
…
-
**Elasticsearch version** (`bin/elasticsearch --version`): 5.3.2
**Plugins installed**: [analysis-hebrew, analysis-icu, analysis-smartcn, analysis-stconvert, analysis-stempel, analysis-ukrainian, e…
-
Before asking for help, thank you for sharing your decent work with the public.
Your idea of optimizing the tokenizer for Korean has deeply inspired me.
So I really want to try your model in tand…