korean-tokenizer Search Results

373 results for korean-tokenizer

huggingface/transformers #23009

whisper identified the wrong language

### Feature request When I follow the example of long-form transcription for whisper-large with Korean, the result is English. But after finetuning the whisper-large model with some Korean data, the …

LYPinASR updated 1 year ago
5
openai/tiktoken #111

Enhancing CJK Performance: Accelerating Training and Inferen…

CJK languages, such as Chinese, Japanese, and Korean, require more tokens due to their extensive character sets. A single character is typically split into 2-3 tokens by the tokenizer. However, the…

golbin updated 1 year ago
1
huggingface/peft #265

Fine-tuning NLLB model in multi-gpu makes RuntimeError

I tried to fine-tune an NLLB model on my custom dataset in a multi-GPU environment, and it raises the following error. `RuntimeError: Expected all tensors to be on the same device, but found at least two dev…

comchobo updated 1 year ago
9
explosion/spaCy #12449

Korean blank model crashes if mecab-ko not installed, but no…

Initializing a Korean spacy.blank model throws an error when `natto-py` is not installed, and asks the user to install both `natto-py` and `mecab-ko`. However, if only `natto-py` (and not `mecab-ko`) …

stephantul updated 1 year ago
3
soyoung97/Standard_Korean_GEC #3

Data Format

Hi, I have a question regarding the training and test data. I have seen both the M2 format and the parallel file format used for GEC tasks. Could you please advise which format is used in which situati…

saramoeini20 updated 1 year ago
6
h2oai/h2ogpt #703

Does not support multilingual output

![image](https://github.com/h2oai/h2ogpt/assets/74184102/f09ad7e1-fe6d-44fe-9603-575f525a526c) Hello! Is there any improvement plan?

babytdream updated 1 year ago
46
typesense/typesense #1002

Typesense crashes when locale of the field is ja (Japanese) …

## Description Typesense crashes if client tries to import Japanese in `locale: "ja"` field. ## Steps to reproduce 1. Create a collection that contains `locale: "ja"` field. I used the code u…

despenser08 updated 1 year ago
30
SciSharp/LLamaSharp #12

cyrillic doesn't work

I have a model that generates text using the Cyrillic alphabet. It works in llama-cpp-python, but in LLamaSharp I get unknown symbols: ![image](https://github.com/SciSharp/LLamaSharp/assets/50872233/cd4…

00jeser updated 9 months ago
21
qdrant/qdrant-client #247

scroll query chinese mismatch

To support matching Chinese queries, I set things up as follows. 1. I built qdrant from source with tags and configured the Dockerfile with `ARG FEATURES=multiling-chinese,multiling-japanese,multiling-kore…

sforce100 updated 1 year ago
1
hyunwoongko/kss #56

KeyError: 'EMOJI'

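## Overview I have confirmed that issue #51 still occurs in kss 3.7.3 on documents containing emoji, and am reporting it here. The error does not seem to occur for every emoji; it appears to happen only under specific conditions, such as in the attached document (b.txt). ## Steps to reproduce 1. Download the attached b.txt 2. Run the code below ```python im…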

singleheart updated 1 year ago
3
