-
### Feature request
When I follow the long-form transcription example for whisper-large with Korean audio, the output is in English. But after fine-tuning the whisper-large model with some Korean data, the …
-
CJK languages, such as Chinese, Japanese, and Korean, require more tokens due to their extensive character sets. A single character is typically split into 2-3 tokens by the tokenizer.
However, the…
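
A rough illustration of where the 2-3 tokens per character can come from, assuming a byte-level tokenizer (the excerpt does not say which tokenizer is meant). Each Hangul syllable, kana, or common CJK ideograph takes 3 bytes in UTF-8, so a byte-level vocabulary with few merged CJK sequences can spend up to one token per byte. The helper name `utf8_bytes_per_char` and the sample strings are made up for this sketch.

```python
# Sketch only: counts UTF-8 bytes, which is an upper bound on tokens per
# character for a byte-level tokenizer (assumption: no merges apply).
def utf8_bytes_per_char(text: str) -> list[tuple[str, int]]:
    """Return each character paired with its UTF-8 encoded length in bytes."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# Hypothetical sample strings, one per script.
samples = {
    "English": "hello",
    "Korean": "안녕하세요",
    "Japanese": "こんにちは",
    "Chinese": "你好世界",
}

for name, text in samples.items():
    total_bytes = len(text.encode("utf-8"))
    print(f"{name}: {len(text)} chars, {total_bytes} UTF-8 bytes")
```

Real token counts depend on the merges the tokenizer has learned, so the byte count is only a worst-case estimate.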
-
I tried to fine-tune an NLLB model on my custom dataset in a multi-GPU environment, and it raises the following error.
`RuntimeError: Expected all tensors to be on the same device, but found at least two dev…
-
Initializing a Korean spacy.blank model throws an error when `natto-py` is not installed, and asks the user to install both `natto-py` and `mecab-ko`. However, if only `natto-py` (and not `mecab-ko`) …
-
Hi,
I have a question regarding training and test data. I have seen both the M2 format and the parallel file format used for GEC tasks.
Could you please explain which format is used in which situati…
-
![image](https://github.com/h2oai/h2ogpt/assets/74184102/f09ad7e1-fe6d-44fe-9603-575f525a526c)
Hello! Is there any plan to improve this?
-
## Description
Typesense crashes if a client tries to import Japanese text into a `locale: "ja"` field.
## Steps to reproduce
1. Create a collection that contains a `locale: "ja"` field. I used the code u…
-
I have a model that generates text in the Cyrillic alphabet. It works in llama-cpp-python, but in LLamaSharp I get unknown symbols:
![image](https://github.com/SciSharp/LLamaSharp/assets/50872233/cd4…
-
To support query matching for Chinese, I set things up as follows.
1. I built Qdrant from source with tags; I configured the Dockerfile with `ARG FEATURES=multiling-chinese,multiling-japanese,multiling-kore…
-
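## Overview
I am reporting that issue #51 also occurs in kss 3.7.3 for documents that contain emoji. The error does not seem to occur for every emoji; it appears to occur under specific conditions, such as with the attached document (b.txt).
## Steps to reproduce
1. Download the attached b.txt
2. Run the code below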
```python
im…