-
I am researching multilingual generation, and it seems that the tokenizer used by sacrebleu can only split a sentence into separate words by spaces, like:
'今天是个好天气'->'今 天 是 个 好 天…
-
Hi,
When I use multi-bleu-detok.perl (moses-scripts/scripts/generic/multi-bleu-detok.perl) and sacrebleu to score the translation, here are some results:
multi-bleu-detok:
./tools/moses-scripts/scri…
-
In this branch: https://github.com/huggingface/safetensors/compare/julien-c/js I pushed a proof-of-concept of how, given the simplicity of the format, one can fetch metadata about the weights over sma…
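As a rough illustration of why the format lends itself to this: a safetensors file starts with an 8-byte little-endian unsigned integer giving the length of a JSON header, followed by that header, which describes each tensor's dtype, shape, and byte offsets. The Python sketch below parses that header from an in-memory buffer. It is not the code from the linked branch; the helper names `parse_safetensors_header` and `build_example_file` and the tensor name `weight` are hypothetical, chosen only for this example.

```python
import json
import struct


def parse_safetensors_header(buf: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a safetensors file.

    Hypothetical helper for illustration; not part of any library API.
    """
    if len(buf) < 8:
        raise ValueError("need at least 8 bytes to read the header length")
    # The first 8 bytes hold the header size as a little-endian uint64.
    (header_len,) = struct.unpack("<Q", buf[:8])
    if len(buf) < 8 + header_len:
        raise ValueError("buffer does not contain the full header")
    # The next header_len bytes are a UTF-8 JSON object.
    return json.loads(buf[8:8 + header_len].decode("utf-8"))


def build_example_file() -> bytes:
    """Build a tiny in-memory file with one made-up 2x2 float32 tensor."""
    header = {"weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
    header_bytes = json.dumps(header).encode("utf-8")
    # Length prefix, then header, then 16 zero bytes standing in for the data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(16)


example = build_example_file()
meta = parse_safetensors_header(example)
print(meta["weight"]["shape"])
```

Because only the length prefix and the header are needed, a remote reader would in principle have to fetch just the leading bytes of a file to list its tensors, without downloading the weights themselves.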
-
I've initiated the alltalk_tts app with `start_alltalk.sh` after installing alltalk with `atsetup.sh` and can access alltalk's WebUI, but I'm uncertain about the `atsetup.sh` deepspeed installation wh…
-
### Describe the bug
![Screenshot 2023-02-12 153700](https://user-images.githubusercontent.com/33654834/218304085-967f9576-f751-4f80-b584-fee3c4fe2dec.png)
### To Reproduce
import os
from TTS.tt…
-
### Describe the bug
I was trying to train a Bangla TTS model using your library with this dataset https://www.kaggle.com/datasets/mobassir/multi-speaker-bangla-tts and the dataset was collected …
-
## Information
Unable to save the 'ufal/robeczech-base' fast tokenizer, which is a variant of RoBERTa. I tried the same minimal example (see below) with the non-fast tokenizer and it worked fine.
…
-
### Describe the bug
Almost identical to issue #1695; however, I never saw a recorded solution for what ended up working. Is anyone familiar with this issue?
![image](https://github.com/coqui…
-
The tokenizer I am using:
`tokenizer = BertTokenizerFast.from_pretrained("sagorsarker/bangla-bert-base")` with datasets v1.0.2 & transformers v4.2.1.
Whenever I try to map the train data:
`train_da…
-
Hi,
Is it possible to extract/generate word embeddings using **BanglaBERT**?
I have **tokenized** my Bangla sentence using BanglaBERT. Now I want to generate **Word Embeddings** from my tokenized s…