-
Hi to the community!
Recently I've been training a BPE tokenizer on an existing large corpus (reading it all into memory is not feasible).
The corpus is not a common one-text-per-line file (for ex…
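One way to avoid reading everything into memory is to train from a lazy iterator. A minimal sketch, assuming a hypothetical helper `iter_texts`; the per-record extraction inside the loop would be adapted to whatever non one-text-per-line format the corpus actually uses:

```python
def iter_texts(paths, encoding="utf-8"):
    """Lazily yield one training text at a time, so the whole corpus
    never has to fit in memory. Adapt the per-line extraction below
    to the corpus's actual (non one-text-per-line) format."""
    for path in paths:
        with open(path, encoding=encoding) as fh:
            for line in fh:
                text = line.strip()
                if text:
                    yield text

# An iterator like this can be handed to a trainer that accepts iterators
# (for example, HuggingFace tokenizers' Tokenizer.train_from_iterator),
# which consumes texts lazily instead of loading files up front.
```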
-
### 🐛 Describe the bug
An attempt to run this in Colab, Docker, etc. on a GPU fails due to a segmentation fault (see trace below):
https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_with_Sub…
-
Is this recipe used to tune the Tatoeba models that were already trained? I am hoping to provide data to it to tune multilingual Tatoeba models, but I am not sure where this recipe is pulling data from…
-
How can I use BPE-dropout? I don't see any change in the output if I try different alpha values for the BPE model.
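For reference, BPE-dropout only has an effect if stochastic segmentation is actually enabled at encode time; with deterministic encoding, alpha is simply ignored. A minimal pure-Python sketch of the idea, where each applicable merge is skipped with probability `alpha` (the function name and merge-table format here are illustrative, not any library's API):

```python
import random

def bpe_dropout_encode(word, merges, alpha=0.1, rng=None):
    """Greedy BPE encoding where each candidate merge is dropped with
    probability `alpha` (BPE-dropout). alpha=0 gives ordinary BPE;
    alpha=1 falls back to single characters."""
    rng = rng or random.Random()
    symbols = list(word)
    while True:
        best = None  # (rank, position) of the best surviving merge
        for i in range(len(symbols) - 1):
            pair = (symbols[i], symbols[i + 1])
            if pair in merges and rng.random() >= alpha:
                rank = merges[pair]
                if best is None or rank < best[0]:
                    best = (rank, i)
        if best is None:
            break
        _, i = best
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols
```

With a real library the switch is usually explicit; in SentencePiece, for instance, sampling is only applied when `enable_sampling=True` is passed to `encode`, so the default deterministic call produces identical output for every alpha.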
-
Here are a few feature requests and bugs related to the punctuation and capitalization model.
### Punctuation issues
#### Inverted punctuation
For languages like Spanish, we need two predictions per …
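To make the request concrete, here is a hypothetical sketch (names and output format are assumptions, not the model's actual API) of what per-token pre- and post-punctuation predictions could look like for Spanish, where `¿`/`¡` attach before a token and `?`/`!` after:

```python
def apply_punct(tokens, pre_preds, post_preds):
    """Rebuild text from tokens plus one pre- and one post-punctuation
    prediction per token (i.e. two predictions per token)."""
    return " ".join(
        f"{pre}{tok}{post}"
        for tok, pre, post in zip(tokens, pre_preds, post_preds)
    )

# "¿" predicted before the first token, "?" after the last:
apply_punct(["como", "estas"], ["¿", ""], ["", "?"])  # → "¿como estas?"
```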
-
# 🌟 New model addition
Hi!
I was wondering whether there's been any work on adding the 12B version of the m2m100 model to huggingface.
Given libraries such as fairscale or parallelformers, inference wit…
-
The standard analyzer in Lucene is not exactly Unicode-friendly with regard to breaking text into words, especially for non-alphabetic scripts. This is because it is unaware of unicode b…
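As an illustration of the underlying problem (in Python rather than Java, and independent of any particular Lucene version): a tokenizer that just takes maximal runs of word characters has no notion of script-specific word boundaries, so a space-free CJK phrase comes out as a single token, whereas UAX #29-style segmentation (e.g. Lucene's ICU-based `ICUTokenizer`) would break it further:

```python
import re

def naive_tokens(text):
    """Split on maximal runs of word characters -- roughly what a
    script-unaware analyzer does when breaking text into words."""
    return re.findall(r"\w+", text)

naive_tokens("hello, world!")  # ['hello', 'world']
naive_tokens("统一码分词")       # a single token: no boundaries inside the Han run
```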
-
Background: I am trying to build an automated pipeline to segment sentences from the output of the Google Speech-to-Text service.
Issue: The `-s` parameter does not work as expected. See details below. An…
-
Hi again,
After getting the NaN loss error from the previous issue, I launched another training run over the weekend:
```
python3 pretrain_nmt.py -n 1 -nr 0 -g 2 --model_path models/bart_base_512 \…
```
-
Using this repo: https://github.com/kh-kim/subword-nmt, because of too many unique values in the tokenizer vocab.
Ldoun updated 2 years ago