-
I have a large corpus, around 40GB of text. I installed subword-nmt via pip and tried to build the dictionary with the subword-nmt command line, but it takes forever to finish. I just wonder whether there are any s…
-
I trained two vocabularies with about 900M of Chinese-English material, and then encoded two data sets (a 900M training set and a 500K test set) with these two Chinese-English vocabularies.
The training se…
-
Hi,
It is mentioned in the paper that a SentencePiece vocab of size 5K was created for both English and Portuguese. So was something like `max_length` set for the sentences, or did you use all …
-
Here are some feedbacks we got in class yesterday.
1. Chinese and Japanese don't use whitespace, and their characters are logograms. (The number of unique characters is large.) What happens if we tr…
-
I understand that by removing the `@@ ` symbols I get back the input text, but how can I identify the smallest subunits in the processed text?
If, for example, I have `di@@ rect`, how can I figure…
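For what it's worth, a minimal sketch of how the `@@ ` convention can be interpreted (assuming the default subword-nmt separator; the helper names below are made up for illustration): each whitespace-delimited token is one subword unit, and a trailing `@@` means the unit continues into the next token of the same word.

```python
# Sketch: working with BPE output that uses the subword-nmt "@@ " convention.
# Assumes the default separator "@@"; helper names are illustrative, not from any library.

def subword_units(bpe_text: str) -> list:
    """Each whitespace-delimited token is one subword unit."""
    return bpe_text.split()

def detokenize(bpe_text: str) -> str:
    """Undo BPE by removing the '@@ ' continuation markers."""
    return bpe_text.replace("@@ ", "")

def words_with_pieces(bpe_text: str) -> list:
    """Group subword units back into the words they compose.

    A unit ending in '@@' continues into the next unit of the same word;
    a unit without the marker closes the current word.
    """
    words, current = [], []
    for unit in bpe_text.split():
        current.append(unit)
        if not unit.endswith("@@"):
            words.append(current)
            current = []
    return words

print(subword_units("di@@ rect trans@@ lation"))   # ['di@@', 'rect', 'trans@@', 'lation']
print(detokenize("di@@ rect trans@@ lation"))      # direct translation
print(words_with_pieces("di@@ rect trans@@ lation"))
# [['di@@', 'rect'], ['trans@@', 'lation']]
```

So in `di@@ rect`, the two subunits are `di@@` and `rect`; the marker only tells you where a word was split, not how the merges were learned.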
-
NOTE: I'm referring to the RESULTS file on the current Kaldi commit, not goodatleas/zeroth.
Hi, I tried running the provided zeroth_korean recipes on Kaldi. I didn't change anything in the scri…
-
# Next paper candidates
Let's propose papers to study next! All papers mentioned in the comments of this issue will be listed in the next vote.
-
I used fairseq-interactive and fairseq-generate respectively to decode the same file, but the results are slightly different. The result generated by fairseq-generate outperformed the result from fairs…
-
I would like to use fastText for languages that don't have clear word boundaries, such as Chinese, Japanese, Thai, or Vietnamese. I have found various software tools to segment text from these languages …
-
# Next paper candidates
Let's propose papers to study next! All papers mentioned in the comments of this issue will be listed in the next vote.
## Last session runner-up
[Graph Attention Networ…