-
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson
Accepted as a demo paper at EMNLP2018
https://arxiv.org/…
-
I first extract contexts from `test.refs.txt` (6000 lines)
```bash
# Keep only the first tab-separated column (the context) of each line
cut -f 1 test.refs.txt > test.source
```
and then extract the multi-reference files (up to 15 references per sample)
```bash
for (( i=2; i refs…
-
Hi, and thank you for the update!
I've been trying to finish the alignment steps on LDC2017T10, but have run into a bug:
```
(ve) ~/guo_lu/AMR-Parser-master/data$ ./align.sh
3237
3238
3239
…
-
Currently only English-style quotes (both up) are supported. An option like `-Sq x` (x = numeric ID of the quote type) would be nice, to allow e.g. (this is for Czech) „abc“; or even `-SqAB`, where A(B) represe…
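
To make the request concrete, here is a minimal sketch of what a selectable quote style could look like. Everything in it is hypothetical: the `QUOTE_STYLES` table, its numeric IDs, and the `smarten_quotes` function are illustrations only and do not correspond to any existing option or API. It only handles simple, non-nested double quotes.

```python
import re

# Hypothetical style table: numeric ID -> (opening quote, closing quote).
# The IDs are made up for this sketch.
QUOTE_STYLES = {
    0: ("\u201c", "\u201d"),  # English style: both quotes up
    1: ("\u201e", "\u201c"),  # Czech style: opening quote down, closing quote up
}


def smarten_quotes(text, style=0):
    """Replace paired straight double quotes with the chosen typographic pair."""
    opening, closing = QUOTE_STYLES[style]
    # Match a straight-quoted span that contains no further straight quotes.
    return re.sub(
        r'"([^"]*)"',
        lambda m: opening + m.group(1) + closing,
        text,
    )


print(smarten_quotes('He said "abc" twice.', style=1))
```

A two-character form along the lines of `-SqAB` could be modelled the same way by passing the opening and closing characters directly instead of looking them up by ID.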
-
Currently, WeNet supports FST-based LM decoding, which is copied from Kaldi and relies on the external OpenFst package.
I am trying to implement CTC with n-gram decoding in WeNet, which consists of the following step…
-
Hello
When I was running your code based on this notebook -> https://github.com/s-ankur/hindi_grammar_correction/blob/main/Colab%20Notebooks/https_github.com_s-ankur_fairseq-gec.ipynb , I got an error that said …
-
Hi,
Tried to execute the following command
`sudo docker build -f Dockerfile . -t build-pt`
and got the following output
```Step 1/15 : FROM ubuntu:16.04 …
-
https://github.com/bene-ges/nemo_compatible/blob/194af660d9b6d3d578884048d40b524775fd10e8/scripts/nlp/en_spellmapper/dataset_preparation/prepare_corpora_after_alignment.py#L167
When running `get_ng…
-
I am trying [BioGPT](https://github.com/microsoft/BioGPT)/[examples](https://github.com/microsoft/BioGPT/tree/main/examples)/[QA-PubMedQA](https://github.com/microsoft/BioGPT/tree/main/examples/QA-Pub…
-
The [original Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl) supports the `--protected` flag. Its effect is to accept a file with a list of r…
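
As a rough illustration of the idea behind protected patterns, the sketch below keeps any span matching a user-supplied regular expression as a single token and tokenizes only the text around it. This is not the Moses implementation: the function names `naive_tokenize` and `tokenize_with_protected` are hypothetical, the punctuation splitting is deliberately naive, and the patterns are assumed to be plain Python regexes without backreferences (joining them into one alternation would shift group numbers).

```python
import re


def naive_tokenize(text):
    """Split off every non-alphanumeric, non-space character as its own token."""
    return re.sub(r"([^\w\s])", r" \1 ", text).split()


def tokenize_with_protected(text, protected_patterns):
    """Tokenize text, leaving spans that match a protected regex untouched."""
    if not protected_patterns:
        return naive_tokenize(text)
    # Combine all patterns into one alternation so matches never overlap.
    combined = re.compile("|".join(f"(?:{p})" for p in protected_patterns))
    tokens, pos = [], 0
    for match in combined.finditer(text):
        if match.start() == match.end():
            continue  # ignore empty matches
        # Tokenize the unprotected text before the match, then keep the match whole.
        tokens.extend(naive_tokenize(text[pos:match.start()]))
        tokens.append(match.group(0))
        pos = match.end()
    tokens.extend(naive_tokenize(text[pos:]))
    return tokens


# Example: protect URLs so they are not broken up at ':' '/' '?' '='.
patterns = [r"https?://[^\s,]+"]
print(tokenize_with_protected("See https://example.com/a?b=1, then mail me.", patterns))
```

In practice the patterns would be read from a file, one per line, and passed in as `protected_patterns`.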