bitextor / bicleaner

Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
GNU General Public License v3.0

Typo in ‘train your model’ #45

Closed. djshowtime closed this issue 4 years ago

djshowtime commented 4 years ago

https://github.com/bitextor/bicleaner/wiki/How-to-train-your-Bicleaner

$ cut -f1 bigcorpus.en-is \
    | sacremoses -l en tokenize -x \
    | awk '{print tolower($0)}' \
    | tr ' ' '\n' \
    | LC_ALL=C sort | uniq -c \
    | LC_ALL=C sort -nr \ \
    | grep -v "[[:space:]]*1" \
    | gzip > wordfreq-en.gz

The line "| LC_ALL=C sort -nr \ \" has a stray second backslash; it should probably be "| LC_ALL=C sort -nr \".

ZJaume commented 4 years ago

Fixed, thank you! :smile: