djshowtime closed this issue 4 years ago
https://github.com/bitextor/bicleaner/wiki/How-to-train-your-Bicleaner
$ cut -f1 bigcorpus.en-is \
    | sacremoses -l en tokenize -x \
    | awk '{print tolower($0)}' \
    | tr ' ' '\n' \
    | LC_ALL=C sort | uniq -c \
    | LC_ALL=C sort -nr \ \
    | grep -v "[[:space:]]*1" \
    | gzip > wordfreq-en.gz
The line `| LC_ALL=C sort -nr \ \` has a doubled backslash; it should probably be `| LC_ALL=C sort -nr \`.
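For anyone who wants to see the difference, here is a small sketch. It assumes a POSIX shell, and the three sample counts are made up (this is not the real corpus or the full wiki pipeline). As far as I can tell, in `\ \` the first backslash escapes the space, so `sort` receives a single space as a file name. It then fails instead of reading the pipe, and the rest of the pipeline gets no input.

```shell
# Made-up sample of "count word" lines, standing in for the uniq -c output.
sample='3 the
10 of
7 and'

# Corrected form: one trailing backslash simply continues the line.
fixed=$(printf '%s\n' "$sample" \
    | LC_ALL=C sort -nr \
    | head -n 1)

# Form from the wiki: "\ " is an escaped space, so sort is handed a file
# operand named " " and errors out (stderr is silenced here).
broken=$( { printf '%s\n' "$sample" \
    | LC_ALL=C sort -nr \ \
    | head -n 1; } 2>/dev/null || true )

echo "fixed:  [$fixed]"
echo "broken: [$broken]"
```

With the single backslash the highest count comes out first; with the doubled one the result is empty, unless a file literally named " " happens to exist in the working directory.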
Fixed, thank you! :smile: