dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.56k stars, 538 forks

[BUGFIX] [SCRIPT] Fixed a bug in clean_tok_mono_corpus.py and merged clean_tok_para_corpus.py and clean_tok_mono_corpus.py #1520 #1490 #1579

Closed akshatgui closed 2 years ago

akshatgui commented 2 years ago

Description

clean_tok_mono_corpus.py was unpacking two variables on line 238, but because it processes a monolingual corpus, only one is needed. This PR fixes that and also merges clean_tok_para_corpus.py and clean_tok_mono_corpus.py into a single script, since the two files contained similar functions that differed only in the type of corpus they handled.
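
To illustrate the idea behind the merge, here is a minimal sketch of how one code path can serve both corpus types. It is not the code from this PR: `keep_example`, `clean_corpus`, and the `min_len`/`max_len` parameters are hypothetical names chosen for the example, and the length filter is an assumed stand-in for the real cleaning rules. The point is only that iterating over groups of aligned lines avoids unpacking a fixed number of variables, which is the kind of mistake described above.

```python
from typing import Iterable, List, Sequence, Tuple


def keep_example(sentences: Sequence[str], min_len: int = 1, max_len: int = 100) -> bool:
    """Return True if every sentence in the aligned group has an acceptable token count.

    Hypothetical filter: the real scripts may apply different cleaning rules.
    """
    return all(min_len <= len(s.split()) <= max_len for s in sentences)


def clean_corpus(columns: Sequence[Iterable[str]],
                 min_len: int = 1,
                 max_len: int = 100) -> List[Tuple[str, ...]]:
    """Filter one (mono) or several (parallel) aligned line streams with the same code."""
    cleaned = []
    # zip(*columns) yields 1-tuples for a mono corpus and 2-tuples for a
    # parallel corpus, so the loop never unpacks a fixed number of variables.
    for group in zip(*columns):
        group = tuple(line.strip() for line in group)
        if keep_example(group, min_len, max_len):
            cleaned.append(group)
    return cleaned


# Mono corpus: a single column of lines; the empty line is dropped.
mono = clean_corpus([["hello world", "", "a b c"]])

# Parallel corpus: two aligned columns; a pair is dropped if either side is empty.
para = clean_corpus([["hello world", "good morning"], ["hallo welt", ""]])
```

With this shape, the mono case is just the parallel case with one column, so a separate script is not required.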

cc @dmlc/gluon-nlp-team

github-actions[bot] commented 2 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1579/1df42c561ae9552960e3f8b5f22e74de812a29c6/index.html

szha commented 2 years ago

Awesome. Thanks for refactoring! Overall it looks reasonable to me. I approved the CI workflow to run.

akshatgui commented 2 years ago

> Awesome. Thanks for refactoring! Overall it looks reasonable to me. I approved the CI workflow to run

Thank you, sir. I think you can merge this request and also close these two issues, as it has passed all the required checks.