dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

Classification script does not bin ngrams correctly #589

Open leezu opened 5 years ago

leezu commented 5 years ago

As discussed by @szhengac in https://github.com/dmlc/gluon-nlp/pull/529#discussion_r255815817, the classification script does not follow the paper: it does not use word-ngram hashing.
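
For illustration, a minimal sketch of what hashing word ngrams into a fixed number of buckets could look like. This is not the script's code and not the paper's exact hash function: `zlib.crc32` is a stand-in, and the names `word_ngrams`, `hashed_ngram_ids`, `vocab_size`, and `num_buckets` are hypothetical.

```python
import zlib


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashed_ngram_ids(tokens, vocab_size, num_buckets, max_n=2):
    """Map each word n-gram (2 <= n <= max_n) to an index in a shared bucket range.

    Unigram indices are assumed to occupy [0, vocab_size), so n-gram buckets
    are offset into [vocab_size, vocab_size + num_buckets).
    crc32 is only a deterministic stand-in for the hash used in the paper.
    """
    ids = []
    for n in range(2, max_n + 1):
        for gram in word_ngrams(tokens, n):
            bucket = zlib.crc32(gram.encode("utf-8")) % num_buckets
            ids.append(vocab_size + bucket)
    return ids
```

With this approach the embedding table has a fixed size of `vocab_size + num_buckets` rows regardless of how many distinct ngrams occur in the data; different ngrams may collide in the same bucket.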

szha commented 5 years ago

The script could also be improved by having the vocab maintain the ngram index as well. Currently this is done through a separate dictionary: https://github.com/dmlc/gluon-nlp/commit/f48e12cb42de447768daa6f8feec1c9a06995e62#diff-4cb70aead1f1e807229a07fa8bb17a6eR107
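
A rough sketch of the idea, in plain Python rather than the gluon-nlp vocab API: a single mapping holds both unigrams and ngrams, so no separate ngram dictionary is needed. The function name `build_vocab`, the `<unk>` token, and the parameters are assumptions made for this example only.

```python
from collections import Counter


def build_vocab(corpus, max_n=2, min_freq=1, unknown_token="<unk>"):
    """Build one token-to-index mapping covering unigrams and word n-grams.

    corpus is an iterable of token lists. N-grams are stored as
    space-joined strings so they share a key space with unigrams.
    """
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1

    # Sort by descending frequency, then alphabetically, for a stable order.
    kept = sorted(
        (tok for tok, c in counts.items() if c >= min_freq),
        key=lambda tok: (-counts[tok], tok),
    )
    idx_to_token = [unknown_token] + kept
    return {tok: idx for idx, tok in enumerate(idx_to_token)}


vocab = build_vocab([["a", "b"], ["a", "b", "c"]])
# Unigrams and bigrams are looked up the same way; unseen entries map to <unk>.
bigram_id = vocab.get("a b", vocab["<unk>"])
```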