Closed FahdCodes closed 2 years ago
Hi, thanks for raising this issue. At the moment, pymusas does not support tagging English text, because we have not yet released the English lexicons that the tagger requires as its knowledge source (see https://github.com/UCREL/Multilingual-USAS). English is part of our planned roadmap, see here: https://github.com/UCREL/pymusas/blob/main/ROADMAP.md
The currently supported languages are described here, each with example code: https://ucrel.github.io/pymusas/usage/how_to/tag_text
For now, I will leave this issue open since we are planning to release an English version later as described in the roadmap.
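To illustrate why a missing lexicon leads to every token being tagged `Z99` (the USAS tag for unmatched tokens), here is a minimal toy sketch of a lexicon lookup with a fallback. It is not pymusas code: the `tag_tokens` function, the `lemma|pos` key format, and the lexicon entries are all assumptions made up for this example.

```python
# Toy illustration only; this is NOT how pymusas is implemented.
UNMATCHED_TAG = "Z99"  # USAS tag for tokens that match nothing


def tag_tokens(tokens, lexicon):
    """Tag (text, lemma, pos) triples via a hypothetical lexicon lookup.

    Tries a "lemma|pos" key first, then the lemma alone, and falls back
    to the unmatched tag when neither key is present.
    """
    tagged = []
    for text, lemma, pos in tokens:
        tags = (
            lexicon.get(f"{lemma}|{pos}")
            or lexicon.get(lemma)
            or [UNMATCHED_TAG]
        )
        tagged.append((text, tags))
    return tagged


tokens = [("rivers", "river", "noun"), ("flow", "flow", "verb")]

# Hypothetical lexicon entries, invented for this example.
toy_lexicon = {"river|noun": ["W3"], "flow": ["M1"]}

print(tag_tokens(tokens, {}))           # no lexicon: every token gets Z99
print(tag_tokens(tokens, toy_lexicon))  # with a lexicon: entries are matched
```

With an empty lexicon the fallback fires for every token, which matches the symptom reported in this issue. Note that `flow` is matched through the lemma-only key, so in this sketch a missing POS tag alone would not force `Z99`.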
I am closing this now that we have released the English lexicons (https://github.com/UCREL/Multilingual-USAS/tree/master/English) and provided an example pipeline and documentation for English (https://ucrel.github.io/pymusas/usage/how_to/tag_text#english).
I'm having trouble tagging English text. I'm using spaCy's 'en_core_web_sm' model for the pipeline. Apparently, the English model does not have 'token.pos', which the 'usas_tagger' requires. The documentation says that the tagger should work even without 'token.pos'; however, when I feed English text to the tagger, it simply assigns 'Z99' to every word.
I would really appreciate any input.
Below is the full code. Thanks!