shahrokhDaijavad opened 3 months ago
The two new NLP modules, lang_id and doc_quality, are being merged. I have already unit-tested lang_id locally on a Mac with 3 test files. Both transforms are already being tested regularly on a large cluster as part of the inner repo team's Pipelines testing, so we do not need a separate cluster testing strategy. For local testing (and for a new corresponding Notebook example), it would make sense to identify a small set of input files for which these transforms produce meaningful, observable output. I will work with Hamid and Dhiraj to identify such a set.
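One possible way to pick that small input set is sketched below. It is a minimal illustration that rests on several assumptions. "Meaningfully observable" is taken to mean that the transform assigns at least two distinct labels across a file's documents. `toy_lang_id`, `is_observable`, and `pick_fixtures` are hypothetical helpers. The real lang_id and doc_quality transforms operate on tables rather than plain strings and are not modeled here.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for a transform: maps one document's text to a label.
Transform = Callable[[str], str]


def toy_lang_id(text: str) -> str:
    """Hypothetical keyword-based language guesser, for illustration only."""
    words = text.lower().split()
    if any(w in words for w in ("the", "and", "is")):
        return "en"
    if any(w in words for w in ("le", "et", "est")):
        return "fr"
    return "unknown"


def is_observable(transform: Transform, docs: List[str], min_labels: int = 2) -> bool:
    """Assumed criterion: the transform yields at least `min_labels` distinct labels."""
    labels = Counter(transform(doc) for doc in docs)
    return len(labels) >= min_labels


def pick_fixtures(
    transform: Transform, candidates: Dict[str, List[str]], min_labels: int = 2
) -> List[str]:
    """Return the (sorted) names of candidate files whose output varies enough."""
    return sorted(
        name
        for name, docs in candidates.items()
        if is_observable(transform, docs, min_labels)
    )


if __name__ == "__main__":
    candidates = {
        "english_only.txt": ["The cat is here", "Rain and wind"],
        "mixed.txt": ["The cat is here", "Le chat est ici"],
    }
    print(pick_fixtures(toy_lang_id, candidates))
```

In this toy setup only `mixed.txt` is kept, because every document in `english_only.txt` receives the same label. A similar filter could be pointed at doc_quality scores, for example by requiring a spread of scores rather than distinct labels.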
Component
Transforms/Other
Feature
We are adding NLP modules for document quality and spoken language identification, as well as new code modules for HAP, license filtering, and PII, to the kit. We need testing similar to (or better than!) what was done for the initial set of code modules.