pidugusundeep opened this issue 4 years ago
You can create a complete set of model binaries in the folder ./models/ with ./bin/all.sh. Put the text corpus for your language model in the folder data/corpus_extra. all.sh invokes,
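A minimal Python sketch of those two steps, assuming it is pointed at a local checkout of atd-server-next and that the corpus consists of plain-text files. The helper names `corpus_destinations`, `stage_corpus` and `rebuild_models` are hypothetical. Only `data/corpus_extra`, `./models/` and `./bin/all.sh` come from the reply. Starting the script from the repository root is an assumption.

```python
import shutil
import subprocess
from pathlib import Path


def corpus_destinations(src_files, repo_root):
    """Map each corpus file to its target path under data/corpus_extra."""
    target_dir = Path(repo_root) / "data" / "corpus_extra"
    return [target_dir / Path(src).name for src in src_files]


def stage_corpus(src_files, repo_root):
    """Copy plain-text corpus files into data/corpus_extra (hypothetical helper)."""
    destinations = corpus_destinations(src_files, repo_root)
    for src, dest in zip(src_files, destinations):
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
    return destinations


def rebuild_models(repo_root):
    """Regenerate the model binaries in ./models/ by running ./bin/all.sh."""
    # Assumption: the script is started from the repository root.
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(["./bin/all.sh"], cwd=repo_root, check=True)
```

For example, `stage_corpus(["news.txt"], "/path/to/atd-server-next")` followed by `rebuild_models("/path/to/atd-server-next")`. Keeping the path mapping in a separate pure function makes it easy to check where files will land before anything is copied.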
The dictionary is created from the language model corpus, so a huge text corpus normally yields a dictionary with more words. Whether this always leads to a better evaluation is not guaranteed.
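A toy Python sketch of why a bigger corpus normally means a bigger dictionary. The tokenizer and the `min_count` frequency cutoff are assumptions for illustration only, not how atd-server-next actually builds its dictionary.

```python
import re
from collections import Counter


def build_dictionary(lines, min_count=1):
    """Return the set of words occurring at least `min_count` times in `lines`."""
    counts = Counter()
    for line in lines:
        # Deliberately simple tokenizer: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return {word for word, n in counts.items() if n >= min_count}


small_corpus = ["The cat sat on the mat."]
large_corpus = small_corpus + ["The dog sat on the log near the cat."]

small_dict = build_dictionary(small_corpus)
large_dict = build_dictionary(large_corpus)
print(len(small_dict), len(large_dict))  # prints: 5 8
```

Adding text can only increase word counts, so for any fixed `min_count` the dictionary never shrinks. Whether the extra entries help evaluation is a separate question.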
Adding rules is quite simple and mostly separated from the source code. If you add a rule to an already existing rules file, no source changes are needed. The rules are located in the folder ./data/rules. If you want to add a new rules file, you must modify the source file ./utils/rules/rules.sl. There are 13 rules loader subs at the end of it, so either modify one of them to load your new rules file or write another rules loader sub and append it to the existing ones. Don't forget to run ./bin/buildrules.sh after your changes; it creates a new ./models/rules.bin. If you want to use POS tagging in your rules, the POS tagger ./bin/tagit.sh can help you create your new rule.
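Because ./models/rules.bin has to be rebuilt after every rule change, a make-style freshness check can save forgetting that step. The following is a minimal Python sketch, assuming it runs against a local checkout. The helpers `is_stale`, `rules_source_mtimes` and `rebuild_rules_if_needed` are hypothetical. Only the paths ./data/rules, ./utils/rules/rules.sl, ./models/rules.bin and ./bin/buildrules.sh come from the reply.

```python
import subprocess
from pathlib import Path


def is_stale(target_mtime, source_mtimes):
    """A build target is stale if it is missing (None) or older than any source."""
    return target_mtime is None or any(m > target_mtime for m in source_mtimes)


def rules_source_mtimes(root):
    """Modification times of every file under data/rules plus utils/rules/rules.sl."""
    sources = [p for p in (root / "data" / "rules").rglob("*") if p.is_file()]
    loader = root / "utils" / "rules" / "rules.sl"
    if loader.exists():
        sources.append(loader)
    return [p.stat().st_mtime for p in sources]


def rebuild_rules_if_needed(repo_root):
    """Run ./bin/buildrules.sh when models/rules.bin is missing or out of date.

    Returns True if a rebuild was triggered.
    """
    root = Path(repo_root)
    rules_bin = root / "models" / "rules.bin"
    built_at = rules_bin.stat().st_mtime if rules_bin.exists() else None
    if is_stale(built_at, rules_source_mtimes(root)):
        # Assumption: the script is started from the repository root.
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(["./bin/buildrules.sh"], cwd=root, check=True)
        return True
    return False
```

Comparing modification times is the same heuristic `make` uses. It misses a change only if a file's mtime is set back to an older value.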
On Fri, Feb 21, 2020 at 7:33 AM Sundeep <notifications@github.com> wrote:
- Can I train for an updated model?
- Does it improve by adding more dictionary words?
- How do I add new rules?