What are the different ways to improve the results?

You can create a complete set of model binaries in the folder ./models/ with ./bin/all.sh. Put the text corpus for the language model in the folder data/corpus_extra. ./bin/all.sh invokes:
- ./bin/buildmodel.sh, which creates the uni-, bi- and trigrams, serialised to ./models/model.bin, and a dictionary built from the corpus, saved to ./models/dictionary.txt. The n-grams are also used for Bayes statistics, which can be triggered by filter options such as stats.
- ./bin/buildrules.sh, which creates the ATD FSM rules engine, serialised to ./models/rules.bin
- ./bin/testgr.sh, the evaluation of the grammar checker
- ./bin/trainspellcontext.sh and ./bin/trainspellnocontext.sh, which train the spellchecker neural networks
- ./bin/trainhomophones.sh, which trains the homophones neural network.
The dictionary is created from the language corpus, so a huge text corpus used for the language model (normally) yields a dictionary with more words. Whether this always leads to a better evaluation is not guaranteed.
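To make the n-gram and dictionary step a little more concrete, here is a minimal Python sketch of counting uni-, bi- and trigrams and deriving a word list from a corpus. It is not the code behind ./bin/buildmodel.sh: the tokenizer, the assumption that the corpus consists of plain UTF-8 `.txt` files, and all function names (`tokenize`, `build_ngrams`, `build_dictionary`, `p_next`, `load_corpus`) are hypothetical. The homophone comparison uses a plain maximum-likelihood bigram estimate as a stand-in for the Bayes statistics mentioned above, which may work differently in AtD.

```python
from collections import Counter
from pathlib import Path
import re

# Hypothetical tokenizer: lowercase runs of letters/apostrophes, punctuation dropped.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def load_corpus(folder):
    # Assumption: the corpus folder (e.g. data/corpus_extra) holds plain .txt files.
    tokens = []
    for path in sorted(Path(folder).glob("*.txt")):
        tokens.extend(tokenize(path.read_text(encoding="utf-8", errors="ignore")))
    return tokens


def build_ngrams(tokens):
    # Count every uni-, bi- and trigram in the token stream.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return unigrams, bigrams, trigrams


def build_dictionary(unigrams, min_count=1):
    # A bigger corpus generally yields more entries here.
    return sorted(w for w, c in unigrams.items() if c >= min_count)


def p_next(word, prev, bigrams, unigrams):
    # Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen prev.
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


if __name__ == "__main__":
    toks = tokenize("I went to their house. Their house is over there.")
    uni, bi, tri = build_ngrams(toks)
    print(build_dictionary(uni))
    # Compare two homophone candidates after "to".
    for cand in ("their", "there"):
        print(cand, p_next(cand, "to", bi, uni))  # prints: their 1.0, then there 0.0
```

The counts are kept in `Counter` objects, so an n-gram never seen in the corpus simply returns 0. This is one reason a larger corpus tends to help, and also why it can add noise to the dictionary.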
Adding rules is quite simple and mostly separated from the sources. The rules are located in the folder ./data/rules. If you add a rule to an already existing rule file, no source changes are needed. If you want to add a new rules file, you must modify the source file ./utils/rules/rules.sl: there are 13 rule-loader subs at the end of that file, so either modify one of them to load your new rules file or write another rule-loader sub and append it to the existing ones. Don't forget to run ./bin/buildrules.sh after your changes; it creates a new ./models/rules.bin. If you want to use POS tagging in your rules, the POS tagger ./bin/tagit.sh can help you create your new rule.

On Fri, Feb 21, 2020 at 7:33 AM Sundeep <notifications@github.com> wrote:

Can I train for an updated model?

Does it improve by adding more dictionary words?

How do I add new rules?


Automattic / atd-server-next

What are the different ways to improve the results? #2