Automattic / atd-server-next

After the Deadline Server
GNU General Public License v2.0

What are the different ways to improve the results? #2

Open pidugusundeep opened 4 years ago

pidugusundeep commented 4 years ago
  1. Can I train for an updated model?
  2. Does it improve by adding more dictionary words?
  3. How do I add new rules?
michaelharzheim commented 4 years ago
  1. You can build a complete set of model binaries in the ./models/ folder with ./bin/all.sh. First put your text corpus for the language model in the folder data/corpus_extra. all.sh invokes:

    • ./bin/buildmodel.sh, which creates the uni-, bi- and trigrams serialised to ./models/model.bin, and a dictionary built from the corpus and saved to ./models/dictionary.txt. The n-grams are also used for Bayesian statistics, which can be triggered with filter options such as stats.
    • ./bin/buildrules.sh, which creates the ATD FSM rules engine serialised to ./models/rules.bin.
    • ./bin/testgr.sh, the evaluation of the grammar checker.
    • ./bin/trainspellcontext.sh and ./bin/trainspellnocontext.sh, which train the spellchecker neural networks.
    • ./bin/trainhomophones.sh, which trains the homophones neural network.
  2. The dictionary is created from the language corpus, so a larger text corpus for the language model normally yields a dictionary with more words. Whether this always leads to a better evaluation is not guaranteed.

  3. Adding rules is quite simple and mostly separate from the sources. The rules are located in the folder ./data/rules. If you add a rule to an already existing rules file, no source change is needed. If you want to add a new rules file, you must modify the source file ./utils/rules/rules.sl. There are 13 rules loader subs at the end of ./utils/rules/rules.sl; either modify one of them to load your new rules file, or write another rules loader sub and append it to the existing ones. Don't forget to run ./bin/buildrules.sh after your changes, which creates a new ./models/rules.bin. If you want to use POS tagging in your rules, the POS tagger ./bin/tagit.sh can help you create your new rule.
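
The steps above can be sketched as a small shell snippet. This is only an illustration, assuming it is run from the root of an atd-server-next checkout: check_models is a hypothetical helper (not part of the repository) that looks for the three output files named above, and my_corpus.txt is a placeholder for your own corpus file. The training scripts may write further files that this sketch does not check.

```shell
#!/bin/sh
# Hypothetical helper, not part of atd-server-next: report whether the
# model files mentioned in this thread exist in the given folder.
check_models() {
    dir="${1:-./models}"
    status=0
    for f in model.bin dictionary.txt rules.bin; do
        if [ -f "$dir/$f" ]; then
            echo "ok: $dir/$f"
        else
            echo "missing: $dir/$f"
            status=1
        fi
    done
    return "$status"
}

# Typical use from the repository root (left commented out so that
# sourcing this file has no side effects):
#
#   cp my_corpus.txt data/corpus_extra/   # my_corpus.txt is a placeholder
#   ./bin/all.sh                          # full rebuild of ./models/
#   check_models ./models
#
# After editing a file under ./data/rules, only the rules need rebuilding:
#
#   ./bin/buildrules.sh
#   check_models ./models
```

check_models returns a non-zero status if any of the three files is missing, so it can be chained after a build step with &&.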
