OlegBaskov / language-learning

OpenCog unsupervised Language Learning project
https://wiki.opencog.org/w/Language_learning
MIT License

Study training set cleanup influence on grammar induction quality #49

Open OlegBaskov opened 5 years ago

OlegBaskov commented 5 years ago

Delayed after the F1~0.97 result obtained with "no-filter" settings (min_word_count = min_link_count = min_co-occurrence_count = 1, max... = 100000, which exceeds the actual numbers of words, features, and disjuncts in the Child Directed Speech "br-test" corpus).
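For reference, a minimal sketch of what a `min_word_count` cleanup could look like, and why a threshold of 1 amounts to "no filter". The function name `filter_rare_words` and the choice to drop rare words (rather than whole sentences) are assumptions for illustration, not the project's actual pipeline code.

```python
from collections import Counter


def filter_rare_words(sentences, min_word_count=1):
    """Drop words seen fewer than min_word_count times (hypothetical cleanup step).

    Assumes whitespace-tokenized sentences; sentences left empty are removed.
    """
    counts = Counter(word for s in sentences for word in s.split())
    cleaned = []
    for s in sentences:
        kept = [w for w in s.split() if counts[w] >= min_word_count]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned


corpus = ["the dog runs", "the cat runs", "a bird"]
unfiltered = filter_rare_words(corpus, min_word_count=1)  # every word occurs >= 1 time
filtered = filter_rare_words(corpus, min_word_count=2)    # keeps only "the" and "runs"
```

With `min_word_count=1` every word passes, so the training set is unchanged; this is the baseline against which any cleanup effect would be compared.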

OlegBaskov commented 5 years ago

2018-02-11 Slack language-learning: "yah so a first question would be, what is the F measure (as you've been computing it) just for sentences length 10 or less? And -- for kicks -- what about, just for sentences length 5 or less?"
-- 2 tests started 2018-02-18 (running on the 84 server), 2 more started 2018-02-20 (on the 88 server)
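A rough sketch of the length-restricted F measure asked about above. It assumes each parse is a set of `(left_index, right_index)` link pairs and that F1 is computed over links pooled across sentences; both the data layout and the name `f1_for_max_length` are hypothetical and may differ from how the F measure is actually computed in this project.

```python
def f1_for_max_length(sentences, gold_links, test_links, max_len):
    """F1 over links, restricted to sentences with at most max_len tokens.

    gold_links / test_links: one set of (i, j) index pairs per sentence (assumed format).
    """
    true_pos = n_gold = n_test = 0
    for sent, gold, test in zip(sentences, gold_links, test_links):
        if len(sent.split()) > max_len:
            continue  # skip sentences longer than the limit
        true_pos += len(gold & test)
        n_gold += len(gold)
        n_test += len(test)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_test
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


sentences = ["a b c", "a b c d e f"]
gold = [{(0, 1), (1, 2)}, {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}]
test = [{(0, 1), (1, 2)}, {(0, 2), (1, 2), (2, 3), (3, 5), (4, 5)}]

f1_short = f1_for_max_length(sentences, gold, test, max_len=5)
f1_all = f1_for_max_length(sentences, gold, test, max_len=10)
```

In this toy data the short sentence is parsed perfectly, so the score drops once the longer sentence is included.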

OlegBaskov commented 5 years ago

Mean shift clustering test on the Gutenberg Children Books corpus with min_word_count = [31,21,11,6,2] started 2018-02-20.

OlegBaskov commented 5 years ago

"Try to evaluate GC grammar quality learned on full GC corpus but testing on corpora with sentence lengths 5, 10, 15, 25 - as suggested by Ben?" -- ULLproject plan tasks
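One possible way to prepare the test corpora for this task, sketched under the assumption that 5, 10, 15, 25 are upper bounds on sentence length (in line with the "length 10 or less" wording in the Slack question) and that sentences are whitespace-tokenized. The helper `split_by_max_length` is hypothetical.

```python
def split_by_max_length(sentences, limits=(5, 10, 15, 25)):
    """Return {limit: sentences with at most `limit` tokens} for each limit.

    Subsets are cumulative: a 3-token sentence appears under every limit.
    """
    return {
        limit: [s for s in sentences if len(s.split()) <= limit]
        for limit in limits
    }


# Toy corpus with sentences of 3, 7, 12, 20 and 30 tokens.
toy_corpus = [" ".join(["w"] * n) for n in (3, 7, 12, 20, 30)]
subsets = split_by_max_length(toy_corpus)
subset_sizes = {limit: len(sents) for limit, sents in subsets.items()}
```

The grammar would still be learned once on the full corpus; only the evaluation set changes per limit.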