chetan51 / linguist

A language-learning AI based on NuPIC

getting the best params - swarming #1

Closed breznak closed 10 years ago

breznak commented 10 years ago

Hi @chetan51, thanks for this project! I'm experimenting with it for text generation. Learning a larger (not really that large) dataset, i.e. data/child-stories.txt, looks unsuccessful. Could you please add the files you used for the swarming runs? I'd like to rerun those. Also, the pamLength looks small to me (default 2). Did you experiment further on bigger data? Any results/speedups (apart from reducing the number of n-ahead printed chars, which helps quite a lot)?

Also, can you, @scottpurdy, or somebody explain this?

[681136] m ==> ade a (1.00 | 0.53 | 0.53 | 0.53 | 0.53)
[681137] a ==> tches (0.54 | 0.54 | 0.55 | 0.57 | 0.53)
[681138] t ==> hees (0.33 | 0.44 | 0.45 | 0.33 | 0.33)
[681139] c ==> hes m (0.94 | 0.93 | 0.91 | 0.94 | 0.88)
[681140] h ==> es ma (0.79 | 0.41 | 0.42 | 0.27 | 0.33)
[681141] e ==> ah (0.30 | 0.32 | 0.30 | 0.22 | 0.19)
[681142] s ==> made (0.63 | 0.55 | 0.55 | 0.55 | 0.87)
[681143] . ==> She (0.74 | 0.71 | 0.86 | 0.86 | 0.74)

Line 1: uncertain, it assumes the most probable completion, "made"... OK. Line 2: "m"-"a" confirms the assumed "made" from line 1, so why does the prediction change to "matches"? (The same happens with "matches" on line 3, where "t" confirms the prediction, yet we diverge from what was predicted on line 2.)

Thanks a lot,

breznak commented 10 years ago

...I expected predicting n steps ahead to work like this: predict the most probable next character, feed it back to the CLA, repeat.
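Roughly, in pseudocode, what I had in mind (just a sketch; model.run and the inference keys here are my assumption of an OPF-style interface, not necessarily what linguist actually uses):

```python
# Sketch only: assumes an OPF-style model with a single "letter" field and
# a 1-step "multiStepBestPredictions" inference; the names are my assumption.
def predict_n_ahead(model, seed_char, n):
    predictions = []
    char = seed_char
    for _ in range(n):
        result = model.run({"letter": char})                      # feed the current char to the CLA
        char = result.inferences["multiStepBestPredictions"][1]   # take the most probable next char
        predictions.append(char)                                  # feed it back in and repeat
    return predictions
```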

chetan51 commented 10 years ago

Hey,

Glad to see someone playing with the project!

See /tools/swarm for the files I used to run swarms on the various datasets. The last swarm I ran (which set the pamLength that you referred to) was run on the childrens-stories.txt dataset.
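If you want to re-run one programmatically, it's roughly along these lines (only a sketch: the real search description lives in /tools/swarm, and the stream path, field name, and option values below are placeholders):

```python
from nupic.swarming import permutations_runner

# Placeholder swarm description; the actual ones are in /tools/swarm.
swarm_description = {
    "includedFields": [{"fieldName": "letter", "fieldType": "string"}],
    "streamDef": {
        "info": "letters",
        "version": 1,
        "streams": [{"source": "file://path/to/dataset.csv",  # placeholder path
                     "info": "letters",
                     "columns": ["*"]}],
    },
    "inferenceType": "MultiStep",
    "inferenceArgs": {"predictedField": "letter",
                      "predictionSteps": [1, 2, 3, 4, 5]},
    "swarmSize": "medium",
}

# Returns the best model params found by the swarm (including pamLength)
model_params = permutations_runner.runWithConfig(
    swarm_description, {"maxWorkers": 4, "overwrite": True})
```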

I was also wondering why the strange behavior you observed happens. I have some ideas, but I don't know for sure yet, so I'm planning to profile this with the new Cerebro tool to see if I can gain any insight.

scottpurdy commented 10 years ago

The temporal pooler will have a set of cells predicted at each step (multiple simultaneous predictions). The classifier converts the predicted cells back to letters. So when it sees "m" it may be predicting the TP cells for both "a" in "made" and "a" in "matches". The classifier is guessing that the "m" is the start of "made" but when the "a" comes the TP doesn't necessarily lock on to just the "made" sequence. So in the next step the classifier is still guessing whether you are in the "made" sequence or the "matches" sequence.

I am sort of spitballing here but it seems like the behavior seen, while not intuitive, could be correct, at least for some of the letters.

The spatial pooler and the CLA classifier make it a little hard to reason about the results. Perhaps an alternative would be to use just the temporal pooler. You could have 40 or so columns for each character that you want to include. I would limit the characters you include (convert everything to lowercase, for instance). If you have 30 characters with 40 columns per character, then you need a TP with 1200 columns. Assign the first 40 columns to "a", the next 40 to "b", etc. Then you can directly map the predicted cells/columns back into predicted letters (and the more predicted columns for a given letter, the more likely you can say that letter will come next).
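For example, the character-to-column mapping could look roughly like this (just a sketch of the idea; the character set and the 40-columns-per-character figure are arbitrary choices):

```python
import numpy as np

CHARS = "abcdefghijklmnopqrstuvwxyz .,'"   # ~30 lowercase characters (arbitrary choice)
COLS_PER_CHAR = 40
NUM_COLS = len(CHARS) * COLS_PER_CHAR      # 30 chars * 40 columns = 1200 columns

def encode(char):
    """Turn a character into a binary column vector: its 40 columns on, the rest off."""
    sdr = np.zeros(NUM_COLS, dtype="uint32")
    start = CHARS.index(char) * COLS_PER_CHAR
    sdr[start:start + COLS_PER_CHAR] = 1
    return sdr

def decode(predicted_columns):
    """Score each character by how many of its columns are predicted;
    more predicted columns means that letter is more likely to come next."""
    scores = {}
    for i, char in enumerate(CHARS):
        start = i * COLS_PER_CHAR
        scores[char] = int(predicted_columns[start:start + COLS_PER_CHAR].sum())
    return sorted(scores.items(), key=lambda kv: -kv[1])
```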

The downside is that you can only predict one step ahead. So I'm not sure if you want to move to this, but it would make it easier to reason about the results. You can see examples of using the TP directly here: https://github.com/numenta/nupic/tree/master/examples/tp
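Driving the TP directly would then look roughly like this (a sketch only, reusing the encode/decode helpers above; the constructor arguments are illustrative, see the linked examples for realistic settings):

```python
from nupic.research.TP import TP

# Illustrative parameter values only; see examples/tp for realistic settings.
tp = TP(numberOfCols=NUM_COLS,
        cellsPerColumn=16,
        initialPerm=0.5,
        connectedPerm=0.5,
        minThreshold=10,
        newSynapseCount=10,
        permanenceInc=0.1,
        permanenceDec=0.0,
        activationThreshold=8,
        pamLength=10)

text = "the cat sat on the mat. "
for char in text:
    # Feed one character per time step and learn
    tp.compute(encode(char), enableLearn=True, computeInfOutput=True)
    # A column counts as predicted if any of its cells is in the predicted state
    predicted_cols = tp.getPredictedState().max(axis=1)
    print("%s -> %s" % (char, decode(predicted_cols)[:3]))  # top 3 guesses for the next char
```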

Hope that helps a little.