floybix / comportex-archived

Private fork of Comportex

Phoneme Sequence experiments #4

Open mrcslws opened 8 years ago

mrcslws commented 8 years ago

Our brains are amazing at converting sequences of syllables into words. Finding the gaps between them.

Experiment: First, convert bodies of text into phonemes, possibly using this dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict . Observe an HTM's ability to put names on sequences of phonemes, to recognize them.
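A minimal sketch of that conversion step, written in Python for illustration (Comportex itself is Clojure) and using NLTK's packaged copy of the CMU Pronouncing Dictionary; the raw file from the URL above would work just as well:

```python
# Sketch: text -> phoneme sequence via the CMU Pronouncing Dictionary.
# Assumes NLTK is installed; any cmudict-format file would do instead.
import nltk

nltk.download("cmudict", quiet=True)
from nltk.corpus import cmudict

PRONUNCIATIONS = cmudict.dict()  # word -> list of phoneme sequences

def text_to_phonemes(text):
    """Flatten text into one phoneme stream, taking each word's first
    listed pronunciation and skipping out-of-vocabulary words."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if word in PRONUNCIATIONS:
            phonemes.extend(PRONUNCIATIONS[word][0])
    return phonemes

print(text_to_phonemes("finding the gaps between them"))
# something like ['F', 'AY1', 'N', 'D', 'IH0', 'NG', 'DH', 'AH0', ...]
```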

This experiment could come in two stages:

  1. Supervised learning
  2. Unsupervised learning

In the supervised experiment, we train the HTM by forcing encodings in a higher region. Give it a sequence of phonemes at the bottom, and a word encoding at the top. Then, to test it, give it a long sequence of phonemes and let it convert it to words.
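To make the shape of this concrete, here is a rough sketch of the loop, where the `htm` object and its `step`/`reset`/`force_top` methods are hypothetical stand-ins, not a real Comportex API:

```python
# Hypothetical interface sketch -- not Comportex's actual API.
def train_supervised(htm, dictionary, word_encoder, epochs=1):
    """dictionary: word -> phoneme sequence. Clamp the higher region
    to the word's encoding while phonemes stream into the lower one."""
    for _ in range(epochs):
        for word, phonemes in dictionary.items():
            forced = word_encoder(word)  # fixed SDR for this word
            for phoneme in phonemes:
                htm.step(input=phoneme, force_top=forced)
            htm.reset()  # end of word: break the sequence context

def decode_words(htm, phonemes, word_decoder):
    """Testing: stream phonemes and read a word off the top region
    whenever the decoder finds a confident match."""
    words = []
    for phoneme in phonemes:
        htm.step(input=phoneme)
        word = word_decoder(htm)
        if word is not None:
            words.append(word)
    return words
```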

In the unsupervised experiment, the HTM endlessly inhales phonemes and decides which sequences to give names to. I haven't decided what testing would look like here.

I'd start with the supervised experiment. In some sense, the HTM "commits" an interpretation of a word when it rules out the possibility that a subsequent phoneme is a continuation of the word. From afar, this seems doable, but perfecting this process is a main goal of this experiment.
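That commit rule has a simple non-HTM baseline worth keeping around for comparison: track every dictionary word consistent with the phonemes seen so far, and commit the longest full match the moment the next phoneme rules out all continuations. A greedy sketch (my construction, purely for reference):

```python
def segment(phonemes, dictionary):
    """Greedy longest-match segmentation. `dictionary` maps
    tuple-of-phonemes -> word. A word is 'committed' exactly when
    the stream can no longer extend any known pronunciation."""
    prefixes = {p[:i] for p in dictionary for i in range(1, len(p) + 1)}
    words, i = [], 0
    while i < len(phonemes):
        best = None
        j = i
        while j < len(phonemes) and tuple(phonemes[i:j + 1]) in prefixes:
            j += 1
            if tuple(phonemes[i:j]) in dictionary:
                best = j  # longest full word ending here so far
        if best is None:
            i += 1  # no word starts here: skip this phoneme
        else:
            words.append(dictionary[tuple(phonemes[i:best])])
            i = best
    return words
```

Greedy longest-match will mis-commit wherever a shorter word was the right reading; resolving that ambiguity from context is exactly what the HTM should buy us over this baseline.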

The end result: it can reconstruct the text of a Wikipedia article from its phonemes. A big win here is that we don't need to hand-craft the input data or manually evaluate the HTM's results. If the output word sequence is equal to the input word sequence, it worked. The internet is our corpus.
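The check itself is then mechanical. Something like this, with per-word accuracy added as a softer diagnostic alongside the exact-equality criterion (the accuracy metric is my addition):

```python
def evaluate(original_words, decoded_words):
    """Exact match per the criterion above, plus per-word accuracy
    so a single miss doesn't zero out a whole article."""
    exact = original_words == decoded_words
    hits = sum(a == b for a, b in zip(original_words, decoded_words))
    return exact, hits / max(len(original_words), 1)
```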

I haven't decided whether phoneme transitions/predictions come solely from lateral connections, or also from higher-layer feedback. Our current thinking is a little unclear on which context lives in higher layers and which lives in the which-cell-in-the-column choice. I'm hoping this project will bring some more clarity on that.

mrcslws commented 8 years ago

Also this gives us another opportunity to test replaying of sequences.

As we teach an HTM the pronunciation dictionary, we can test whether it has forgotten older words by asking it to replay them.
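One possible shape for that check, again with hypothetical method names (`predicted_input` standing in for decoding a region's predictive cells back to a phoneme):

```python
def replays_correctly(htm, phonemes):
    """Cue the network with a word's first phoneme, then let its own
    predictions drive the input forward; forgotten words derail."""
    htm.reset()
    htm.step(input=phonemes[0])
    for expected in phonemes[1:]:
        predicted = htm.predicted_input()  # hypothetical decoder
        if predicted != expected:
            return False
        htm.step(input=predicted)
    return True
```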

robjfr commented 8 years ago

Some quick comments on this.

We could look at phonemes. But I don't think it is the low-hanging fruit at the moment. That's because:

1) Deep Learning is already doing quite well at recognizing phonemes. By my analysis that's because phonemes are a largely static problem. Every language has 20-50 of them. They drift, but only slowly at the edges.

What this means is the static learning methods of Deep Learning already handle phonemes well enough to be useful.

We will handle phonemes better: we will model the slow drift and indeterminacy at the edges. But the advantage over the Deep Learning state of the art will not be so clear.

Our great advantage will be that we are going to solve the infinite structure of new meaning, the extreme case of the unsupervised learning problem: not only unsupervised, but unsupervisable/unbounded/creative. That's why I think the phrase structure problem is the best one to tackle first.

2) The way I think we will eventually handle phonemes better is by going directly to the continuous sound signal and segmenting it according to context contrasts, much like those we are looking at to segment phrases now.

So I think the key will be meaningful contrasts (which is how linguists tackled them historically). Maybe we will do this by successively dividing the continuous stream of language sound in halves, the way I'm proposing to find phrase structure. That would be a way of dealing with the continuous character of sound.
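To pin down one literal reading of that (my interpretation, not a worked-out proposal): recursively cut a sequence at its weakest internal boundary, where `cohesion` is a stand-in for whatever context contrast measure we settle on, perhaps as crude as a bigram count:

```python
def split_recursively(seq, cohesion, threshold, min_len=2):
    """Recursively divide `seq` at the point of weakest cohesion
    between neighbours. cohesion(a, b) -> float; higher = tighter."""
    if len(seq) < 2 * min_len:
        return [seq]
    # Candidate cut: the weakest internal boundary.
    cut = min(range(min_len, len(seq) - min_len + 1),
              key=lambda i: cohesion(seq[i - 1], seq[i]))
    if cohesion(seq[cut - 1], seq[cut]) >= threshold:
        return [seq]  # no boundary weak enough: keep the chunk whole
    return (split_recursively(seq[:cut], cohesion, threshold, min_len)
            + split_recursively(seq[cut:], cohesion, threshold, min_len))
```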

There may even be a top-down element. We may need meaning contrasts modelled in some way higher up before we can find the context contrasts we need to segment the continuous sound signal.

So two things: 1) Phoneme recognition is already done so well that no one will care. The low-hanging fruit is problems like unsupervised and unsupervisable new structure. And 2) we may need top-down meaning structure to do it properly.

mrcslws commented 8 years ago

So the fruit is low-hanging, it's just not very sweet.

I still don't think I've grokked your mental model, so I might be wrong. But I think we're going to need some testable waypoints on our journey to a coherent pooling algorithm. This could be one of them. We run the risk of banging our heads against walls and then realizing that we should have started with something easier to work out the kinks in our logic and code. Or worse, not realizing it. In terms of ground-breaking discoveries, this probably won't be anything, aside from maybe a component in some future thing.

And as far as I know, this will be the first implementation of supervised learning on HTM. That seems worthwhile, no matter what the toy example.

robjfr commented 8 years ago

Try it if you like, Marcus.

But supervised learning is a snake pit. It will lead you away from the fundamental solution. Supervised learning verily embodies the mistaken assumptions which prevent people finding the solutions which will finally make this cognitive structure stuff work. It seems easier, but actually it is poisoned fruit. People get trapped mentally with it. Follow that path and you'll end up outside Eden.

Take your first assertion:

"Our brains are amazing at converting sequences of syllables into words. Finding the gaps between them."

It seems right. Grab that assumption and carry on and look where you find yourself. I did a quick google with keywords "disagreement segmentation Chinese" and found this first up:

https://s3.amazonaws.com/tm-town-nlp-resources/ch2.pdf

"Sproat et. al. (1996) give empirical results showing that native speakers of Chinese frequently agree on the correct segmentation in less than 70% of the cases. Such ambiguity in the definition of what constitutes a word makes it difficult to evaluate segmentation algorithms which follow different conventions, since it is nearly impossible to construct a “gold standard” against which to directly compare results."

Chinese is a good example because it has no conventionalized segmentation (written Chinese marks no word boundaries) to make you think static structure is natural. But do the experiment and the agreement is not there.

People keep trying to find a gold standard and train to it because they can't imagine how it could be any other way. They've become so used to the supervised learning problem, or more fundamentally the static structure problem, that they keep doing it and hope the details will come out in the wash.

But it is like my map of the world analogy. The map does not fit on a flat surface. You can ignore the ragged edges and keep working with flat maps, because they seem "easier" first up. But the real simplicity is to forget the flat map "intermediate stage" and go straight to the actual shape of the problem.

Try supervised learning of phonemes, or words, and you'll get the same fuzzy 90% match to fuzzy 90% gold standards that everyone else gets, which leaves you cussing at the silly speech recognition app on your smartphone and spawns umpteen movie jokes about useless voice dialers.

Better to figure out what I'm trying to convey, and get a new, dynamic, perspective on the structure problem.