cazala / synaptic

architecture-free neural network library for node.js and the browser
http://caza.la/synaptic

Use of short term memory in LSTMs #170

Open alexwaeseperlman opened 7 years ago

alexwaeseperlman commented 7 years ago

I'm very new to the concept of neural networks, so correct me if I'm getting something wrong, but don't LSTMs have short-term memory, so that you can feed inputs in a sequence and get different results depending on what came before?

If so, how do you train it for that? For example, how would I make a program that takes a sequence of characters of variable length and predicts what should come next?

cazala commented 7 years ago

Hi, I did an example of this a couple of years ago. It's not live anymore, but the repo is still out there. It needs PHP to run, sadly.

Basically, what it did was crawl Wikipedia and start feeding the network one character at a time. The input during training was always a character, and the output was the next character. The number of possible characters was fixed, so the sizes of the input/output layers were also fixed, with one neuron per possible char. The network would train like this for a couple hundred chars, and then it would switch to feeding the network's own output back in as the next input, so the network would start to "write" something, which got printed on the screen.

The first outputs of the network would look something like 8db8fff8bff8ddb8bfbff868d8bfbddbf8db8d, but after a couple of hours it would output stuff like no es at asaesetsi aneteteise titis at. It was a small network, so it wasn't expected to learn English, but you can notice that it learned to use spaces to break words, and a bit of the relationship between vowels and consonants.
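Here is the rough shape of that loop as a minimal sketch, not the original demo's code: the charset, hidden-layer size, learning rate, and sample text below are all illustrative placeholders.

var synaptic = require('synaptic') // in the browser, load dist/synaptic.js instead

// Fixed set of possible characters: one input neuron and one output neuron per char
var chars = 'abcdefghijklmnopqrstuvwxyz .'.split('')
var oneHot = (c) => chars.map((x) => x === c ? 1 : 0)

var lstm = new synaptic.Architect.LSTM(chars.length, 30, chars.length)

// Training: the input is always the current char, the target is the next char
var text = 'some crawled text goes here ...'
for (var i = 0; i < text.length - 1; i++) {
  lstm.activate(oneHot(text[i]))
  lstm.propagate(0.1, oneHot(text[i + 1]))
}

// Generation: feed the network's own output back in as the next input
var c = text[0]
var written = c
for (var j = 0; j < 40; j++) {
  var prediction = lstm.activate(oneHot(c))
  c = chars[prediction.indexOf(Math.max(...prediction))]
  written += c
}
console.log(written)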

I hope this helps a little. It's very old code and I didn't write it thinking someone else would see it later, so I'm sorry if it's not the prettiest code 😅

alexwaeseperlman commented 7 years ago

Thanks :D

shahrin014 commented 7 years ago

Hi @cazala

If you try your link http://caza.la/synaptic/#/wikipedia, a "run demo" button appears, and clicking it opens http://caza.la/wiki, which returns a file-not-found error.

Also, I am trying to implement something very similar with words instead of letters, but it's not working out for me. Is it not enough training? (I increased the iterations and it was killing my browser.)

<html>
  <script src="../bower_components/synaptic/dist/synaptic.js"></script>
  <body>
    <p>
      This application tests the sequence recognition of an LSTM
    </p>
    <p id="trainingData"></p>
    <input type="text" id="inputText">
    <button onclick="run()">Ask Question</button>
    <p id="result">
    </p>
  </body>
  <script>
  var trainingData = [
    { question : "Hello dog.", answer : "Woof woof." },
    { question : "Hello cat.", answer : "Meow." },
    { question : "Hello.",     answer : "Hey man." },
    { question : "Oh no.",     answer : "Whats wrong?" },
  ]
  document.getElementById("trainingData").innerHTML = JSON.stringify(trainingData)

  // Lowercase the text, split punctuation into separate tokens, and wrap with <tag>/</tag> markers
  var tokenizeData = (data,tag) => {
    var string = data.replace(/\./g,' . ').replace(/\,/g,' , ').replace(/\?/g,' ? ')
    string = '<'+tag+'> '+string.toLowerCase()+' </'+tag+'>'
    return string.split(' ').filter((word)=> { return word })
  }
  // Join the question and answer tokens into one training sequence per conversation
  var tokenizedTrainingData = trainingData.map(data=> {
    var request  = tokenizeData(data.question,'q')
    var response = tokenizeData(data.answer,'a')
    return request.concat(response)
  })

  var vocabulary = tokenizedTrainingData.reduce((vocab, conversationTokens)=> {
    conversationTokens.forEach((word)=> {
      if(vocab.indexOf(word) < 0)
        vocab.push(word)
    })
    return vocab
  },[])
  // One-hot encode a word against the vocabulary
  var encodeToVocabulary = (word) => {
    return vocabulary.map(vocab=> { return word == vocab ? 1 : 0 })
  }
  console.log(tokenizedTrainingData)

  console.log(vocabulary)
  //Vocabulary will contain this
  //["<q>", "hello", "dog", ".", "</q>", "<a>", "woof", "</a>", "cat", "meow", "hey", "man", "oh", "no", "whats", "wrong", "?"]
  var vocabularyLength = vocabulary.length
  var LSTM = new synaptic.Architect.LSTM(vocabularyLength,vocabularyLength,vocabularyLength);

  // Build {input, output} pairs: each token is trained to predict the next token
  var trainingSet = tokenizedTrainingData.reduce((set,conversationTokens)=> {
    conversationTokens.forEach((word, index)=> {
      if (word == '</a>') return // It's the end of the conversation. There's nothing after this.
      var input   = encodeToVocabulary(conversationTokens[index])
      var output  = encodeToVocabulary(conversationTokens[index+1])
      set.push({ input: input, output: output })
    })
    return set
  },[])
  console.log('trainingSet',trainingSet)
  LSTM.trainer.train(trainingSet,{
    rate: .1,
    iterations: 10,
    error: .005,
    shuffle: false,
    log: 1000,
    cost: synaptic.Trainer.cost.CROSS_ENTROPY
  });
  function run() {
    var input = []
    var prediction = []
    var question = document.getElementById("inputText").value
    // Prime the LSTM's internal state with the question, one token at a time
    tokenizeData(question,'q').forEach((word)=>{
      input = encodeToVocabulary(word)
      prediction = LSTM.activate(input);
    })

    // Generate the answer: feed the prediction back in until </a>, capped at 20 tokens
    var word = "<a>"
    var response = [word]
    while(word != '</a>' && response.length < 20) {
      var input = encodeToVocabulary(word)
      prediction = LSTM.activate(input);
      word = vocabulary[prediction.indexOf(Math.max(...prediction))]
      response.push(word)
    }
    console.log(response)
    document.getElementById("result").innerHTML = response.join(' ')
  }
  </script>
</html>
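For what it's worth, two things stand out in the snippet above. First, iterations: 10 is very few passes over the training set for a task like this; the error almost certainly never gets near .005, and with log: 1000 nothing is ever logged. Second, the LSTM keeps its internal state across activations, so the memory of one conversation bleeds into the next during training, and leftover state from a previous question affects the next one at inference. A hedged sketch of the kind of adjustment that might help, assuming your synaptic build exposes the network's clear() method for resetting internal state (check your version):

  // Train longer; tune iterations until the logged error stops dropping
  LSTM.trainer.train(trainingSet, {
    rate: .1,
    iterations: 2000, // placeholder; 10 was far too few
    error: .005,
    shuffle: false,
    log: 100,
    cost: synaptic.Trainer.cost.CROSS_ENTROPY
  });

  function ask(question) {
    LSTM.clear() // wipe leftover memory from the previous question, if your version supports it
    tokenizeData(question, 'q').forEach((word) => {
      LSTM.activate(encodeToVocabulary(word))
    })
    // ...then run the same generation loop as in run() above
  }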