Open goldsail opened 6 years ago
We can treat this as a dialogue system design problem. I can't think of a fully viable way to acquire the data, but this wiki page is a good starting point: https://zh.wikipedia.org/wiki/%E8%86%9C%E8%9B%A4%E6%96%87%E5%8C%96
Training an autoencoder requires a large corpus. Also, I know little about NLP. All I know is to convert the text into a sequence of vectors using word2vec and then feed that sequence to a sequential model.
word2vec is the first step. Let's just gather some data and feed it into an LSTM to get a feel for it. A GAN could be used for NLP as well.
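To make the "text to a sequence of vectors" step concrete, here is a minimal sketch in plain Python. It is not real word2vec: the embedding table is filled with random numbers as a stand-in for trained vectors, the whitespace tokenizer is an assumption (Chinese text would need a word segmenter), and the toy corpus and all function names are hypothetical. The resulting list of vectors is the kind of input a sequential model such as an LSTM would consume.

```python
import random

def tokenize(sentence):
    # Assumption: whitespace tokenization; Chinese text needs a segmenter instead.
    return sentence.lower().split()

def build_vocab(sentences):
    # Map each distinct token to an integer id; id 0 is reserved for unknown tokens.
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab

def init_embeddings(vocab, dim, seed=0):
    # Random vectors standing in for trained word2vec embeddings.
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(len(vocab))]

def encode(sentence, vocab, embeddings):
    # Look up one vector per token; unseen tokens fall back to the <unk> vector.
    return [embeddings[vocab.get(token, 0)] for token in tokenize(sentence)]

# Hypothetical toy corpus, for illustration only.
corpus = ["first placeholder sentence", "second placeholder sentence here"]
vocab = build_vocab(corpus)
embeddings = init_embeddings(vocab, dim=4)
sequence = encode("first sentence unseen", vocab, embeddings)
```

Here `sequence` holds one 4-dimensional vector per token. Swapping the random table for vectors trained on the gathered corpus would leave the rest of the pipeline unchanged.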
First, to locate and label the Haas sentences in the text you download from the web, you would need to train a classifier that tells whether any given sentence is Haas's or not. But this also requires a dataset...
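As a rough illustration of such a sentence classifier, here is a small bag-of-words Naive Bayes sketch in plain Python. Naive Bayes is my own choice of baseline, not something proposed in this thread, and the class name, the whitespace tokenization, and the toy training sentences are all hypothetical placeholders. A real version would still need the labeled dataset discussed above.

```python
import math
from collections import Counter

class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            tokens = sentence.lower().split()
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def predict(self, sentence):
        tokens = sentence.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data: label 1 = target style, label 0 = anything else.
train_sentences = ["alpha beta gamma", "alpha beta delta",
                   "red green blue", "red green yellow"]
train_labels = [1, 1, 0, 0]

clf = NaiveBayesSentenceClassifier()
clf.fit(train_sentences, train_labels)
```

Calling `clf.predict("alpha beta")` on this toy data picks label 1 and `clf.predict("red green")` picks label 0. With real data, the labels would have to come from manual annotation or from a trusted source of known quotes.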