IndicoDataSolutions / Passage

A little library for text analysis with RNNs.
MIT License

Go directly into RNN layers without using the Embedding and preprocessing methods. #33

Closed sjhddh closed 9 years ago

sjhddh commented 9 years ago

Hello, we are currently working on an NLP research project. We already have our own preprocessing pipeline, which maps a long document to a fixed-length vector representation. So each training input is now a fixed-length vector of real numbers, and the output should be one of several labels. However, as far as I know, Passage has its own preprocessing methods, which seem to be "required" for the training input. Is there any way to skip Passage's preprocessing steps, such as the tokenizer and the Embedding layer?

For example, the layers currently look like this:

layers = [
    Embedding(size=256, n_features=tokenizer.n_features),
    GatedRecurrent(size=256, seq_output=True),
    GatedRecurrent(size=256, seq_output=False),  # activation='t_rectify',
    Dense(size=1, activation='sigmoid')
]

So, is there any way to remove the first layer, so that the training inputs can go directly into the RNN layers?

layers = [
    # Embedding(size=256, n_features=tokenizer.n_features),
    GatedRecurrent(size=256, seq_output=True),
    GatedRecurrent(size=256, seq_output=False),  # activation='t_rectify',
    Dense(size=1, activation='sigmoid')
]

If I do so, it will, of course, raise an error:

line 41, in __init__
    self.params = flatten([l.params for l in layers])
AttributeError: 'GatedRecurrent' object has no attribute 'params'

Thank you.

Newmu commented 9 years ago

This can be done with the Generic layer. The first layer has to be an input layer (Embedding, Generic, etc.); the recurrent layers only build their params once they are connected to a preceding layer, which is why a model cannot start with GatedRecurrent directly. For an example, check the mnist demo: https://github.com/IndicoDataSolutions/Passage/blob/master/examples/mnist.py
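For reference, a minimal sketch of what that might look like here, modeled on the mnist demo and the README's sentiment example. The Generic size, the cost name, and input_dim/trX/trY are assumptions or placeholders, so double-check them against your version of Passage:

from passage.models import RNN
from passage.layers import Generic, GatedRecurrent, Dense

# Hypothetical dimensionality of the precomputed vectors produced
# by your own preprocessing pipeline.
input_dim = 300

layers = [
    # Generic replaces Embedding: it feeds real-valued input
    # straight into the recurrent layers, as in examples/mnist.py.
    Generic(size=input_dim),
    GatedRecurrent(size=256, seq_output=False),
    Dense(size=1, activation='sigmoid')
]

model = RNN(layers=layers, cost='BinaryCrossEntropy')

# trX: real-valued sequences, e.g. an array of shape
# (n_examples, n_timesteps, input_dim); trY: binary labels.
model.fit(trX, trY)

Note that the model still consumes sequences: each training example is a sequence of input_dim-sized vectors (the mnist demo treats each image as a sequence of 28 rows of 28 pixels), so a single fixed-length document vector would be a sequence of length one.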