mg515 opened 6 years ago
The input variable should be (237, 9, 4096).
`features` should be (4096): take the second-to-last layer of vgg_16 (a.k.a. the layer before softmax, the Dense(4096)). The LSTM should be the module that outputs a probability for each class, of which there are 5.
So the framework works as: image (224, 224, 3) -> CNN (features taken from the last FC layer, i.e. the layer before softmax) -> 4096 -> LSTM -> class prediction
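The pipeline above implies that the per-frame 4096-dimensional CNN features have to be grouped by video before they reach the LSTM. Below is a minimal NumPy sketch, assuming the frames of each video sample are stored consecutively; `to_lstm_input` and `fc_features` are hypothetical names, not taken from the repository:

```python
import numpy as np


def to_lstm_input(frame_features: np.ndarray, frames_per_sample: int) -> np.ndarray:
    """Group per-frame CNN feature vectors into (samples, frames, feature_dim).

    Assumes frame_features has shape (num_samples * frames_per_sample, feature_dim)
    and that the frames belonging to one video sample are consecutive rows.
    """
    total_frames, feature_dim = frame_features.shape
    if total_frames % frames_per_sample != 0:
        raise ValueError(
            f"{total_frames} frames cannot be split into samples of {frames_per_sample}"
        )
    num_samples = total_frames // frames_per_sample
    return frame_features.reshape(num_samples, frames_per_sample, feature_dim)


# Hypothetical data: 237 samples x 9 frames, each frame a Dense(4096) output vector.
fc_features = np.zeros((237 * 9, 4096), dtype=np.float32)
lstm_input = to_lstm_input(fc_features, frames_per_sample=9)
print(lstm_input.shape)  # (237, 9, 4096)
```

The reshape only works if the last axis really is the 4096-dimensional feature; feeding it softmax outputs would silently produce (237, 9, 5) instead.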
Ah, thanks! It was a bit ambiguous what `data_dim` is supposed to be, since at first it is assigned `r*w` and later gets changed to a hardcoded 4096, which I didn't understand. I guess it's related to the CNN output. Thanks for the clarification.
In that case, I think there's a small bug in the `record_weights` function, since the flag is `st`, not `sf`. :)
Because of it, the CNN model's output has dimension 5 instead of 4096, which is what caused the error for me in the first place.
You can check the pull request.
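A flag mismatch like the one described above only surfaces later as a Keras shape error. A small guard before training can report it more directly. This is a sketch under the assumption that `features` is a NumPy array of shape (samples, timesteps, data_dim); `check_lstm_input` is a hypothetical helper, not part of the repository:

```python
import numpy as np


def check_lstm_input(features: np.ndarray, timesteps: int, data_dim: int) -> None:
    """Raise early if features do not match the LSTM's expected (timesteps, data_dim)."""
    if features.ndim != 3 or features.shape[1:] != (timesteps, data_dim):
        raise ValueError(
            f"LSTM expects (samples, {timesteps}, {data_dim}), got {features.shape}; "
            "if the last axis equals the number of classes, the CNN softmax output "
            "was recorded instead of the Dense(4096) features"
        )


# Softmax probabilities for 5 classes, i.e. what the CNN produced with the wrong flag.
softmax_features = np.zeros((237, 9, 5))
try:
    check_lstm_input(softmax_features, timesteps=9, data_dim=4096)
except ValueError as err:
    print(err)
```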
I see, maybe it was a typo. :P
Tested the Spatial Module. Thanks for the contribution!!!
How about on your side? Does it work for full training?
The code runs, but after a few subjects I get an OOM (out of memory) error from TensorFlow. It also depends on the batch size: with 5 it gets to subject=3, with 10 only to subject=1 before the OOM happens. I assume either there is a memory leak in TF itself or some object (the model) is not being deleted the way it should.
My GPU is a GTX 1070 with 8 GB of memory, not that it really matters. Any ideas?
Edit: I will try Theano instead of TF, since you are using it and seem to have fewer problems.
I also use Theano. Keras with the TensorFlow backend overflows a 1080 Ti as well, so I don't recommend it. A batch size of around 20-30 should be fine. If it still doesn't work, try reducing the LSTM's recurrent units below 3000.
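To see why the number of recurrent units matters so much, one can count the LSTM's parameters. The sketch below assumes a single LSTM layer with biases laid out as in Keras (input kernel, recurrent kernel, bias, each with 4 gates) and float32 weights; the helper names are hypothetical:

```python
def lstm_param_count(input_dim: int, units: int) -> int:
    """Trainable parameters of one LSTM layer with biases.

    kernel: input_dim x 4*units, recurrent kernel: units x 4*units, bias: 4*units.
    """
    return 4 * units * (input_dim + units + 1)


def weight_megabytes(params: int, bytes_per_param: int = 4) -> float:
    """Memory for the weights alone (float32 by default), in decimal megabytes."""
    return params * bytes_per_param / 1e6


# Compare 4096-dim CNN features with 224*224 = 50176 raw-pixel inputs.
for input_dim, units in [(4096, 3000), (4096, 1000), (50176, 3000)]:
    n = lstm_param_count(input_dim, units)
    print(f"input_dim={input_dim:>6} units={units}: {n:,} params, "
          f"{weight_megabytes(n):,.1f} MB of float32 weights")
```

With 4096 inputs and 3000 units this gives about 85 million parameters (roughly 340 MB of weights); with 50176 inputs it grows to about 638 million (roughly 2.55 GB). These figures cover the weights only: training additionally needs gradients, optimizer state, per-timestep activations that grow with batch size, and the CNN itself, so the actual footprint is larger.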
Change flag to 'st'
When I run the code I get the following error:
*** ValueError: Error when checking input: expected lstm_1_input to have shape (9, 50176) but got array with shape (9, 5)
Here 50176 is the `data_dim` variable (50176 = 224*224, the photo dimensions). And the input to the LSTM (`features.shape`) equals (237, 9, 5), where 237 is the number of video samples, so the `features` variable is basically the CNN's probability output for each class. Is this the way it is supposed to work? The way I see it, the LSTM's input layer is supposed to have size 5.