elliottd / GroundedTranslation

Multilingual image description
https://staff.fnwi.uva.nl/d.elliott/GroundedTranslation/
BSD 3-Clause "New" or "Revised" License

training error #18

Closed · amiltonwong closed this issue 8 years ago

amiltonwong commented 8 years ago

Hi, @elliottd

I came across the following error when issuing the training command. Could you suggest how to fix it? THX~

root@milton-OptiPlex-9010:/data/GroundedTranslation# THEANO_FLAGS=floatX=float32,device=gpu0 python train.py --dataset iaprtc12_eng --hidden_size=256 --fixed_seed --run_string=fixed_seed-eng256mlm
Using Theano backend.
Using gpu device 0: GeForce GTX TITAN (CNMeM is disabled, CuDNN 4007)
INFO:__main__:Run arguments:
INFO:__main__:clipnorm: -1
INFO:__main__:small_val: False
INFO:__main__:fixed_seed: True
INFO:__main__:dataset: iaprtc12_eng
INFO:__main__:generate_from_N_words: 0
INFO:__main__:patience: 10
INFO:__main__:enable_val_pplx: True
INFO:__main__:use_predicted_tokens: False
INFO:__main__:no_image: False
INFO:__main__:max_epochs: 50
INFO:__main__:predefined_epochs: False
INFO:__main__:gru: False
INFO:__main__:generation_timesteps: 30
INFO:__main__:source_vectors: None
INFO:__main__:debug: False
INFO:__main__:supertrain_datasets: None
INFO:__main__:run_string: fixed_seed-eng256mlm
INFO:__main__:optimiser: adam
INFO:__main__:lr: None
INFO:__main__:beta2: None
INFO:__main__:beta1: None
INFO:__main__:hidden_size: 256
INFO:__main__:source_enc: None
INFO:__main__:epsilon: None
INFO:__main__:batch_size: 100
INFO:__main__:source_type: None
INFO:__main__:stopping_loss: bleu
INFO:__main__:l2reg: 1e-08
INFO:__main__:dropin: 0.5
INFO:__main__:init_from_checkpoint: None
INFO:__main__:big_batch_size: 10000
INFO:__main__:h5_writeable: False
INFO:__main__:small: False
INFO:__main__:unk: 3
INFO:__main__:num_sents: 5
INFO:__main__:existing_vocab:
INFO:data_generator:Initialising data generator
INFO:data_generator:Train/val dataset: iaprtc12_eng
INFO:data_generator:Input gold descriptions
INFO:data_generator:Extracting vocabulary
INFO:data_generator:Pickling dictionary to checkpoint/fixed_seed-eng256mlm/vocabulary.pk
INFO:data_generator:Max seq length 63, setting max_seq_len to 65
INFO:data_generator:Split sizes {'test': 1962, 'train': 15897, 'val': 1766}
INFO:data_generator:Number of words 3953 -> 1764
INFO:data_generator:Retained / Original Tokens: 272175 / 275467 (98.80 pc)
INFO:data_generator:Average train sentence length: 17.33 tokens
INFO:data_generator:Making data for val
INFO:models:Building Keras model...
INFO:models:Using image features: True
INFO:models:Using source language features: False
INFO:models:... visual: adding image features as input features
Traceback (most recent call last):
  File "train.py", line 282, in <module>
    model.train_model()
  File "train.py", line 85, in train_model
    use_image=self.use_image)
  File "/data/GroundedTranslation/models.py", line 98, in buildKerasModel
    model.add(Activation('time_distributed_softmax'))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 734, in __init__
    self.activation = activations.get(activation)
  File "/usr/local/lib/python2.7/dist-packages/keras/activations.py", line 47, in get
    return get_from_module(identifier, globals(), 'activation function')
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/generic_utils.py", line 14, in get_from_module
    str(identifier))
Exception: Invalid activation function: time_distributed_softmax

elliottd commented 8 years ago

You are using a version of Keras > 0.1.3.

time_distributed_softmax was already deprecated back then, and the recommended activation function was just softmax.

Pull the code again and it should work with a newer version of Keras.
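
Concretely, the change in models.py is a one-line swap at the call shown in your traceback, i.e. roughly:

```python
# models.py, inside buildKerasModel() -- output activation over the vocabulary.
# Old (only resolves on Keras <= 0.1.3):
#   model.add(Activation('time_distributed_softmax'))
# New (the name newer Keras versions expect):
model.add(Activation('softmax'))
```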

amiltonwong commented 8 years ago

Hi, @elliottd ,

Thanks for your quick reply. After modifying models.py as in https://github.com/elliottd/GroundedTranslation/commit/fc108773cd6e9467a19c0b75436a09a3692f00e1 , it gets past that issue. However, a new error comes up:

OSError: [Errno 12] Cannot allocate memory

The detailed log is here: https://gist.github.com/amiltonwong/6ea97505483852e1ed956b4cf92d5445

What's the minimum amount of RAM required? My machine has only 8GB.

THX~

elliottd commented 8 years ago

@amiltonwong the current code needs a lot more memory than that because we process all the validation data in a single batch. This was never a big problem for us on the IAPR-TC12 dataset but we're running into some memory-management issues with the new multilingual Flickr30K datasets.

We are working on reducing the memory footprint in the experimental_datagen branch, but I don't think we can easily get it below 8GB. It might be possible with small minibatches. I'll ping you when the new code has been merged into the main codebase.
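
The rough idea (just a sketch, not the actual experimental_datagen code; the helper and variable names below are made up) is to score the validation split in fixed-size chunks instead of materialising it as one big array:

```python
def validation_loss_in_minibatches(model, get_val_batch, num_val, batch_size=100):
    """Sketch: accumulate validation loss over small minibatches so the whole
    validation split never has to sit in memory at once."""
    total_loss, seen = 0.0, 0
    for start in range(0, num_val, batch_size):
        end = min(start + batch_size, num_val)
        # get_val_batch is a hypothetical helper that loads just this slice of
        # the validation data (e.g. read lazily from the HDF5 dataset).
        inputs, targets = get_val_batch(start, end)
        loss = model.evaluate(inputs, targets, batch_size=end - start, verbose=0)
        total_loss += loss * (end - start)
        seen += end - start
    return total_loss / max(seen, 1)
```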