dabasajay / Image-Caption-Generator

A neural network to generate captions for an image using CNN and RNN with BEAM Search.
MIT License

error on training #14

Open urmikakasi opened 2 years ago

urmikakasi commented 2 years ago

This is the output from training. The model is not getting saved due to a callback issue.

4:6:34: Using Inceptionv3 model
{}: Generating image features using inceptionv3 model...
2022-01-14 04:06:34.692740: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
96116736/96112376 [==============================] - 1s 0us/step
96124928/96112376 [==============================] - 1s 0us/step
100% 8091/8091 [28:33<00:00, 4.72it/s]
4:35:12: Completed & Saved features for 8091 images successfully
4:35:12: Parsing captions file...
4:35:12: Parsed captions: 40460
4:35:12: Parsed & Saved successfully
4:35:12: Available images for training: 6000
4:35:12: Available captions for training: 30000
4:35:13: Available images for validation: 1000
4:35:13: Available captions for validation: 5000
RNN Model (Decoder) Summary :
Model: "model_1"


Layer (type)                        Output Shape      Param #   Connected to
==================================================================================================
input_3 (InputLayer)                [(None, 40)]      0         []
input_2 (InputLayer)                [(None, 2048)]    0         []
embedding (Embedding)               (None, 40, 300)   2213400   ['input_3[0][0]']
dense (Dense)                       (None, 300)       614700    ['input_2[0][0]']
lstm (LSTM)                         (None, 40, 256)   570368    ['embedding[0][0]']
repeat_vector (RepeatVector)        (None, 40, 300)   0         ['dense[0][0]']
time_distributed (TimeDistributed)  (None, 40, 300)   77100     ['lstm[0][0]']
concatenate_2 (Concatenate)         (None, 40, 600)   0         ['repeat_vector[0][0]', 'time_distributed[0][0]']
bidirectional (Bidirectional)       (None, 512)       1755136   ['concatenate_2[0][0]']
dense_2 (Dense)                     (None, 7378)      3784914   ['bidirectional[0][0]']
==================================================================================================
Total params: 9,015,618
Trainable params: 9,015,618
Non-trainable params: 0


None
steps_train: 94, steps_val: 16
Batch Size: 64
Total Number of Epochs = 20
train_val.py:86: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
  verbose=1)
Epoch 1/20
Traceback (most recent call last):
  File "train_val.py", line 86, in <module>
    verbose=1)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [3732,1000], In[1]: [2048,300]
  [[node model_1/dense/Relu (defined at /usr/local/lib/python3.7/dist-packages/keras/backend.py:4867) ]] [Op:__inference_train_function_569695]

Errors may have originated from an input operation. Input Source operations connected to node model_1/dense/Relu: In[0] model_1/dense/BiasAdd (defined at /usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py:210)

Operation defined at: (most recent call last)

File "train_val.py", line 86, in <module> verbose=1)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator initial_epoch=initial_epoch)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1216, in fit tmp_logs = self.train_function(iterator)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 878, in train_function return step_function(self, iterator)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 867, in step_function outputs = model.distribute_strategy.run(run_step, args=(data,))

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 860, in run_step outputs = model.train_step(data)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 808, in train_step y_pred = self(x, training=True)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1083, in call outputs = call_fn(inputs, *args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 452, in call inputs, training=training, mask=mask)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 589, in _run_internal_graph outputs = node.layer(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1083, in call outputs = call_fn(inputs, *args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py", line 213, in call outputs = self.activation(outputs)

File "/usr/local/lib/python3.7/dist-packages/keras/activations.py", line 311, in relu return backend.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)

File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 4867, in relu x = tf.nn.relu(x)

2022-01-14 04:35:25.329300: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated. [[{{node PyFunc}}]]

hibah321 commented 2 years ago

I've got the same error. Can you help if you've resolved it?

Anant-mishra1729 commented 2 years ago

Got the same error. Help needed!

kunal0230 commented 3 weeks ago

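Matrix Size Mismatch: In the error, In[0] is [3732,1000] and In[1] is [2048,300]. In[1] is the kernel of the decoder’s dense layer, which expects 2048 features per image (input_2 in the model summary), while the image features actually being fed in have only 1000 values each. 1000 is the size of InceptionV3’s ImageNet classification output, so the feature extractor is likely returning the final predictions layer instead of the 2048-dimensional pooled features. Double-check that the saved features have shape [None, 2048] for each image. If they do not, take the extractor’s output from the pooling layer just before the classifier and regenerate the features. A Reshape layer cannot turn 1000 values into 2048, so the fix belongs in feature extraction rather than in the decoder.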
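A minimal sketch of that check, under some assumptions: the thread does not show the repo's feature-extraction code, so the sketch assumes the saved features are a dict mapping an image id to a feature vector (a NumPy array or nested list). The names find_bad_features, flat_size and build_feature_extractor are hypothetical. One possible cause, not confirmed here, is code that calls pop() on model.layers, which may have no effect in recent tf.keras versions and would leave the 1000-way classifier as the last layer.

```python
def flat_size(vec):
    """Number of scalar values in a NumPy array or a (nested) list/tuple."""
    size = getattr(vec, "size", None)  # NumPy arrays expose .size as an int
    if isinstance(size, int):
        return size
    if isinstance(vec, (list, tuple)):
        return sum(flat_size(v) for v in vec)
    return 1  # a single scalar


def find_bad_features(features, expected_dim=2048):
    """Return {image_id: actual_size} for vectors that are not expected_dim long."""
    return {
        image_id: flat_size(vec)
        for image_id, vec in features.items()
        if flat_size(vec) != expected_dim
    }


def build_feature_extractor():
    """Assumed fix: expose InceptionV3's pooled 2048-d output, not its predictions."""
    # Imported lazily so the shape check above runs without TensorFlow installed.
    from tensorflow.keras.applications.inception_v3 import InceptionV3
    from tensorflow.keras.models import Model

    base = InceptionV3(weights="imagenet")
    # layers[-1] is the 1000-way "predictions" layer; layers[-2] is the global
    # average pooling layer, whose output shape is (None, 2048).
    return Model(inputs=base.inputs, outputs=base.layers[-2].output)
```

If find_bad_features reports a size of 1000 for the saved features, they would need to be regenerated with the corrected extractor before training again.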

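Callback and Model Saving Issue: The model is most likely not saved because training aborts during epoch 1 with the InvalidArgumentError, presumably before any checkpoint callback has a chance to run. The fit_generator message is only a deprecation warning and does not stop training. It is still worth switching to fit, which accepts data generators directly. Once the shape error is fixed, make sure ModelCheckpoint or other callbacks are properly configured and compatible with the installed TensorFlow version so the model is actually saved.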
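The switch from fit_generator to fit could look roughly like the sketch below. It assumes the training and validation generators loop forever, which is why fit is given explicit step counts. The function names, the argument names and the checkpoint file name best_model.h5 are hypothetical and not taken from train_val.py. The ceiling division reproduces the step counts printed in the log (94 and 16) for 6000 and 1000 images at batch size 64.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Batches needed to see every image once, rounding the last batch up."""
    return math.ceil(num_images / batch_size)


def train(model, train_gen, val_gen, num_train, num_val,
          batch_size=64, epochs=20, checkpoint_path="best_model.h5"):
    """Train with Model.fit and save the best model seen so far."""
    # Imported lazily so steps_per_epoch can be used without TensorFlow.
    from tensorflow.keras.callbacks import ModelCheckpoint

    # Saves at the end of an epoch whenever val_loss improves.
    checkpoint = ModelCheckpoint(checkpoint_path, monitor="val_loss",
                                 save_best_only=True, verbose=1)
    return model.fit(
        train_gen,
        steps_per_epoch=steps_per_epoch(num_train, batch_size),
        validation_data=val_gen,
        validation_steps=steps_per_epoch(num_val, batch_size),
        epochs=epochs,
        callbacks=[checkpoint],
        verbose=1,
    )


print(steps_per_epoch(6000, 64))  # 94
print(steps_per_epoch(1000, 64))  # 16
```

Because the checkpoint is written at the end of an epoch, a run that crashes inside the first epoch leaves no saved model regardless of how the callback is configured.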

CUDA Warning (No GPU Detected): The cuInit message (CUDA_ERROR_NO_DEVICE) means TensorFlow found no CUDA-capable device, so training runs on the CPU instead of a GPU. It does not cause the crash. If a GPU should be available, use tf.config.list_physical_devices('GPU') to verify that TensorFlow can see it, or check your environment settings.