Problem with donkeycar 4.2 training

rhovey commented 3 years ago
Should I go back to 4.1?
Here is the 4.2 command I used from the docs: donkey train --tub=.\data --model=.\models\test.h5 --type=linear
The resulting error I get:
________             ______                   _________
___  __ \_______________  /___________  __    __  ____/_____ ________
__  / / /  __ \_  __ \_  //_/  _ \_  / / /    _  /    _  __ `/_  ___/
_  /_/ // /_/ /  / / /  ,<  /  __/  /_/ /     / /___  / /_/ /_  /
/_____/ \____//_/ /_//_/|_| \___/_\__, /      \____/  \__,_/ /_/
                                 /____/

using donkey v4.2.0 ...
loading config file: ./config.py
loading personal config over-rides from myconfig.py
"get_model_by_type" model Type is: linear
Created KerasLinear
2021-05-21 19:02:50.145487: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
img_in (InputLayer)             [(None, 120, 160, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 58, 78, 24)   1824        img_in[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               (None, 58, 78, 24)   0           conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 27, 37, 32)   19232       dropout[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 27, 37, 32)   0           conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 12, 17, 64)   51264       dropout_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 12, 17, 64)   0           conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 10, 15, 64)   36928       dropout_2[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 10, 15, 64)   0           conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 13, 64)    36928       dropout_3[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 8, 13, 64)    0           conv2d_5[0][0]
__________________________________________________________________________________________________
flattened (Flatten)             (None, 6656)         0           dropout_4[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 100)          665700      flattened[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 100)          0           dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 50)           5050        dropout_5[0][0]
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, 50)           0           dense_2[0][0]
__________________________________________________________________________________________________
n_outputs0 (Dense)              (None, 1)            51          dropout_6[0][0]
__________________________________________________________________________________________________
n_outputs1 (Dense)              (None, 1)            51          dropout_6[0][0]
==================================================================================================
Total params: 817,028
Trainable params: 817,028
Non-trainable params: 0
__________________________________________________________________________________________________
None
Using catalog C:\dev\2021\autotrack_train\at06_donkeycar_4.2\data\catalog_46.catalog

Ignoring record at index 3425
Records # Training 37285
Records # Validation 9322
Traceback (most recent call last):
  File "C:\ProgramData\Miniconda3\envs\donkey\Scripts\donkey-script.py", line 33, in <module>
    sys.exit(load_entry_point('donkeycar', 'console_scripts', 'donkey')())
  File "c:\dev\2021\donkeycar\donkeycar\management\base.py", line 500, in execute_from_command_line
    c.run(args[2:])
  File "c:\dev\2021\donkeycar\donkeycar\management\base.py", line 461, in run
    args.comment)
  File "c:\dev\2021\donkeycar\donkeycar\pipeline\training.py", line 148, in train
    show_plot=cfg.SHOW_PLOT)
  File "c:\dev\2021\donkeycar\donkeycar\parts\keras.py", line 158, in train
    use_multiprocessing=False
  File "C:\ProgramData\Miniconda3\envs\donkey\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\ProgramData\Miniconda3\envs\donkey\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 205, in fit
    batch_size, steps_per_epoch, x)
  File "C:\ProgramData\Miniconda3\envs\donkey\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1756, in _validate_or_infer_batch_size
    x, batch_size))
**ValueError: The `batch_size` argument must not be specified for the given input type. Received input: <PrefetchDataset shapes: ({img_in: (None, 120, 160, 3)}, {n_outputs0: (None,), n_outputs1: (None,)}), types: ({img_in: tf.float64}, {n_outputs0: tf.float64, n_outputs1: tf.float64})>, batch_size: 128**
xingruic commented 3 years ago
I got that error when using TensorFlow 1.x. With TensorFlow 2.2.0 I don't get it.
rhovey commented 3 years ago
Thanks for the tip, NevGithub0823!
Indeed, it was a TensorFlow version mismatch. I had v2.1.0 on the host PC and v2.2.2 (released version) on the donkeycar. Once both were set to v2.2.1, training came off without a hitch.
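For anyone hitting the same thing, here is a rough sketch of how the two installs could be compared before training. The helper names (`parse_version`, `same_minor_release`, `installed_tensorflow_version`) are made up for illustration and are not part of donkeycar. Treating "same major.minor" as the compatibility rule is an assumption drawn from this thread, not a documented requirement.

```python
import re


def parse_version(text):
    """Return the leading numeric parts of a version string as a tuple of ints.

    A missing patch number is treated as 0, and suffixes such as 'rc1' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups("0"))


def same_minor_release(a, b):
    """True when both version strings share the same major.minor pair.

    Assumption: matching major.minor is 'close enough'; this is only a heuristic.
    """
    return parse_version(a)[:2] == parse_version(b)[:2]


def installed_tensorflow_version():
    """Return the local TensorFlow version string, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__
```

Run `installed_tensorflow_version()` in the conda env on the PC and again on the car, then pass the two strings to `same_minor_release`. With the versions from this thread, `same_minor_release("2.1.0", "2.2.2")` is false, which is the mismatch described above.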
autorope / donkeycar

Problem with donkeycar 4.2 training #869