Closed generic-beat-detector closed 5 months ago
Can you please install the latest release, 5.1.0? Also, you have far too little data; try with around 1000 records rather than 14. My suspicion is that there is a problem when neither the training set nor the validation set contains even a single full-sized batch.
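To make the batch-size concern concrete, here is a minimal sketch of the arithmetic. It assumes a batch size of 128 and an 80/20 train/validation split; both values are assumptions for illustration, not read from this user's config, and the helper name `split_and_steps` is hypothetical. The real split logic in donkeycar may differ in detail.

```python
import math


def split_and_steps(num_records, batch_size=128, train_fraction=0.8):
    """Return (num_train, num_val, train_steps, val_steps, has_full_batch).

    batch_size and train_fraction are assumed values for illustration only.
    """
    num_train = int(num_records * train_fraction)
    num_val = num_records - num_train
    # A partial batch still counts as one step per epoch.
    train_steps = math.ceil(num_train / batch_size)
    val_steps = math.ceil(num_val / batch_size)
    # True only if both sets can fill at least one complete batch.
    has_full_batch = num_train >= batch_size and num_val >= batch_size
    return num_train, num_val, train_steps, val_steps, has_full_batch


for n in (17, 1000):
    print(n, split_and_steps(n))
```

Under these assumptions, 17 records give 13 training and 4 validation records, one partial batch each, while 1000 records give several full batches in both sets.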
@DocGarbanzo
Yes sir, going with your recommendation to install donkeycar v5.1.0 (which requires python >=3.11 and <=3.12), the training -- apparently -- succeeds (even with my 17 image test dataset):
Installed miniconda virtual environment
$ wget https://repo.anaconda.com/miniconda/Miniconda3-py39_23.3.1-0-Linux-x86_64.sh
$ bash ./Miniconda3-py39_23.3.1-0-Linux-x86_64.sh
$ eval "$(/home/USER/dev-donkey/miniconda3/bin/conda shell.bash hook)"
$ conda create -n donkey python=3.11
$ conda activate donkey
$ python --version
Python 3.11.9
$PWD
$ unzip donkeycar-5.1.0.zip
$ cd donkeycar-5.1.0/
$ pip install -e .[pc]
$ cd ..
$ donkey createcar --path $PWD/mycar
$ cd mycar/
$ ls
calibrate.py config.py data logs manage.py models myconfig.py train.py
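As a side note on the version requirement quoted above (python >=3.11 and <=3.12), a small check like the following could be run inside the activated environment before installing. It is only a sketch: the bounds are taken from this comment, not from the package metadata, and `python_supported` is a hypothetical helper name.

```python
import sys


def python_supported(version_info=sys.version_info, minimum=(3, 11), maximum=(3, 12)):
    """Compare only (major, minor) against the assumed supported range."""
    return minimum <= tuple(version_info[:2]) <= maximum


print("Python", sys.version.split()[0], "supported:", python_supported())
```
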
Copied the very same 17 image dataset to data, then
$ donkey train --tub data
using donkey v5.1.0 ...
[...]
INFO:donkeycar.pipeline.types:Loading tubs from paths ['data']
INFO:donkeycar.pipeline.training:Records # Training 13
INFO:donkeycar.pipeline.training:Records # Validation 4
[...]
INFO:donkeycar.parts.keras:////////// Starting training //////////
Epoch 1/100
2024-06-05 00:48:34.244384: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inlinear/dropout/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
2024-06-05 00:48:34.517973: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2024-06-05 00:48:35.751589: I external/local_xla/xla/service/service.cc:168] XLA service 0x7d6ab00145b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-06-05 00:48:35.751618: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3070, Compute Capability 8.6
2024-06-05 00:48:35.755479: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1717537715.808544 643092 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
1/1 [==============================] - ETA: 0s - loss: 0.4434 - n_outputs0_loss: 0.0248 - n_outputs1_loss: 0.4187
Epoch 1: val_loss improved from inf to 0.46197, saving model to /home/USER/dev-donkey/mycar/models/pilot_24-06-05_0.savedmodel
INFO:tensorflow:Assets written to: /home/USER/dev-donkey/mycar/models/pilot_24-06-05_0.savedmodel/assets
1/1 [==============================] - 6s 6s/step - loss: 0.4434 - n_outputs0_loss: 0.0248 - n_outputs1_loss: 0.4187 - val_loss: 0.4620 - val_n_outputs0_loss: 0.0269 - val_n_outputs1_loss: 0.4351
Epoch 2/100
1/1 [==============================] - ETA: 0s - loss: 0.3613 - n_outputs0_loss: 0.0248 - n_outputs1_loss: 0.3364
Epoch 2: val_loss improved from 0.46197 to 0.31188, saving model to /home/USER/dev-donkey/mycar/models/pilot_24-06-05_0.savedmodel
INFO:tensorflow:Assets written to: /home/USER/dev-donkey/mycar/models/pilot_24-06-05_0.savedmodel/assets
1/1 [==============================] - 1s 794ms/step - loss: 0.3613 - n_outputs0_loss: 0.0248 - n_outputs1_loss: 0.3364 - val_loss: 0.3119 - val_n_outputs0_loss: 0.0225 - val_n_outputs1_loss: 0.2893
Epoch 3/100
1/1 [==============================] - ETA: 0s - loss: 0.2427 - n_outputs0_loss: 0.0228 - n_outputs1_loss: 0.22002024-06-05 00:48:40.575992: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: KeyError: 109
Traceback (most recent call last):
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/data/ops/from_generator_op.py", line 290, in finalize_py_func
generator_state.iterator_completed(iterator_id)
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 870, in iterator_completed
del self._iterators[self._normalize_id(iterator_id)]
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 109
2024-06-05 00:48:40.576057: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: UNKNOWN: KeyError: 109
Traceback (most recent call last):
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/data/ops/from_generator_op.py", line 290, in finalize_py_func
generator_state.iterator_completed(iterator_id)
File "/home/USER/dev-donkey/miniconda3/envs/donkey/lib/python3.11/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 870, in iterator_completed
del self._iterators[self._normalize_id(iterator_id)]
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 109
[[{{node PyFunc}}]]
Epoch 3: val_loss improved from 0.31188 to 0.08692, saving model to /home/USER/dev-donkey/mycar/models/pilot_24-06-05_0.savedmodel
INFO:tensorflow:Assets written to: /home/USER/dev-donkey/mycar/models/pilot_24-06-05_0.savedmodel/assets
1/1 [==============================] - 1s 790ms/step - loss: 0.2427 - n_outputs0_loss: 0.0228 - n_outputs1_loss: 0.2200 - val_loss: 0.0869 - val_n_outputs0_loss: 0.0067 - val_n_outputs1_loss: 0.0802
Epoch 4/100
1/1 [==============================] - ETA: 0s - loss: 0.1607 - n_outputs0_loss: 0.0319 - n_outputs1_loss: 0.1288
Epoch 4: val_loss did not improve from 0.08692
1/1 [==============================] - 0s 145ms/step - loss: 0.1607 - n_outputs0_loss: 0.0319 - n_outputs1_loss: 0.1288 - val_loss: 0.0976 - val_n_outputs0_loss: 0.0024 - val_n_outputs1_loss: 0.0952
Epoch 5/100
1/1 [==============================] - ETA: 0s - loss: 0.1222 - n_outputs0_loss: 0.0187 - n_outputs1_loss: 0.1035
Epoch 5: val_loss did not improve from 0.08692
1/1 [==============================] - 0s 160ms/step - loss: 0.1222 - n_outputs0_loss: 0.0187 - n_outputs1_loss: 0.1035 - val_loss: 0.1443 - val_n_outputs0_loss: 0.0027 - val_n_outputs1_loss: 0.1415
Epoch 6/100
1/1 [==============================] - ETA: 0s - loss: 0.1274 - n_outputs0_loss: 0.0132 - n_outputs1_loss: 0.1142
Epoch 6: val_loss did not improve from 0.08692
1/1 [==============================] - 0s 140ms/step - loss: 0.1274 - n_outputs0_loss: 0.0132 - n_outputs1_loss: 0.1142 - val_loss: 0.1452 - val_n_outputs0_loss: 0.0022 - val_n_outputs1_loss: 0.1431
Epoch 7/100
1/1 [==============================] - ETA: 0s - loss: 0.1173 - n_outputs0_loss: 0.0111 - n_outputs1_loss: 0.1062
Epoch 7: val_loss did not improve from 0.08692
1/1 [==============================] - 0s 139ms/step - loss: 0.1173 - n_outputs0_loss: 0.0111 - n_outputs1_loss: 0.1062 - val_loss: 0.1188 - val_n_outputs0_loss: 0.0017 - val_n_outputs1_loss: 0.1170
Epoch 8/100
1/1 [==============================] - ETA: 0s - loss: 0.1270 - n_outputs0_loss: 0.0182 - n_outputs1_loss: 0.1089
Epoch 8: val_loss did not improve from 0.08692
1/1 [==============================] - 0s 140ms/step - loss: 0.1270 - n_outputs0_loss: 0.0182 - n_outputs1_loss: 0.1089 - val_loss: 0.0971 - val_n_outputs0_loss: 0.0020 - val_n_outputs1_loss: 0.0951
INFO:donkeycar.parts.keras:////////// Finished training in: 0:00:08.441838 //////////
[...]
2024-06-05 00:48:45.071498: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 66754 microseconds.
[...]
INFO:donkeycar.parts.interpreter:TFLite conversion done.
INFO:donkeycar.pipeline.database:Writing database file: /home/USER/dev-donkey/mycar/models/database.json
... a few errors, but the training process completed (early, due to "no improvement in validation loss" -- my bogus dataset ;), and everything looks A-okay. I'll have to test with a real dataset of course, but at least the (compatibility) issues seem to have been fixed!
Thank you sir.
@DocGarbanzo,
Hi! So far, so good. I've just trained a model on a
$ ls -l data/images/ | wc -l
13960
image dataset, and it's quite lovely. The autopilot has completed several runs like a champ!
I could swear I previously ran into an issue (seemingly Python v3.11 related) with donkey ui (a "recursion depth exceeded" type error), but I mysteriously cannot reproduce it. In any case, it is not a priority right now. I will let you know of any problems in another thread. Thanks once again.
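A small aside on the image count: `ls -l` prints a leading "total" line, so piping it to `wc -l` likely reports one more than the number of files. A minimal Python sketch for counting only image files follows; the `data/images` path comes from the command above, while the list of suffixes and the helper name `count_images` are assumptions.

```python
from pathlib import Path


def count_images(image_dir, suffixes=(".jpg", ".jpeg", ".png")):
    """Count regular files in image_dir whose extension looks like an image."""
    directory = Path(image_dir)
    return sum(
        1
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix.lower() in suffixes
    )


# Only report a count when the directory from the thread actually exists.
if Path("data/images").is_dir():
    print(count_images("data/images"))
```
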
Ok, great. Thanks for confirming. The TF key error is still a bit concerning. We'll keep an eye on it in case it shows up again.
Hello!
FWIW, this is truly a wonderful project!
Unfortunately, with my limited skills I can't even seem to get donkey train --tub data to work on Ubuntu 22.04 x86-64. The command also fails on RPi 4B bookworm but somehow works on the robocarstore RPi 4B pre-built-image @v5.0-dev3?
For the PC installs, I followed (variations of) the instructions here, there, and there.
I'm using the same exact dataset in all scenarios:
Ubuntu 22.04, x86-64 (w/ RTX 3070)
Python 3.10.12
Python 3.9.19
RPi 4B
Bookworm, Python 3.11.2
$ python --version
Python 3.11.2
$ donkey --version
using donkey v5.1.0 ...
$ donkey train --tub data
[...]
INFO:donkeycar.pipeline.types:Loading tubs from paths ['data']
INFO:donkeycar.pipeline.training:Records # Training 13
INFO:donkeycar.pipeline.training:Records # Validation 4
INFO:donkeycar.parts.tub_v2:Closing tub data
[...]
INFO:donkeycar.parts.keras:////////// Starting training //////////
Epoch 1/100
2024-05-29 01:17:03.203388: W tensorflow/core/framework/op_kernel.cc:1827] INVALID_ARGUMENT: ValueError: Key image is not in available keys.
Traceback (most recent call last):
File "/home/pi/projects/donkeycar/env/lib/python3.11/site-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
$ python --version
Python 3.9.2
$ donkey --version
using donkey v5.0.dev3 ...
$ donkey train --tub data
[...]
INFO:donkeycar.pipeline.types:Loading tubs from paths ['data']
INFO:donkeycar.pipeline.training:Records # Training 13
INFO:donkeycar.pipeline.training:Records # Validation 4
INFO:donkeycar.parts.tub_v2:Closing tub data
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.parts.image_transformations:Creating ImageTransformations []
INFO:donkeycar.pipeline.training:Train with image caching: True
INFO:donkeycar.parts.keras:////////// Starting training //////////
Epoch 1/100
1/1 [==============================] - ETA: 0s - loss: 0.4888 - n_outputs0_loss: 0.0301 - n_outputs1_loss: 0.4587
Epoch 1: val_loss improved from inf to 0.14395, saving model to /home/pi/mycar/models/pilot_24-05-29_0.savedmodel
[...]
1/1 [==============================] - 16s 16s/step - loss: 0.4888 - n_outputs0_loss: 0.0301 - n_outputs1_loss: 0.4587 - val_loss: 0.1440 - val_n_outputs0_loss: 0.0115 - val_n_outputs1_loss: 0.1324
Epoch 2/100
1/1 [==============================] - ETA: 0s - loss: 0.2921 - n_outputs0_loss: 0.0200 - n_outputs1_loss: 0.2721
Epoch 2: val_loss did not improve from 0.14395
1/1 [==============================] - 6s 6s/step - loss: 0.2921 - n_outputs0_loss: 0.0200 - n_outputs1_loss: 0.2721 - val_loss: 0.2925 - val_n_outputs0_loss: 0.0241 - val_n_outputs1_loss: 0.2683