STMicroelectronics / stm32ai-modelzoo

AI Model Zoo for STM32 devices

training handposture #14

Closed 141391 closed 9 months ago

141391 commented 9 months ago

Hi! I got the following error when trying to train handposture, and I haven't been able to find a solution. The data set I used is the compressed data set package in the original project, and basically no changes were made to the code. The specific error reported is as follows:

    Error executing job with overrides: []
    Traceback (most recent call last):
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\training\train.py", line 45, in main
        train(configs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\utils\utils.py", line 143, in train
        history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 552, in safe_patch_function
        patch_function.call(call_original, *args, **kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 170, in call
        return cls().__call__(original, *args, **kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 181, in __call__
        raise e
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 174, in __call__
        return self._patch_implementation(original, *args, **kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 232, in _patch_implementation
        result = super()._patch_implementation(original, *args, **kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\tensorflow\__init__.py", line 1255, in _patch_implementation
        history = original(inst, *args, **kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 535, in call_original
        return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 470, in call_original_fn_with_event_logging
        original_fn_result = original_fn(*og_args, **og_kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 532, in _original_fn
        original_result = original(*_og_args, **_og_kwargs)
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

The error location code is as follows:

    print("[INFO] : Starting training...")
    history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
                                  epochs=cfg.train_parameters.training_epochs)

The relevant configuration is as follows:

    train_parameters:
      batch_size: 32
      training_epochs: 1000
      optimizer: Adam
      initial_learning: 0.01
      learning_rate_scheduler: Constant
    model:
      model_type: {name : CNN2D_ST_HandPosture, version: v1}
      input_shape: [8, 8, 2]
      dropout: 0.2

LFOSTM commented 9 months ago

Hello, we will have a look at it soon. Can you reproduce the issue without making any changes to the code?

141391 commented 9 months ago

Hello, I haven't made any changes to the code. I only set up the environment and changed the cubeAI file path. Unfortunately, I haven't solved the problem yet.

141391 commented 9 months ago

Hello! I have solved this problem. The cause was that my protobuf version was too high. I had first lowered it to 3.20.1, following a reference, but that didn't seem to work. I then downgraded to 3.19.0, and the model is finally able to train! Thank you very much for your reply! I will continue and try to deploy to the hardware. Thank you!
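For anyone hitting the same error, here is a minimal sketch of a check that could be run inside the training environment before starting `train.py`. It only encodes what this thread reports (3.19.0 worked here, 3.20.1 did not); the threshold is not an official requirement, and the helper names `parse_version`, `is_newer_than_known_good`, and `installed_protobuf_version` are made up for illustration.

```python
from importlib import metadata

# Assumption taken from this thread only: protobuf 3.19.0 worked, 3.20.1 did not.
KNOWN_GOOD = (3, 19, 0)


def parse_version(text):
    """Turn a string such as '3.19.0' into a tuple such as (3, 19, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad '3.20' to (3, 20, 0)
    return tuple(parts)


def is_newer_than_known_good(version_text, known_good=KNOWN_GOOD):
    """Return True if the version is above the one reported to work here."""
    return parse_version(version_text) > known_good


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_protobuf_version()
    if found is None:
        print("protobuf is not installed in this environment")
    elif is_newer_than_known_good(found):
        print(f"protobuf {found} is newer than 3.19.0; consider downgrading")
    else:
        print(f"protobuf {found} is at or below 3.19.0")
```

If the check reports a newer version, `pip install protobuf==3.19.0` in the same virtual environment pins the version that worked in this case.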

LFOSTM commented 9 months ago

Great! Don't hesitate to share the outcome of your training and deployment. I will close the issue in a couple of days.