STMicroelectronics / stm32ai-modelzoo

AI Model Zoo for STM32 devices

Issue during training a model: "OSError: Unable to create file (file signature not found)." #5

Closed davidroid closed 1 year ago

davidroid commented 1 year ago

Hello all, I tried to run the training of an image classification model available in the stm32ai-modelzoo, but hit the following issue: "OSError: Unable to create file (file signature not found)."

Shahnawax commented 1 year ago

Hello @davidroid,

We were not able to reproduce the issue on our side. However, after a bit of research, it looks like the issue is related to the set of callbacks we use during training. One of these callbacks saves or updates a model checkpoint at every epoch so that it keeps the model with the best validation accuracy. It looks like this checkpoint file is locked for some reason. I found a similar problem here, along with the fix.

Could you please run the command export HDF5_USE_FILE_LOCKING=FALSE from your terminal and see if it fixes the problem? The details of the solution and what it does can be found here.

In the meantime, could you please also tell us which versions of the OS, WSL, and Python you are using, so that we can try to reproduce the issue?

Let us know if the solution works. Thank you!
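For anyone who cannot rely on the shell (export is POSIX shell syntax and is not available in the Windows command prompt), the same variable can also be set from Python. This is a minimal sketch, not part of the model zoo scripts: the helper name hdf5_locking_disabled is hypothetical, and it assumes the variable is set before h5py or TensorFlow is imported in the same process.

```python
import os

# Set the variable before importing h5py or TensorFlow/Keras, so that the
# HDF5 library sees it when it starts opening checkpoint files.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


def hdf5_locking_disabled() -> bool:
    """Return True if HDF5_USE_FILE_LOCKING is set to FALSE in this process.

    Hypothetical helper for illustration only; it inspects the environment
    and does not query the HDF5 library itself.
    """
    return os.environ.get("HDF5_USE_FILE_LOCKING", "").upper() == "FALSE"


if __name__ == "__main__":
    print("HDF5 file locking disabled:", hdf5_locking_disabled())
```

Placing these lines at the very top of the training entry point, before any other import, is one way to make sure the setting applies to the checkpoint callback as well.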

davidroid commented 1 year ago

Hello @Shahnawax, I ran the training again after exporting the variable you suggested, but unfortunately nothing changed: I got the same error. I have updated my previous comment with the OS release, which is Microsoft Windows 10 Enterprise.

Shahnawax commented 1 year ago

The issue is most likely caused by access-rights (permission) problems on the checkpoint file. We discussed the fix with the reporter and tested on multiple platforms to confirm that the issue does not occur.
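The thread does not record the exact fix, so the following is only a generic sketch of how one might check for such a permission problem before training. The function name can_write_checkpoint and the idea of probing the directory with a temporary .h5 file are assumptions for illustration, not something taken from the model zoo.

```python
import os
import tempfile


def can_write_checkpoint(directory: str) -> bool:
    """Return True if a file can be created (and removed) in `directory`.

    Hypothetical pre-flight check: it creates a temporary file where the
    checkpoint would be saved and lets it be deleted again on close.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory, suffix=".h5"):
            pass
        return True
    except OSError:
        # Covers missing directories and permission errors alike.
        return False


if __name__ == "__main__":
    # Replace with the directory where the training run saves its checkpoints.
    target = tempfile.gettempdir()
    print(f"Writable: {can_write_checkpoint(target)} ({target})")
```

If the check returns False for the checkpoint directory, pointing the training output to a folder the current user owns is a reasonable first thing to try.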