apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

turicreate.toolkits._main.ToolkitError: Could not export model: validator error: #3369

Open wbw24 opened 3 years ago

wbw24 commented 3 years ago

I need to train an object detection model. Before doing that, I tried running the official Turi Create object detection example, so I downloaded the official training image set:

```
$ mkdir -p ~/Downloads/ig02
$ cd ~/Downloads/ig02
$ curl https://lear.inrialpes.fr/people/marszalek/data/ig02/ig02-v1.0-bikes.zip > bikes.zip
$ curl https://lear.inrialpes.fr/people/marszalek/data/ig02/ig02-v1.0-cars.zip > cars.zip
$ unzip bikes.zip
$ mv readme.txt readme-bikes.txt
$ unzip cars.zip
$ rm bikes.zip cars.zip
```

Then I generated the SFrame and trained the model using the official code from the example. Details of my system:

- OS: macOS Catalina 10.15.6
- Graphics card: 10.0 Intel UHD Graphics 630 1536 MB
- TensorFlow: 2.1.0
- Turi Create: 6.4.1
- RAM: 16 GB
- Running inside a virtual environment

Training completed, but exporting the model to Core ML failed with this error:

```
| 9995         | 1.99276      | 7h 25m       |
| 10000        | 2.01482      | 7h 25m       |
+--------------+--------------+--------------+
Traceback (most recent call last):
  File "ImageTrainingCoreMl.py", line 100, in <module>
    trainningCoreML()
  File "ImageTrainingCoreMl.py", line 97, in trainningCoreML
    model.export_coreml('MyCustomObjectDetector.mlmodel')
  File "/Users/bilibili/Library/Python/2.7/lib/python/site-packages/turicreate/toolkits/object_detector/object_detector.py", line 425, in export_coreml
    filename, short_description, additional_user_defined_metadata, options
  File "/Users/bilibili/Library/Python/2.7/lib/python/site-packages/turicreate/extensions.py", line 305, in <lambda>
    ret = lambda *args, **kwargs: self.run_class_function(name, args, kwargs)
  File "/Users/bilibili/Library/Python/2.7/lib/python/site-packages/turicreate/extensions.py", line 293, in run_class_function
    raise _ToolkitError(exc)
turicreate.toolkits._main.ToolkitError: Could not export model: validator error: Batchnorm layer 'batchnorm3_fwd' parameters have values for both full and half precision. Parameters should either be specified in half or full precision, mixed parameters are not supported.
```
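The rule quoted in the last line of the traceback can be illustrated with a small, dependency-free sketch. It models each batchnorm parameter (gamma, beta, mean, variance) as a plain dict whose keys mirror the Core ML `WeightParams` fields `floatValue` (full precision) and `float16Value` (half precision). The dict layout and the helper names `precision_of` and `is_mixed_precision` are assumptions made for illustration; this is not the validator that Core ML or coremltools actually runs.

```python
# Hypothetical illustration of the Core ML rule quoted in the error above:
# all parameters of one batchnorm layer must share a single precision.
# Dict keys mirror Core ML WeightParams fields; this is NOT the real validator.

BATCHNORM_PARAMS = ("gamma", "beta", "mean", "variance")


def precision_of(weights):
    """Classify one WeightParams-like dict as 'full', 'half', 'mixed', or None."""
    has_full = len(weights.get("floatValue", [])) > 0
    has_half = len(weights.get("float16Value", b"")) > 0
    if has_full and has_half:
        return "mixed"
    if has_full:
        return "full"
    if has_half:
        return "half"
    return None  # parameter present but empty


def is_mixed_precision(layer):
    """True if the layer's parameters use more than one precision."""
    kinds = {precision_of(layer[p]) for p in BATCHNORM_PARAMS if p in layer}
    kinds.discard(None)
    return "mixed" in kinds or len(kinds) > 1


if __name__ == "__main__":
    full = {p: {"floatValue": [1.0, 1.0]} for p in BATCHNORM_PARAMS}
    # Same layer, but with the variance stored as one float16 value (2 bytes).
    mixed = dict(full, variance={"float16Value": b"\x00\x3c"})
    print(is_mixed_precision(full))   # False
    print(is_mixed_precision(mixed))  # True
```

A check like this only explains what the message means; why the exporter produced mixed-precision weights for `batchnorm3_fwd` after longer training runs is not established in this thread.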

I experimented with `max_iterations`: with `max_iterations=3000` the model exports successfully, but once it exceeds 4000 the export fails with the parameter error above. Since even the official data triggers the error, I'm not confident about training on my own data, and a model trained for only 3000 iterations doesn't perform well.
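If the failure really depends on the iteration count, the threshold between 3000 (exports) and 4000 (fails) could be narrowed with a binary search, which would make the report easier to reproduce. This is a minimal sketch assuming failures are monotonic (once a count fails, every larger count fails). `export_succeeds(n)` is a hypothetical callback you would write yourself, for example by training with `turicreate.object_detector.create(..., max_iterations=n)` and calling `export_coreml` on the result.

```python
def first_failing_iteration(export_succeeds, good, bad, step=100):
    """Binary-search the smallest multiple of `step` at which export fails.

    Assumes export_succeeds(good) is True, export_succeeds(bad) is False,
    and that failures are monotonic in the iteration count.
    `export_succeeds` is a hypothetical, user-supplied callback.
    """
    assert good < bad and good % step == 0 and bad % step == 0
    lo, hi = good // step, bad // step  # search in units of `step`
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if export_succeeds(mid * step):
            lo = mid  # still exports: threshold is above mid
        else:
            hi = mid  # fails: threshold is at or below mid
    return hi * step
```

Each probe is a full training run (the log above shows 7h 25m for 10,000 iterations), so a coarse `step` such as 100 keeps the 3000 to 4000 range to at most four runs.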

Any feedback on this would be appreciated. Thank you!

TobyRoseman commented 3 years ago

I just tried running this on macOS 10.15.6 with 16GB of RAM, and it worked fine for me with 8,000 iterations.

Are you seeing this same error consistently?

wbw24 commented 3 years ago

@TobyRoseman Yes, I've always had this problem. I've since started training my own models, and it happens every time I go above 4000 iterations.

TobyRoseman commented 3 years ago

I just tried again and I'm not able to reproduce this issue. I'm using macOS 10.15.6 with 16GB of RAM, TensorFlow 2.1.0, Turi Create 6.4.1, and Python 2.7, and it ran fine for 8,000 iterations.

Are you using your own data or are you using the ig02.sframe from the example?

You said you've run this several times and it fails consistently. Are you using the same training set each time, or are you calling `data.random_split` each time?
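The question matters because a split made without a fixed seed puts different images into the training set on every run, so two runs are not directly comparable. Turi Create's `SFrame.random_split` accepts an optional `seed` argument for this purpose. The sketch below uses Python's `random` module as a stand-in (it is not Turi Create's implementation, and the file names are hypothetical) to show that a fixed seed reproduces exactly the same split.

```python
import random


def random_split(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`.

    Stand-in for SFrame.random_split: seed=None draws fresh randomness on
    every call, while a fixed seed reproduces exactly the same split.
    """
    rng = random.Random(seed)  # independent generator; None -> seeded from OS/time
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


if __name__ == "__main__":
    images = ["img_%04d.jpg" % i for i in range(500)]  # hypothetical file names
    run1 = random_split(images, 0.8, seed=42)
    run2 = random_split(images, 0.8, seed=42)
    print(run1 == run2)  # True: same seed, same training/test sets
```

Passing the same seed on every run (or saving the split once and reloading it) rules out the training set itself as the variable when comparing runs.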