cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Unable to load model onto CVAT for auto-annotation #1489

Closed: himalayanZephyr closed this issue 4 years ago

himalayanZephyr commented 4 years ago

Hi. I'm trying to load a YOLO model onto CVAT for auto annotation and getting the following error in the browser console:

    Checking request has returned the "failed" status. Message: Exception: Model was not properly created/updated. Test failed: IndexError at line 76: index 148720 is out of bounds for axis 0 with size 137904

Also, I tried to debug by using run_model.py from cvat/utils/auto_annotation. I added IE_PLUGINS_PATH="/opt/intel/openvino/inference_engine/lib/intel64" in .bashrc to run this script, but I'm getting the following error:

    RuntimeError: Cannot find plugin to use: Tried load plugin : MKLDNNPlugin for device CPU, error: Plugin MKLDNNPlugin cannot be loaded: cannot load plugin: MKLDNNPlugin from /opt/intel/openvino/inference_engine/lib/intel64: Cannot load library '/opt/intel/openvino/inference_engine/lib/intel64/libMKLDNNPlugin.so': /opt/intel/openvino/inference_engine/lib/intel64/libMKLDNNPlugin.so: undefined symbol: _ZN3tbb8internal13numa_topology11nodes_countEv, skipping

Any ideas why the model is failing when I try to upload it to CVAT? I'd be happy to share the XML, .bin and mapping files.

benhoff commented 4 years ago

These look like two different errors. What steps did you take to convert the YOLO model? Can you post the logs for CVAT?

docker logs cvat

himalayanZephyr commented 4 years ago

Hi @benhoff

I trained a custom Darknet YOLOv3 model and converted it using the OpenVINO guide.

Here are the docker logs:

    return IENetwork(model = model, weights = weights)
    09:04:42 Exception: Model was not properly created/updated. Test failed: IndexError at line 76: index 148720 is out of bounds for axis 0 with size 137904
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/rq/worker.py", line 812, in perform_job
        rv = job.perform()
      File "/usr/local/lib/python3.5/dist-packages/rq/job.py", line 588, in perform
        self._result = self._execute()
      File "/usr/local/lib/python3.5/dist-packages/rq/job.py", line 594, in _execute
        return self.func(*self.args, **self.kwargs)
      File "/home/django/cvat/apps/auto_annotation/model_manager.py", line 118, in _update_dl_model_thread
        raise Exception("Model was not properly created/updated. Test failed: {}".format(message))
    Exception: Model was not properly created/updated. Test failed: IndexError at line 76: index 148720 is out of bounds for axis 0 with size 137904

The first error is when I try to upload the model on CVAT GUI.

benhoff commented 4 years ago

Wondering if it's a conversion issue or a CVAT issue. Any way you could plug your model into this?

https://github.com/opencv/open_model_zoo/tree/master/demos/python_demos/object_detection_demo_yolov3_async

himalayanZephyr commented 4 years ago

@benhoff I didn't get any errors while converting, though. Anyway, let me get back to you after trying what you've suggested. [UPDATE] I'm getting this error after trying the above:

    [ INFO ] Creating Inference Engine...
    [ INFO ] Loading network files:
        yolo_files/frozen_darknet_yolov3_model.xml
        yolo_files/frozen_darknet_yolov3_model.bin
    Traceback (most recent call last):
      File "object_detection_demo_yolov3_async.py", line 362, in <module>
        sys.exit(main() or 0)
      File "object_detection_demo_yolov3_async.py", line 188, in main
        supported_layers = ie.query_network(net, "CPU")
      File "ie_api.pyx", line 188, in openvino.inference_engine.ie_api.IECore.query_network
    RuntimeError: Failed to create plugin /opt/intel/openvino_2020.1.023/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so for device CPU
    Please, check your environment
    Cannot load library '/opt/intel/openvino_2020.1.023/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so': /opt/intel/openvino_2020.1.023/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so: undefined symbol: _ZN3tbb8internal13numa_topology11nodes_countEv

The .so file does exist at that path, though.

benhoff commented 4 years ago

This sounds like an OpenVINO issue. Have you seen this one?

https://github.com/openvinotoolkit/openvino/issues/473

himalayanZephyr commented 4 years ago

@benhoff I was assuming that the following error was caused by some issue in my interpretation script (interp.py):

      File "/home/django/cvat/apps/auto_annotation/model_manager.py", line 118, in _update_dl_model_thread
        raise Exception("Model was not properly created/updated. Test failed: {}".format(message))
    Exception: Model was not properly created/updated. Test failed: IndexError at line 76: index 148720 is out of bounds for axis 0 with size 137904

Anyways, I'll look at this thread and see if that helps in some way.

benhoff commented 4 years ago

I'm keying in on this piece of your error message:

undefined symbol: _ZN3tbb8internal13numa_topology11nodes_countEv

This looks like an error with your install that is answered in the linked thread. I hope this resolves your issue.
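For reference, the usual fix for this class of undefined-TBB-symbol error is to make sure OpenVINO's bundled TBB libraries are on the loader path, which sourcing setupvars.sh does for you. This is a sketch only; the install path below is the default location and may differ on your machine (and exporting IE_PLUGINS_PATH alone is not enough, since it does not touch LD_LIBRARY_PATH):

    # Source the OpenVINO environment script instead of hand-exporting paths;
    # it puts the bundled TBB (and the plugin dirs) on LD_LIBRARY_PATH.
    source /opt/intel/openvino/bin/setupvars.sh

    # Sanity check: the bundled TBB directory should now appear here.
    echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i tbb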

himalayanZephyr commented 4 years ago

Hi @benhoff I'm getting inference results now with https://github.com/opencv/open_model_zoo/tree/master/demos/python_demos/object_detection_demo_yolov3_async

I reinstalled the openvino but was still getting the following error

    IndexError: index 37180 is out of bounds for axis 0 with size 34476

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "run_model.py", line 263, in <module>
        main()
      File "run_model.py", line 185, in main
        restricted=restricted)
      File "/home/user/softwares/cvat/utils/auto_annotation/../../cvat/apps/auto_annotation/inference.py", line 144, in run_inference_engine_annotation
        processed_detections = _process_detections(detections, convertation_file, restricted=restricted)
      File "/home/user/softwares/cvat/utils/auto_annotation/../../cvat/apps/auto_annotation/inference.py", line 30, in _process_detections
        execute_python_code(source_code, global_vars, local_vars)
      File "/home/user/softwares/cvat/utils/auto_annotation/../../cvat/apps/engine/utils.py", line 60, in execute_python_code
        raise InterpreterError("{} at line {}: {}".format(error_class, line_number, details))
    cvat.apps.engine.utils.InterpreterError: IndexError at line 76: index 37180 is out of bounds for axis 0 with size 34476

Then I inspected my interp.py.

I was initially using num=9 in interp.py (based on /deployment_tools/model_optimizer/extensions/front/tf/yolo_v3.json and the Darknet YOLOv3 cfg), but then changed it to 3, and the model loaded successfully on CVAT and shows inference results too.

Now I want to understand: should num always be 3? One added detail: while generating the IR, the JSON file still had num=9; I only changed it to num=3 in the interpretation script.
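For context, the size mismatch is consistent with an anchor-count mix-up: YOLOv3 defines 9 anchors in total, but each [yolo] output layer's mask assigns only 3 of them to that scale, so a per-layer interp script should use num=3. A minimal sketch (not CVAT's actual interp.py; `blob_len` and the class count are illustrative) of why assuming num=9 per layer overruns the output buffer:

    # Hypothetical sketch of a YOLO output blob's flat length:
    # a side x side grid, `num` anchors per cell, each anchor predicting
    # box coords + objectness + per-class scores.
    def blob_len(side, num, classes, coords=4):
        return side * side * num * (coords + 1 + classes)

    classes = 80  # e.g. COCO; substitute your custom model's class count

    # The converted model's blob really holds 3 anchors per output layer:
    actual = blob_len(side=13, num=3, classes=classes)

    # An interp script assuming all 9 anchors per layer expects 3x more data,
    # so its indices run past the end of the blob -> IndexError.
    assumed = blob_len(side=13, num=9, classes=classes)
    assert assumed == 3 * actual

So the total of 9 anchors in yolo_v3.json (used for IR generation) and num=3 in the per-layer interpretation script are both correct at the same time; they describe different things.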

benhoff commented 4 years ago

Ah, good question! I had forgotten about those numbers. The last time I dove into YOLO to write the interp script, I found the basis of those numbers confusing. I do believe they will always be consistent, so you're likely fine leaving them.

himalayanZephyr commented 4 years ago

@benhoff I just set the classes variable in the script equal to the number of classes in my custom model. Let me know about the intuition behind the num variable whenever you can. And does this mean that the .json file specified during IR generation doesn't matter?

bsekachev commented 4 years ago

The issue isn't relevant any more. In new versions we use nuclio to deploy models.

GoGoPen commented 4 years ago

I am encountering the same problem.

I initialized CVAT with the following command:

    docker-compose -f docker-compose.yml -f components/analytics/docker-compose.analytics.yml -f components/serverless/docker-compose.serverless.yml up -d --build

Then I used nuctl-1.4.17-linux-amd64 and nuctl-1.1.37-linux-amd64 to build the automatic annotation functions. For example:

./nuctl-1.4.17-linux-amd64 deploy --project-name cvat \
    --path serverless/openvino/dextr/nuclio \
    --volume `pwd`/serverless/openvino/common:/opt/nuclio/common \
    --platform local
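As a debugging aid, it can help to confirm what nuclio actually deployed before invoking it from CVAT. A hedged sketch; the nuctl binary name matches the one above, but the container-name filter is a guess you should verify with docker ps:

    # List deployed functions and their state; the dextr function should be "ready".
    ./nuctl-1.4.17-linux-amd64 get functions --platform local

    # Find the function's container so you can read its logs with `docker logs`.
    docker ps --filter "name=nuclio" --format "{{.Names}}"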

It builds and runs properly in nuclio. However, when I try to run inference, the following error message shows up in CVAT:

    Fetching inference status for the task 2
    Error: Inference status for the task 2 is failed. requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://nuclio:8070/api/function_invocations

If I check the details in nuclio, the source code in those functions is empty.

nmanovic commented 4 years ago

@NexploreKelvinChiu, could you please create a separate issue? This issue was closed and is based on a different implementation of the functionality. Also, attach to the new issue the output of docker ps, nuctl get functions, and docker logs <container for the dextr serverless function>.

GoGoPen commented 4 years ago


Thanks. I will create a separate issue.