Azure / KAN

KubeAI Application Nucleus for edge is a solution accelerator for creating, deploying, and operating environment-aware solutions at scale that use artificial intelligence (AI) at the edge with the control and flexibility of open-source natively on your environment.
MIT License

The deployment isn't modified when the associated AI skill is updated #42

Open BaoxiJia opened 2 years ago

BaoxiJia commented 2 years ago
  1. Create an AI Skill with Model A.
  2. Create a deployment with this AI skill, make sure the deployment works well on the Edge device.
  3. Update the AI skill to use Model B, save the AI skill and get the notification "Changing this skill will modify your deployments that have a reference to this skill."

When I check the deployment running on the Edge device, it still runs inference with Model A, although Model B is expected. To make the updated AI skill take effect, I have to delete the current deployment and recreate it with the updated AI skill.

penorouzi commented 2 years ago

@BaoxiJia We fixed this in the latest release. Did you use the 0.38.1 installer?

BaoxiJia commented 2 years ago

@penorouzi I used 0.38.0 for testing.

penorouzi commented 2 years ago

@BaoxiJia Can you use the latest version and see if you still have the issue?

BaoxiJia commented 2 years ago

I tried the latest 0.38.1, but the updated model did not take effect until I manually rebooted PredictModule on the Edge side. Here are the steps I followed:

  1. Create an AI Skill with the model "pedestrian-and-vehicle-detector" from model zoo.
  2. Create a deployment with this AI skill, make sure the deployment works well on the Edge device.
  3. Update the AI skill to use my custom model "car-detector", save the AI skill and update the deployment.

What I found:

  1. ManagerModule rebooted with the following docker logs, which meant the P4E edge modules were informed of the deployment update.
["skill-13580f31-719e-45e5-90ec-2347c6c54aef as skill-d92d"]
parameters={'configure_data': '{"cam-cpu": ["car-skill-cpu"]}', 'skill-d92d.device_displayname': 'cam-cpu', 'skill-d92d.device_id': 'device-96913022-58a8-40b0-872a-121cc5f78ffc', 'skill-d92d.fps': '15', 'skill-d92d.instance_displayname': 'deploy-car-cpu', 'skill-d92d.rtsp': 'rtsp://:@10.172.88.62/media/camera-300s.mkv', 'skill-d92d.skill_displayname': 'car-skill-cpu'} solution='solution-01cc9761-2fc6-4947-a3fe-72c4058724e4' target=InstanceTarget(name='target-cb6f2bb4-882f-475d-aee9-94e5083e5f5b')
ok
skill_name: skill-13580f31-719e-45e5-90ec-2347c6c54aef, skill_alias: skill-d92d
{'apiVersion': 'ai.symphony/v1', 'kind': 'Skill', 'metadata': {'creationTimestamp': '2022-09-02T04:56:10Z', 'generation': 9, 'name': 'skill-13580f31-719e-45e5-90ec-2347c6c54aef', 'namespace': 'default', 'resourceVersion': '6810096', 'uid': 'dcc1893c-5f60-4232-b476-6e62cb9e9ef2'}, 'spec': {'displayName': 'car-skill-cpu', 'edges': [{'source': {'node': '0', 'route': 'f'}, 'target': {'node': '1', 'route': 'f'}}, {'source': {'node': '1', 'route': 'f'}, 'target': {'node': '2', 'route': 'f'}}, {'source': {'node': '1', 'route': 'f'}, 'target': {'node': '3', 'route': 'f'}}], 'nodes': [{'configurations': {'device_name': 'device-96913022-58a8-40b0-872a-121cc5f78ffc', 'fps': '15', 'ip': 'rtsp://:@10.172.88.62/media/camera-300s.mkv'}, 'id': '0', 'name': 'rtsp', 'type': 'source'}, {'configurations': {'confidence_lower': '0', 'confidence_upper': '0', 'max_images': '0'}, 'id': '1', 'name': 'model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e', 'type': 'model'}, {'configurations': {'delay_buffer': '2', 'device_displayname': 'cam-cpu', 'filename_prefix': 'hcicpu', 'insights_overlay': 'true', 'instance_displayname': 'deploy-car-cpu', 'recording_duration': '10', 'skill_displayname': 'car-skill-cpu'}, 'id': '2', 'name': 'video_snippet_export', 'type': 'export'}, {'configurations': {'delay_buffer': '30'}, 'id': '3', 'name': 'iothub_export', 'type': 'export'}], 'parameters': {'accelerationRetrieve': 'CPU', 'device_displayname': 'invalid', 'device_id': 'invalid', 'fps': 'invalid', 'fpsRetrieve': '15', 'instance_displayname': 'invalid', 'rtsp': 'invalid', 'skill_displayname': 'invalid'}}}
s --> properties=ModelProperties(model_type='customvision', model_subtype='customvision.ObjectDetection', model_project='392db6ef-966c-46a5-964f-f0845fe72e3f', tags='["person", "car"]', state='trained')
Got Download URI https://irisprodwu2training.blob.core.windows.net:443/m-392db6ef966c46a5964ff0845fe72e3f/cef2a7edbe004e1e9e1230794d1153d0.ONNX.zip?sv=2020-04-08&se=2022-09-03T06%3A04%3A32Z&sr=b&sp=r&sig=z5RhScLjPw86aKyKSUY34CHreSIJ7%2FqvsjP78Vey8nI%3D
------------------------------------------------------------
  Getting status from StreamingModule and PredictModule ...
------------------------------------------------------------
  Predict   Module: StatusEnum.RUNNING
  Streaming Module: StatusEnum.WAITING
------------------------------------------------------------
  sending cascade_configs to predictmodule ...
------------------------------------------------------------
[CascadeConfig(edges=[Edge(source='0', target='1'), Edge(source='1', target='2'), Edge(source='1', target='3')], nodes=[Node(id='0', type='source', name='rtsp', configurations={'device_name': 'device-96913022-58a8-40b0-872a-121cc5f78ffc', 'fps': '15', 'ip': 'rtsp://:@10.172.88.62/media/camera-300s.mkv', 'skill_name': 'skill-13580f31-719e-45e5-90ec-2347c6c54aef'}), Node(id='1', type='model', name='object_detection_model', configurations={'confidence_lower': '0', 'confidence_upper': '0', 'max_images': '0', 'model': 'model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e', 'symphony_name': 'model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e', 'provider': 'customvision'}), Node(id='2', type='export', name='video_snippet_export', configurations={'delay_buffer': '2', 'device_displayname': 'cam-cpu', 'filename_prefix': 'hcicpu', 'insights_overlay': 'true', 'instance_displayname': 'deploy-car-cpu', 'recording_duration': '10', 'skill_displayname': 'car-skill-cpu'}), Node(id='3', type='export', name='iothub_export', configurations={'delay_buffer': '30'})])]
  2. PredictModule rebooted and kept outputting the following logs, but no prediction results were uploaded to IoT Hub or file storage.
    INFO:     172.18.0.7:48194 - "POST /predict/model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e?width=300&height=300 HTTP/1.1" 200 OK
  3. After I manually rebooted the PredictModule from IoT Hub, the PredictModule logs showed that the custom model was downloaded and took effect, and new prediction results were uploaded to IoT Hub and file storage.
INFO:     172.18.0.7:51008 - "POST /predict/model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e?width=300&height=300 HTTP/1.1" 200 OK
2022-09-02 06:28:04.788613241 [W:onnxruntime:, execution_frame.cc:811 VerifyOutputSizes] Expected shape from model of {None,35,13,13} does not match actual shape of {1,35,16,16} for output model_outputs0
--> modes {'model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e': <customvision_object_detection.CustomVisionObjectDetectionModel object at 0x78fec76d7bb0>}
--> model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e
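
When triaging this kind of report, it may help to confirm that the model ID referenced by the Skill resource matches the ID the PredictModule is actually receiving requests for. The following is a minimal sketch only, assuming the log line format and the Skill dictionary layout shown in the logs above. The function names `requested_models` and `skill_model_nodes` are hypothetical helpers for illustration and are not part of KAN.

```python
import re

# Matches the access-log lines shown in the PredictModule output above.
# Assumption: other deployments use the same "/predict/<model>?width=..&height=.." shape.
ACCESS_RE = re.compile(
    r'"POST /predict/(?P<model>[\w-]+)\?width=(?P<width>\d+)&height=(?P<height>\d+)'
    r' HTTP/1\.1" (?P<status>\d{3})'
)


def requested_models(log_lines):
    """Return the set of model IDs that received /predict requests."""
    found = set()
    for line in log_lines:
        match = ACCESS_RE.search(line)
        if match:
            found.add(match.group("model"))
    return found


def skill_model_nodes(skill):
    """Return the names of all nodes of type 'model' in a Skill resource dict."""
    return [
        node["name"]
        for node in skill["spec"]["nodes"]
        if node.get("type") == "model"
    ]


if __name__ == "__main__":
    # Sample data trimmed from the logs in this comment.
    sample_log = [
        'INFO:     172.18.0.7:48194 - "POST /predict/'
        'model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e?width=300&height=300'
        ' HTTP/1.1" 200 OK',
    ]
    sample_skill = {
        "spec": {
            "nodes": [
                {"id": "0", "name": "rtsp", "type": "source"},
                {
                    "id": "1",
                    "name": "model-c1ff2999-2c53-48b8-9db6-b9a9dfe0bc0e",
                    "type": "model",
                },
            ]
        }
    }
    # True when every model the skill references is the one being requested.
    print(requested_models(sample_log) == set(skill_model_nodes(sample_skill)))
```

If the two sets match but no results reach IoT Hub until the module is restarted, as described above, the request routing is probably not the problem and the investigation can focus on how the PredictModule loads the model.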
penorouzi commented 2 years ago

Thank you @BaoxiJia for the detailed info.

@ronpai @waitingkuo Please look at the above issue.