Azure / azure-stream-analytics

Azure Stream Analytics
MIT License

Stream Analytics on Edge restart engine to pick up new reference data #142

Open Maci3jPy opened 2 years ago

Maci3jPy commented 2 years ago

I am working with Stream Analytics deployed on an Edge machine. It uses reference data that is stored in a volume bound to the Stream Analytics container. This works fine, but I encountered an issue when I tried to update the reference data. As per the documentation:

You can update the reference data in two ways:

Update the reference data path in your Stream Analytics job from the Azure portal.
Update the IoT Edge deployment.

So I updated the file and then updated the Edge manifest through the Azure portal. In fact, I didn't change anything; I just created a new deployment. After that, I checked whether Stream Analytics received that information. It seems that it did, but it also caused the engine to restart. This is unacceptable, as it results in data loss for almost 2 minutes. According to the documentation, it shouldn't restart.

Reference data on an IoT Edge update is triggered by a deployment. After it's triggered, the Stream Analytics module picks the updated data without stopping the running job.

Here is what I see in the logs after the deployment (I removed the links containing company names):

2022-07-06 13:04:22.795 +00:00 [INF] - Desired Property changed: {"ASAJobInfo":"=r","ASAJobResourceId":"/PublishTimestamp":"7/6/2022 1:04:05 PM","$version":21}
2022-07-06 13:04:22.795 +00:00 [INF] - Job plan was updated, re-initialize node.
2022-07-06 13:04:22.795 +00:00 [INF] - Stopping tasks.
2022-07-06 13:04:22.871 +00:00 [INF] - Metrics is Cancelled.
2022-07-06 13:04:22.871 +00:00 [INF] - Unsetting assembly resolver for query runtime binaries.
2022-07-06 13:04:22.874 +00:00 [INF] - Node Exited.
2022-07-06 13:04:22.874 +00:00 [INF] - Will restart engine in one minute.
2022-07-06 13:05:22.875 +00:00 [INF] - Module client shutting down... 
2022-07-06 13:05:22.876 +00:00 [INF] - Module client initializing ... 
2022-07-06 13:05:24.552 +00:00 [INF] - Opened module client connection.
2022-07-06 13:05:24.553 +00:00 [INF] - Setting Product Info Id as Microsoft.stream-analytics-on-iot-edge.
2022-07-06 13:05:25.693 +00:00 [INF] - ASA getting path: https://
2022-07-06 13:05:25.695 +00:00 [INF] - Download ASA Job Package ...
2022-07-06 13:05:30.827 +00:00 [INF] - Unzip /tmp/2364eaf2-9c32-4b87-aa70-eafd682640ae/ASAEdgeJobDefinition.zip ...
2022-07-06 13:05:31.048 +00:00 [INF] - Read job definition from /tmp/76dea73d-1c10-4986-8b05-6c17450b761b/EdgeJobDefinition.txt.
2022-07-06 13:05:31.048 +00:00 [INF] - Read job configuration from /tmp/76dea73d-1c10-4986-8b05-6c17450b761b/EdgeJobConfiguration.txt.
2022-07-06 13:05:31.048 +00:00 [INF] - Initalizing ASA Engine ...
2022-07-06 13:05:31.048 +00:00 [INF] - The directory for the executing assembly is /app.
2022-07-06 13:05:31.048 +00:00 [INF] - The directory for the generated query is /tmp/76dea73d-1c10-4986-8b05-6c17450b761b/QueryBinaries.
2022-07-06 13:05:31.048 +00:00 [INF] - Load codegen dll :76dea73d-1c10-4986-8b05-6c17450b761b__GeneratedQueryCode__.
2022-07-06 13:05:31.048 +00:00 [INF] - Setting assembly resolver for query runtime binaries. Folder location is :/tmp/76dea73d-1c10-4986-8b05-6c17450b761b/QueryBinaries.
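
The restart window visible in this excerpt can be estimated from the timestamps. The following is a minimal sketch, not part of the original report: the helper names `parse_timestamp` and `gap_seconds` are hypothetical, and the three log lines are copied verbatim from the excerpt above (including the log's own spelling of "Initalizing"). It only measures the time from "Stopping tasks." to the start of engine initialization; the point at which the job actually resumes producing output is not shown in the excerpt and may be later.

```python
from datetime import datetime

# Three lines copied verbatim from the log excerpt in this thread.
LOG_EXCERPT = """\
2022-07-06 13:04:22.795 +00:00 [INF] - Stopping tasks.
2022-07-06 13:04:22.874 +00:00 [INF] - Will restart engine in one minute.
2022-07-06 13:05:31.048 +00:00 [INF] - Initalizing ASA Engine ...
"""


def parse_timestamp(line):
    """Parse the leading timestamp, e.g. '2022-07-06 13:04:22.795 +00:00'."""
    stamp = line.split(" [", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f %z")


def gap_seconds(log_text, start_marker, end_marker):
    """Seconds between the first start_marker line and the next end_marker line."""
    start = end = None
    for line in log_text.splitlines():
        if start is None and start_marker in line:
            start = parse_timestamp(line)
        elif start is not None and end_marker in line:
            end = parse_timestamp(line)
            break
    if start is None or end is None:
        raise ValueError("markers not found in the expected order")
    return (end - start).total_seconds()


gap = gap_seconds(LOG_EXCERPT, "Stopping tasks.", "Initalizing ASA Engine")
print(f"Engine was stopped for at least {gap:.3f} s")
```

Most of this gap is the fixed one-minute wait announced by the "Will restart engine in one minute." line; the remainder is the module client reconnecting and the job package being downloaded and unpacked.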

xitia commented 2 years ago

Thanks for reporting the case. The behavior is expected. The ASA Edge module currently supports only static reference data. To update existing reference data, you need to do an IoT Edge deployment with the latest reference data path. This does not trigger a container restart, but it does cause an engine reboot to pick up the latest data.

Maci3jPy commented 2 years ago

Thanks for the answer. This is how I was updating the data. But will the engine restart cause data loss? If so, I would say the documentation is quite unclear about it, as it states that the module picks up new reference data without stopping the job.

xitia commented 2 years ago

Thanks for pointing it out; we will update the doc to avoid confusion. During an engine reboot, input that has been processed in memory but has not yet been produced as output will be lost.

djl-85 commented 2 years ago

Hi, can anyone explain to me how to add a reference data file to a Stream Analytics Edge job? The docs lack information and examples about that. When I add a reference data file to the job and deploy it to the edge device, the Stream Analytics job does not run. In the portal, I enter the file path on the edge machine, as the image below shows.

[image]

Do I have to create a Docker volume and bind it between the ASA container and the edge machine?

Thank you.
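
For reference, the setup described in the first comment of this thread (reference data in a volume bound to the Stream Analytics container) is normally expressed through the module's `createOptions` in the IoT Edge deployment manifest, which is a stringified JSON object passed to the Docker Engine API. The sketch below builds such a string with a single bind mount via `HostConfig.Binds`. Both paths and the helper name `build_create_options` are assumptions for illustration only and do not come from this thread. Presumably, the reference data path configured in the job then has to be one that resolves inside the container.

```python
import json

# Hypothetical paths, chosen only for illustration; adjust to your device and job.
HOST_DIR = "/var/asa/refdata"      # directory on the edge machine (assumption)
CONTAINER_DIR = "/app/refdata"     # mount point inside the ASA container (assumption)


def build_create_options(host_dir, container_dir):
    """Return stringified createOptions with one bind mount (host:container)."""
    options = {"HostConfig": {"Binds": [f"{host_dir}:{container_dir}"]}}
    return json.dumps(options)


create_options = build_create_options(HOST_DIR, CONTAINER_DIR)
print(create_options)
```

The resulting string would go into the `createOptions` field of the ASA module's settings in the deployment manifest. Whether this resolves the "job does not run" symptom described above is not confirmed anywhere in this thread.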