Azure / AI-PredictiveMaintenance


How does this AI-Predictive Maintenance workflow work? #143

Open girishkumarbk opened 6 years ago

girishkumarbk commented 6 years ago

Hi,

Is there documentation explaining how this predictive-maintenance solution works? I mean: who starts the device generator? How does the data flow into ABS? Who starts the Spark job that reads the data from ABS? What processing is being done? Where and how is the data stored, etc.?

The documentation on GitHub shows the overall component interaction and the flow, but it doesn't explain how things start off, where the data moves, how it is synchronized, or where the data generator and offline maintenance events are used.

If I had to recreate this workflow manually from the GitHub source, would that be possible?

Regards, /Girish BK

wdecay commented 6 years ago

Hi Girish,

Thanks for your interest. Let me try and update the documentation to address these questions. I will get back to you when I have something ready.

-andrew

wdecay commented 6 years ago

Hi Girish,

I have updated the Developers' Manual doc. Hope it provides enough context to understand how the solution is implemented.

https://github.com/Azure/AI-PredictiveMaintenance/blob/master/docs/Developer-Manual.md

Please let me know what you think.

Thanks, -andrew

girishkumarbk commented 6 years ago

"Data sent to IoT Hub by the Generator is read (using the Azure Event Hubs connector) and processed by the solution's Spark Structured Streaming job running on the Databricks cluster created during solution provisioning."

Andrew, a quick question: along with the data being routed to Azure Event Hubs (via routes), we also notice that the device data is routed to an Azure Blob container. If Spark picks up its data from the Event Hub (via the Spark connector for Event Hubs), when is the data from the Azure Blob store used? I don't see a corresponding workflow that operates on the telemetry data in Azure Blob Storage. Or is the data on ABS used for batch training, again via Spark? Could you please elaborate on why data from IoT Hub is being routed to two routes and what the workflow is for each of them?
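(For reference, a minimal sketch of how a Spark Structured Streaming job can consume telemetry from the IoT Hub's Event Hubs-compatible endpoint with the azure-eventhubs-spark connector. The connection string, consumer group, and telemetry schema below are placeholders for illustration, not the solution's actual values.)

```python
# Minimal PySpark Structured Streaming sketch: read device telemetry from the
# IoT Hub's Event Hubs-compatible endpoint via the azure-eventhubs-spark connector.
# Connection string, consumer group, and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("pdm-streaming-sketch").getOrCreate()

eh_conf = {
    # Event Hubs-compatible connection string of the IoT Hub (placeholder).
    # Note: newer connector versions expect this value to be encrypted with
    # EventHubsUtils.encrypt before being passed in.
    "eventhubs.connectionString": "<iot-hub-event-hubs-compatible-connection-string>",
    "eventhubs.consumerGroup": "$Default",
}

# Hypothetical telemetry schema; the real generator emits its own fields.
telemetry_schema = (StructType()
                    .add("machineID", StringType())
                    .add("timestamp", TimestampType())
                    .add("speed", DoubleType())
                    .add("temperature", DoubleType()))

raw = (spark.readStream
       .format("eventhubs")
       .options(**eh_conf)
       .load())

# The connector exposes the message payload as a binary 'body' column.
telemetry = (raw
             .select(from_json(col("body").cast("string"), telemetry_schema).alias("t"))
             .select("t.*"))

# The real job aggregates and writes elsewhere; console output is just a demo sink.
query = (telemetry.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```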

wdecay commented 6 years ago

Hi Girish,

This diagram would, perhaps, be the best answer to your question: https://github.com/Azure/AI-PredictiveMaintenance/blob/master/docs/img/data_flow.png

The "snapshot" is the data accumulated on ABS. It is not used in the production pipeline (the right side of the diagram), but its purpose is to enable modeling. We provide an example notebook for ingesting this stapshot data along with failure records from Storage Tables. This imitates a production scenario where telemetry is collected over a period of time whereas failure/maintenance logs are manually populated with new data.

The DataGeneration notebook provides a "shortcut" for generating seed data, but in reality, you would need to collect that data from your machines over a sufficiently long period of time (perhaps, months at least).
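(Purely for illustration, "seed data" here amounts to a long window of per-machine telemetry. The field names and distributions below are invented and are not the DataGeneration notebook's actual logic.)

```python
# Illustrative only: synthesize a few months of per-machine telemetry as seed data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
timestamps = pd.date_range("2018-01-01", periods=24 * 180, freq="H")  # ~6 months, hourly

frames = []
for machine_id in range(1, 11):
    frames.append(pd.DataFrame({
        "machineID": machine_id,
        "timestamp": timestamps,
        "speed": rng.normal(1000, 50, len(timestamps)),
        "temperature": rng.normal(75, 5, len(timestamps)),
    }))

seed_telemetry = pd.concat(frames, ignore_index=True)
seed_telemetry.to_csv("seed_telemetry.csv", index=False)
```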

Hope this explains it...

girishkumarbk commented 6 years ago

Hi Andrew,

When I deploy the solution via pdm-arm.json, it succeeds and I can see the health of the machines on the dashboard. The streaming (real-time) path works and the health of the machines is displayed.

However, the training flow never starts. That is, the workflow where the data pushed into ABS is picked up for retraining the model via data ingestion, featurization, and model creation and operationalization never runs. I never see the notebooks go into a running state.

How do we make sure that the entire pipeline, from streaming data insights to training the model, works? (See the manual-execution sketch below.)
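(One hedged way to exercise the training path by hand, while the scheduled flow isn't starting, is to execute the notebooks in order with a tool such as papermill. The notebook file names and parameters below are placeholders; substitute the actual notebooks shipped with the solution.)

```python
# Hedged workaround sketch: run the training notebooks in sequence with papermill
# (pip install papermill). Notebook names and parameters are placeholders.
import papermill as pm

notebooks = [
    "DataIngestion.ipynb",
    "FeatureEngineering.ipynb",
    "ModelTraining.ipynb",
    "Operationalization.ipynb",
]

for nb in notebooks:
    pm.execute_notebook(
        input_path=nb,
        output_path=nb.replace(".ipynb", ".out.ipynb"),
        parameters={"storage_account": "<storage-account>"},  # illustrative parameter
    )
```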

Also, it would be great if you could add a section on how to build this entire project; that would enable us to make modifications, redeploy, and customize it for our own flow.

Regards, /Girish BK