Azure / AzureDataLake

Samples and Docs for Azure Data Lake Store and Analytics
http://aka.ms/AzureDataLake
MIT License

How do I process the telemetry JSON messages in Azure Data Lake? #57

Open deepakkumpala opened 5 years ago

deepakkumpala commented 5 years ago

I have hundreds of devices sending messages to IoT Hub, and I am trying to use Data Lake to process all these messages.

All the articles on the internet show uploading CSV files for processing. Is converting the JSON messages to CSV files a must before they can be processed by the Data Lake engine? Can't I process all the incoming JSON telemetry directly in Azure Data Lake?

pbakhil commented 5 years ago

Converting JSON to CSV can give you advantages during processing. From the ADLA perspective, independent rows give the job a better chance of being parallelized. Formats like XML and JSON are not friendly for big data processing. Converting JSON to CSV also reduces the size of the data processed in subsequent steps. In our experience, keeping the data in JSON and processing it directly was less efficient, though this also depends on your JSON data structure.
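
As a rough illustration of the conversion described above, here is a minimal Python sketch that flattens JSON telemetry messages into independent CSV rows before they are uploaded. It assumes each message is a single JSON object and that the input holds one message per line. The field names (`deviceId`, `ts`, `reading`) and the helper names `flatten` and `json_lines_to_csv` are made up for the example and are not part of any Azure SDK.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(lines):
    """Convert an iterable of JSON strings (one message each) to CSV text."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    # Union of all keys, sorted so the column order is deterministic.
    columns = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical telemetry messages; the schema is an assumption for illustration.
messages = [
    '{"deviceId": "dev-1", "ts": "2018-01-01T00:00:00Z", "reading": {"temp": 21.5}}',
    '{"deviceId": "dev-2", "ts": "2018-01-01T00:00:05Z", "reading": {"temp": 19.0, "hum": 40}}',
]
csv_text = json_lines_to_csv(messages)
print(csv_text)
```

Each output row stands on its own, which is the property that helps a job parallelize. Fields missing from a message are written as empty cells. Arrays inside a message are left as-is here and would need their own handling, depending on your JSON structure.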