Azure / azure-data-lake-store-net

Azure Data Lake Store .Net SDK
MIT License

async directory content summary #20

Closed igor-betin closed 6 years ago

igor-betin commented 6 years ago

Is there a way to call it asynchronously, without blocking the thread? I see async calls for open, create, read, etc., but no async call for listing contents.

rahuldutta90 commented 6 years ago

No, there isn't such a method in the SDK. GetContentSummary itself is a multi-threaded operation that calls ListStatus. You can create an async task as follows: Task transferTask = Task.Run(() => { client.GetContentSummary(); }, cancellationToken);
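The Task.Run suggestion above can be sketched roughly as follows. This is a minimal illustration, not the SDK's own code: BlockingContentSummary is a hypothetical stand-in for the blocking client.GetContentSummary call, and its directory count and per-directory file count are made up for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContentSummaryOffload
{
    // Hypothetical stand-in for the SDK's blocking GetContentSummary call.
    // It pretends to visit `directoryCount` directories, one blocking round trip each.
    public static long BlockingContentSummary(int directoryCount, CancellationToken token)
    {
        long totalFiles = 0;
        for (int i = 0; i < directoryCount; i++)
        {
            token.ThrowIfCancellationRequested(); // cooperative cancellation between round trips
            Thread.Sleep(10);                     // simulates one blocking ListStatus request
            totalFiles += 100;                    // assumption: each directory holds 100 files
        }
        return totalFiles;
    }

    // Offloads the blocking call to a thread-pool thread so the caller can await the result.
    // A pool thread is still occupied for the whole duration of the call.
    public static Task<long> GetContentSummaryOffloadedAsync(int directoryCount, CancellationToken token)
    {
        return Task.Run(() => BlockingContentSummary(directoryCount, token), token);
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        Task<long> summaryTask = GetContentSummaryOffloadedAsync(5, cts.Token);

        // The calling thread is free here; it can do other work or poll summaryTask.IsCompleted.
        long files = await summaryTask;
        Console.WriteLine($"Files counted: {files}");
    }
}
```

Note that this frees only the calling thread. The work itself still blocks a thread-pool thread until it finishes.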

igor-betin commented 6 years ago

But that defeats the purpose of using async. Why don't you make the nested calls async as well?

Also, using the REST API, the content summary can be acquired in a single call with ?op=GETCONTENTSUMMARY. Why is there a need for nested calls?

rahuldutta90 commented 6 years ago

Async is a language construct. The tasks that you get from async/await are actually implemented by an internal thread pool. How does it matter to a user calling GetContentSummary whether we use async calls internally or an explicit thread pool to do the operations? And if your intention is not to wait for GetContentSummary, then you can create a task and poll its status as shown above.

In ADLS we have millions of files/directories. Enumerating all of them takes at least a couple of minutes. It is not feasible to have a web request with a timeout of 10 minutes, so we have to do this as nested calls on the client side.
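A rough sketch of what such client-side enumeration could look like is below. It is not the SDK's implementation: DirectoryEntry and the listStatus delegate are hypothetical stand-ins for one ListStatus request on a single directory, and the traversal is single-threaded for brevity, whereas the SDK is described above as multi-threaded. The point is that each request covers one directory only, so no single request needs a long timeout.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one entry returned by a ListStatus request.
public sealed class DirectoryEntry
{
    public string Path = "";
    public bool IsDirectory;
    public long Length;
}

public static class ClientSideSummary
{
    // Walks the tree breadth-first, issuing one short listStatus call per directory
    // and accumulating directory count, file count, and total bytes on the client.
    public static (long Directories, long Files, long Bytes) Summarize(
        string root, Func<string, IReadOnlyList<DirectoryEntry>> listStatus)
    {
        long directories = 0, files = 0, bytes = 0;
        var pending = new Queue<string>();
        pending.Enqueue(root);

        while (pending.Count > 0)
        {
            string current = pending.Dequeue();
            foreach (DirectoryEntry entry in listStatus(current))
            {
                if (entry.IsDirectory)
                {
                    directories++;
                    pending.Enqueue(entry.Path); // nested call happens on a later iteration
                }
                else
                {
                    files++;
                    bytes += entry.Length;
                }
            }
        }
        return (directories, files, bytes);
    }

    public static void Main()
    {
        // In-memory tree standing in for the remote store (illustrative data only).
        var tree = new Dictionary<string, List<DirectoryEntry>>
        {
            ["/"] = new List<DirectoryEntry>
            {
                new DirectoryEntry { Path = "/logs", IsDirectory = true },
                new DirectoryEntry { Path = "/a.txt", Length = 10 },
            },
            ["/logs"] = new List<DirectoryEntry>
            {
                new DirectoryEntry { Path = "/logs/b.txt", Length = 32 },
            },
        };

        var summary = Summarize("/", path => tree[path]);
        Console.WriteLine($"{summary.Directories} dirs, {summary.Files} files, {summary.Bytes} bytes");
    }
}
```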

igor-betin commented 6 years ago

Async is a language construct that allows threads to be reused and, instead of polling, invokes callbacks only when the I/O is finished. If async calls are used from the ground up, i.e. from the HTTP call in this case, then the application profits greatly from it. If there is a thread manager under the hood, it's bad practice.
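To illustrate the distinction being drawn here, the following sketch contrasts a blocking operation with one that is asynchronous from the ground up. Both ListDirectoryBlocking and ListDirectoryAsync are hypothetical stand-ins for an HTTP request. Thread.Sleep and Task.Delay merely simulate I/O latency, and the returned "entry count" is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncFromTheGroundUp
{
    // Blocking stand-in: the calling thread is held for the full duration of the simulated I/O.
    public static int ListDirectoryBlocking(string path)
    {
        Thread.Sleep(50);
        return path.Length; // placeholder result
    }

    // Asynchronous stand-in: no thread is held while the simulated I/O is pending;
    // the continuation runs only once the delay completes.
    public static async Task<int> ListDirectoryAsync(string path, CancellationToken token)
    {
        await Task.Delay(50, token);
        return path.Length; // placeholder result
    }

    // Starts all requests without dedicating a thread to each, then awaits them together.
    public static async Task<int> SummarizeAsync(IEnumerable<string> paths, CancellationToken token)
    {
        Task<int>[] pending = paths.Select(p => ListDirectoryAsync(p, token)).ToArray();
        int[] counts = await Task.WhenAll(pending);
        return counts.Sum();
    }

    public static async Task Main()
    {
        string[] paths = { "/a", "/bb", "/ccc" };
        int total = await SummarizeAsync(paths, CancellationToken.None);
        Console.WriteLine($"Total: {total}");
    }
}
```

Wrapping ListDirectoryBlocking in Task.Run would give the caller an awaitable too, but each pending request would still occupy a pool thread, which is the cost this comment objects to.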

Please look here for more details. https://msdn.microsoft.com/en-us/magazine/jj991977.aspx

igor-betin commented 6 years ago

Also, I see an operation called GETCONTENTSUMMARY listed here: https://docs.microsoft.com/en-us/rest/api/datalakestore/webhdfs-filesystem-apis. Can't you call something like that?

rahuldutta90 commented 6 years ago

As I just described: In ADLS we have millions of files/directories. Enumerating all of them takes at least a couple of minutes. It is not feasible to have a web request with a timeout of 10 minutes, so we have to do this as nested calls on the client side.