Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License
107 stars 51 forks source link

CPU overhead for submitting tasks #326

Open simon-tarr opened 6 years ago

simon-tarr commented 6 years ago

Hello,

Not an issue, more a question:

What's the expected (local) CPU overhead for submitting tasks when the data being uploaded to batch is minimal (say, a dataframe <5MB)? I have three concurrent R sessions running jobs on Azure and it can quite comfortably suck all my clock cycles if all three sessions happen to be submitting tasks at the same time.
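For context, each session is doing something along these lines (a sketch only — the cluster config and per-task workload here are placeholders, not my actual code):

```r
library(doAzureParallel)

cluster <- makeCluster("cluster.json")  # pool definition from a JSON config
registerDoAzureParallel(cluster)        # register as the foreach backend

df <- data.frame(x = runif(1e4))        # small input, well under 5 MB

# Each iteration becomes a Batch task; df is serialized and shipped up.
results <- foreach(i = 1:100) %dopar% {
  mean(df$x) + i                        # trivial placeholder work
}
```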

I'm running Pop!_OS (Ubuntu 18.10) on a Dell XPS 15 9550. This laptop is a few years old now but it's no slouch by any means... I'd have expected substantially less resource overhead given that the processing is occurring in the cloud.

I notice a similarly large increase in CPU usage on my Win10 workstation at uni but, as it's a far more powerful machine, it's always usable if many tasks are being submitted concurrently across R sessions.

Perhaps this ties into issues #300 and #217?

Cheers, Simon

brnleehng commented 6 years ago

Hi Simon,

I'll do some benchmarking on CPU overhead on task submission. How many tasks is each R session submitting?

Thanks, Brian

simon-tarr commented 6 years ago

Thanks Brian.

I'm submitting 511 tasks within each job (to a pool of 512 cores), so with three sessions that's potentially ~1500 tasks being serialized and uploaded at once.
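One workaround I'm considering: doAzureParallel's `setChunkSize()` (documented in the package README) groups iterations into fewer, larger tasks, which should mean fewer task payloads to serialize and upload per job. A sketch, with an illustrative chunk size:

```r
# Group foreach iterations into chunks so 511 iterations produce far
# fewer Batch tasks (and uploads). The chunk size here is illustrative.
setChunkSize(8)

results <- foreach(i = 1:511) %dopar% {
  run_model(i)  # run_model is a placeholder for the real per-task work
}
```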

EDIT - I've noticed that the problem is exacerbated if one (or more) sessions is downloading a merged result. Returning a result also causes high CPU load and really slows down task submission in the other sessions.

I'm on a gigabit internet connection at my university so I don't think it's a bandwidth issue causing slow uploads/downloads. The .csv returned from each run is only 8.4MB in size so I'd expect it to download pretty rapidly over both my work and home connection.
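If the merge/download phase is what's competing with submission for CPU, one option might be the package's long-running-job pattern (per the README): submit with `wait = FALSE` to get a job ID back immediately, then collect results later so submission and download don't overlap. A sketch:

```r
# Decouple submission from result download so the two phases don't
# compete for local CPU. Workload is a placeholder.
opts <- list(wait = FALSE)
jobId <- foreach(i = 1:511, .options.azure = opts) %dopar% {
  run_model(i)  # placeholder per-task work
}

# ...later, once the job has finished:
results <- getJobResult(jobId)
```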