jay-hennen opened this issue 1 year ago
Thanks Jay -- a couple comments / questions on this:
I'm not sure I understand what the "time" means in a `lock.acquire` -- but I'd expect it to be a lot, actually; that's kind of the point of a lock. E.g. if one thread is waiting for another thread to finish, it's spending a lot of time blocked in `acquire`, even though the other thread(s) could be productively getting work done.
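To make that concrete, here's a minimal stdlib sketch (the thread names and timings are made up for illustration, not taken from the GNOME code) showing how wall-clock time piles up inside `lock.acquire` while the lock holder is doing useful work:

```python
import threading
import time

lock = threading.Lock()
waited = []  # records how long the waiter was blocked in acquire()

def holder():
    # Holds the lock while "working" -- this stands in for real work
    # done under the lock by another thread.
    with lock:
        time.sleep(0.2)

def waiter():
    # Blocks in acquire() until holder releases; the blocked time is
    # charged to acquire() even though the holder was productive.
    t0 = time.perf_counter()
    lock.acquire()
    waited.append(time.perf_counter() - t0)
    lock.release()

h = threading.Thread(target=holder)
w = threading.Thread(target=waiter)
h.start()
time.sleep(0.05)  # ensure the holder grabs the lock first
w.start()
h.join()
w.join()
print(f"waiter blocked in acquire for ~{waited[0]:.2f}s")
```

Under a profiler, that blocked interval shows up as `tottime` in `{method 'acquire' of '_thread.lock' objects}`, which is why a large number there isn't by itself proof of a bug.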
However:
1) With this much time spent waiting for locks, we are at best getting single-threaded performance, plus some overhead -- why multithread if you're going to spend that much time waiting to acquire the lock?
2) 8556 calls to acquire a lock? That seems like a massive number! Shouldn't this be on the order of a hundred or fewer separate operations?
Hmm -- I wonder if part of it is that the underlying library (netCDF4?) isn't thread safe, hence all the locking. But if that's the case then it really should be run serially rather than in parallel -- is there a way to tell dask to do that?
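For the record, dask does let you force serial execution via its scheduler setting -- a sketch of the config (assuming dask is installed; whether it actually helps here is untested):

```python
import dask

# Use the single-threaded ("synchronous") scheduler globally, so tasks
# run serially in the calling thread with no lock contention.
dask.config.set(scheduler="synchronous")

# Or scope it to a single computation instead:
# result = some_dask_object.compute(scheduler="synchronous")
```

If the time really is netCDF4 lock contention, this should at least make the profile honest: the work shows up in the netCDF calls themselves rather than in `lock.acquire`.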
Hi all,
A long long time ago I wrote a profiling decorator in the pygnome utilities as a quick and dirty way to get a sense of what's taking so damn long.
I've attached the profiler output from getting a small subset of the TBOFS model.
FYI, the profiler is in `gnome.utilities.profiledeco` and is used as a function decorator. Call `print_stats` after the function has run to print the stats.
From some initial analysis, the first major point of suspicion seems to be the following:
```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  8556   88.764    0.010   88.764    0.010 {method 'acquire' of '_thread.lock' objects}
```
What this implies, in short, is that it is explicitly waiting in the `acquire` function for 88.764 seconds (NOT in a sub-call). If I had to guess, the default multi-threading settings of dask or something similar are punishing us here. To use profiledeco yourself, see the snippet below:
cprofexample.txt