NOAA-ORR-ERD / LibGOODS

Library for accessing data useful for the NOAA / GNOME model
https://libgoods.readthedocs.io/en/latest/
Other
1 stars 2 forks source link

Improving speed #51

Open jay-hennen opened 1 year ago

jay-hennen commented 1 year ago

Hi all,

A long long time ago I wrote a profiling decorator in the pygnome utilities as a quick and dirty way to get a sense of what's taking so damn long.

I've attached the profiler output from getting a small subset of the TBOFS model

FYI, The profiler is in gnome.utilities.profiledeco and is used as a function decorator. Call print_stats after the function has run to print the stats.

From some initial analysis the first major point of suspicion seems to be the following:

ncalls tottime percall cumtime percall filename:lineno(function) 8556 88.764 0.010 88.764 0.010 {method 'acquire' of '_thread.lock' objects}

What this implies, in short, is that it is explicitly waiting in the acquire function for 88.764 seconds (NOT in a sub-call). If I had to guess it looks like the default multi-threading settings of dask or something is punishing us here.

To use profiledeco yourself see the snippet below

import gnome.utilities.profiledeco as prof
@prof.profile
def func():
    doexpensivestuff()

func()
prof.print_stats()

cprofexample.txt

ChrisBarker-NOAA commented 1 year ago

Thanks Jay -- a couple comments / questions on this:

I'm not sure I understand what the "time" means in a lock.aquire -- but I'd expect it to be a lot, actually, that's kinda of the point of the lock. e.g. if one thread is waiting for another thread to finish, it's spending a lot of time, even though the other thread(s) could be productivily getting work done.

However:

1) with this much time waiting for locks, we are, at best getting single thread performance, with at least some overhead -- why multithread if you're going to spend that much time waiting to acquire the lock?

2) 8556 calls to acquire a lock? that seems like a massive number! Shouldn't this be on order of hundred or less separate operations??

Hmm -- I wonder if part of it would be that the underlying library (netCDF4?) isn't thread safe. Thus all the locking. But if that's the case then it really should be running it serially, rather than in parallel -- is there a way to tell dask to do that?