Closed observingClouds closed 1 year ago
I experimented with numba but it actually did not improve the performance. The refactoring of the function did however increase the performance by about 19000 times. I tested it for a random field of 100x1000 and got 1.05s and 54µs. I guess that's a win 🤣 Numba also does not improve the performance for larger fields (e.g. 10000x1000).
import numpy as np
import cloud metrics
EXAMPLE_MASK=np.random.randint(0,1,(100,1000))
%timeit cloudmetrics.mask.open_sky(EXAMPLE_MASK)
# 54 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Okay, if I use a field with 0s and 1s I only get a factor of 2 🤣
import numpy as np
import cloud metrics
EXAMPLE_MASK=np.random.randint(0,2,(1000,1000))
%timeit cloudmetrics.mask.open_sky(EXAMPLE_MASK)
# 5.45 s ± 82.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Note (to myself): Performance between numba and non-numba optimised version can be tested by setting the environment variable NUMBA_DISABLE_JIT
. Testing both versions at the same time without restarting the python interpreter was not successful with the following script:
from numba.tests.support import override_config
with override_config('DISABLE_JIT',True):
import cloudmetrics
%timeit cloudmetrics.mask.open_sky(EXAMPLE_MASK)
sys.modules.pop("cloudmetrics")
del cloudmetrics
@martinjanssens @leifdenby please have a look. If you do not mind that I removed the debug_plot functionality and the NotImplementedError, then this is ready to be merged.
To be honest I'm not 100% sure but I think it is the replacement of some of the np.where
functions.
I came across #68 and optimised the open_sky function for my own purposes and thought I share it here.
What would be left to do to before even considering this PR: