Closed chaosphere2112 closed 7 years ago
Same issue found when using cdutil.ANNUALCYCLE.departures(). Wondering if there was any progress with this issue...? @chaosphere2112 @doutriaux1
@jypeter it seems you were lost in the original issue migration across repos, so pinging you here for your input
I had kind of forgotten about this issue! I don't have anything new to add to this except maybe that the original thread also mentioned that improving the processing speed would be a nice addition
Thanks to @jypeter for reporting this bug. Let's hope it gets fixed soon.
Adding my 2 cents here regarding speed. I came up with an alternate direct computation of the climatologies using only 4 lines of code and got an even bigger speed-up. I'm seeing a 30x speed-up over cdutil with @jypeter's implementation and 1500x speed-up with my alternate implementation.
Here are the key lines:
# Compute climatologies
climato_values2 = np.zeros([12]+list(np.shape(var))[1:])
month = np.array( [ timax_absolute[i].month for i in range(len(timax_absolute)) ], dtype=np.int)
for i in range(1,13):
climato_values2[i-1] = np.average(var[np.argwhere(month==i)],axis=0)
Attached is the script and below the full output. Just trying to raise the bar for @dnadeau4 and @doutriaux1 :wink:
Variable shape = (3653, 9, 11)
Time axes go from 1959-1-1 12:0:0.0 to 1968-12-31 12:0:0.0
Latitude values = [-66.7283256 -67.84978441 -68.97123954 -70.09269035 -71.21413608
-72.33557575 -73.45700815 -74.57843166 -75.69984422]
Longitude values = [ 120.375 121.5 122.625 123.75 124.875 126. 127.125 128.25
129.375 130.5 131.625]
Thu Mar 2 17:47:15 2017 - Computing the climatology with cdutil.ANNUALCYCLE.climatology,
please wait...
Thu Mar 2 17:47:20 2017 - Done!
time.clock diff = 4.515801
Thu Mar 2 17:47:20 2017 - Computing the climatology by hand...
year, month, nb of selected days = 1959 2 28
year, month, nb of selected days = 1960 2 29
year, month, nb of selected days = 1961 2 28
year, month, nb of selected days = 1962 2 28
year, month, nb of selected days = 1963 2 28
year, month, nb of selected days = 1964 2 29
year, month, nb of selected days = 1965 2 28
year, month, nb of selected days = 1966 2 28
year, month, nb of selected days = 1967 2 28
year, month, nb of selected days = 1968 2 29
Difference range for month 1 = (-4.903731820604662e-06, 6.318861416332311e-06)
Difference range for February (unweighted) = (-5.2588326724389844e-06, 5.322724128120626e-06)
Difference range for February (WEIGHTED) = (-0.014047166386458088, -0.0006071392919806406)
Difference range for month 3 = (-7.192550192769431e-06, 6.8172331566529465e-06)
Difference range for month 4 = (-8.519490563685395e-06, 6.39597574547679e-06)
Difference range for month 5 = (-7.727838379878449e-06, 7.407895935784836e-06)
Difference range for month 6 = (-8.519490563685395e-06, 6.81559244952723e-06)
Difference range for month 7 = (-7.284841260002395e-06, 9.992045690410123e-06)
Difference range for month 8 = (-8.761498278886393e-06, 7.900114923131696e-06)
Difference range for month 9 = (-8.850097671597723e-06, 7.451375310552066e-06)
Difference range for month 10 = (-7.30945221505408e-06, 6.18965391652182e-06)
Difference range for month 11 = (-6.2370300426550784e-06, 6.243387851156967e-06)
Difference range for month 12 = (-6.288097772255696e-06, 6.534207223296562e-06)
Thu Mar 2 17:47:20 2017 - Finished computing the climatology by hand!
time.clock diff = 0.155475
Acceleration = 29.0451905451
Thu Mar 2 17:47:20 2017 - Computing the climatology by hand, second algorithm
Difference range for month 1 = (-6.318861437648593e-06, 4.903731806393807e-06)
Difference range for month 2 = (-2.3858707294266424e-06, 2.938530052176702e-06)
Difference range for month 3 = (-6.8172331779692286e-06, 7.192550164347722e-06)
Difference range for month 4 = (-6.39597574547679e-06, 8.519490556579967e-06)
Difference range for month 5 = (-7.4078959784174e-06, 7.727838337245885e-06)
Difference range for month 6 = (-6.81559244952723e-06, 8.519490556579967e-06)
Difference range for month 7 = (-9.992045747253542e-06, 7.284841231580685e-06)
Difference range for month 8 = (-7.900114979975115e-06, 8.761498236253829e-06)
Difference range for month 9 = (-7.451375324762921e-06, 8.850097657386868e-06)
Difference range for month 10 = (-6.189653952048957e-06, 7.309452179526943e-06)
Difference range for month 11 = (-6.243387858262395e-06, 6.237030028444224e-06)
Difference range for month 12 = (-6.5342072517182714e-06, 6.288097750939414e-06)
Thu Mar 2 17:47:20 2017 - Finished computing the climatology by hand, second algorithm!
time.clock diff = 0.002744
Acceleration = 1645.70007289
@golaz this a totally different issues your script will barf on any general dataset. This is the difference between one off user geenrated script that works for his data only and general application software that works on any dataset. cdutil will work on 6hourly data for example. It will work on data starting in February. It will work on data with missing months, etc...
@doutriaux1, sure, everyone understands that there will be a performance penalty for generality, and generality is what makes cdutil in general and ANNUALCYCLE in particular so useful. Still, a factor 1500x in speed would suggest that there is substantial room for performance improvement without loss of generality.
Any progress with the fix?
@golaz I believe fix is in place. @dnadeau4 ?
I have found out that when using cdutil.ANNUALCYCLE.climatology on daily data, the months of different years seem to be equally weighted.
This results in (slight but visible) errors:
Test script: https://files.lsce.ipsl.fr/public.php?service=files&t=115de1c1cb57217b85776fd08fe21b23 Test data: https://files.lsce.ipsl.fr/public.php?service=files&t=c7258c48f12d3451bfa67e5585bbedef
Running the test script above will get the following output, and you can see that the difference between ANNUALCYCLEclimatology and computation by hand for all months, including unweighted February is ~1e6. But there is a bigger difference when I compute a correctly weighted February mean
Migrated from: https://github.com/UV-CDAT/uvcdat/issues/1664