coecms / xmhw

Xarray version of Marine Heatwaves code by Eric Olivier
https://xmhw.readthedocs.io/en/latest/
Apache License 2.0
23 stars 10 forks source link

Help needed to make groovy.reduce work #20

Closed paolap closed 4 years ago

paolap commented 4 years ago

Need some help with xmhw, I'm trying to re-organise the part that calculate all the events characteristic.

At this stage I have a bunch of timeseries arrays with lat/lon stacked in a cell dimension their dimensions are time, cell and they also have an 'event' coordinate which is np.nan where there's no event like nan, nan, 3,3,3,3,3,nan,nan ...

I need to calculate some stats and other quantities for each event, which is why I added the event coordinate so i can groupby that. Every cell is independent so I can groupby cell first

What I did before it worked but was calling groupby too much: branch detect file: detail.py

for each characteristic I had something like

ds['intensity_max'] = array.groupby('cell').map(group_function, args=[np.nanmax], dim='event')

group_function simply does groupby(dim='event').reduce(func) I couldn't call this directly on the groupby('cell') object so I had to use map()

What I'm trying to do now: branch groupby, file features.py

I've created one function mhw_features(ds) that calculates all the event characteristics for 1 event at the time. It takes a dataset which has all the timeseries necessary included, this get removed at the end before the results are returned.

do the first groupby('cell') on the all dataset and map another function that call mhw_features.

ds =ds.groupby('cell').map(call_mhw_features, args=[tdim])

call_mhw_features is similar to group_function

dsgroup.groupby('event').map(mhw_features, args=[tdim])

it groups by 'event' and then call mhw_features()

The difference is that I have to use map() and not reduce() if I use reduce only an array is passed to mhw-features, if I use map() the all dataset is passed. With map(0 I can calculate all the events characteristics but when the groups are aggregated instead of having one value x event I have the same value for the event duration. Plus some of this might also be wrong.

Result example: Dimensions:

lat: 2  lon: 2.  time: 86

Coordinates:

time (time)
doy (time, lat, lon)
event (time, lat, lon)
lat (lat)  
lon  (lon)  

Data variables: ... end_idx (time, lat, lon) start_idx (time, lat, lon)

Result with slower method which is also desired result: Dimensions:

event: 9. lat: 2.  lon: 2.  time: 731

Coordinates:

time (time)
doy (time)
event (event)
lat (lat)
lon (lon)

Data variables: ... end_idx (event, lat, lon) start_idx (event, lat, lon)

NB I still have time and day here because I haven't removed the time series variables yet, I wanted to use them to double check the results.

paolap commented 4 years ago

I probably found a way to do it, not sure why all of a sudden is working I don't even think I changed anything at all ???

paolap commented 4 years ago

This is definitely working, probably I didn't refreshed the cached files and kept on getting an old result!