Closed paolap closed 4 years ago
I probably found a way to do it, not sure why all of a sudden is working I don't even think I changed anything at all ???
This is definitely working, probably I didn't refreshed the cached files and kept on getting an old result!
Need some help with xmhw, I'm trying to re-organise the part that calculate all the events characteristic.
At this stage I have a bunch of timeseries arrays with lat/lon stacked in a cell dimension their dimensions are time, cell and they also have an 'event' coordinate which is np.nan where there's no event like nan, nan, 3,3,3,3,3,nan,nan ...
I need to calculate some stats and other quantities for each event, which is why I added the event coordinate so i can groupby that. Every cell is independent so I can groupby cell first
What I did before it worked but was calling groupby too much: branch detect file: detail.py
for each characteristic I had something like
ds['intensity_max'] = array.groupby('cell').map(group_function, args=[np.nanmax], dim='event')
group_function simply does groupby(dim='event').reduce(func) I couldn't call this directly on the groupby('cell') object so I had to use map()
What I'm trying to do now: branch groupby, file features.py
I've created one function mhw_features(ds) that calculates all the event characteristics for 1 event at the time. It takes a dataset which has all the timeseries necessary included, this get removed at the end before the results are returned.
do the first groupby('cell') on the all dataset and map another function that call mhw_features.
ds =ds.groupby('cell').map(call_mhw_features, args=[tdim])
call_mhw_features is similar to group_function
dsgroup.groupby('event').map(mhw_features, args=[tdim])
it groups by 'event' and then call mhw_features()
The difference is that I have to use map() and not reduce() if I use reduce only an array is passed to mhw-features, if I use map() the all dataset is passed. With map(0 I can calculate all the events characteristics but when the groups are aggregated instead of having one value x event I have the same value for the event duration. Plus some of this might also be wrong.
Result example: Dimensions:
Coordinates:
Data variables: ... end_idx (time, lat, lon) start_idx (time, lat, lon)
Result with slower method which is also desired result: Dimensions:
Coordinates:
Data variables: ... end_idx (event, lat, lon) start_idx (event, lat, lon)
NB I still have time and day here because I haven't removed the time series variables yet, I wanted to use them to double check the results.