SECOORA / skill_score

Prototypes for the SECOORA skill score
MIT License

Replace `nearxy` and `find_ij` with SciPy's KDTree. #8

Closed ocefpaf closed 10 years ago

ocefpaf commented 10 years ago

Replace nearxy and find_ij with SciPy's KDTree and test whether this approach is faster and more general for both structured and unstructured models.

https://github.com/ioos/secoora/blob/master/notebooks/inundation/inundation_secoora.ipynb#L1188

KDTree makes it possible to search for all the lon, lat points that are near the observations. Once the tree is computed, finding many points is fast, but building it can be slower than a direct search when only one or a few points are needed.
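For illustration, here is a minimal sketch of that kind of lookup. It is not the notebook's code: `nearest_grid_points` is a hypothetical helper, the grid and observation positions are made up, and distances are plain Euclidean distances in degrees, which ignores the Earth's curvature.

```python
import numpy as np
from scipy.spatial import KDTree


def nearest_grid_points(lon, lat, obs_lon, obs_lat, k=1):
    """Return (distances, indices) of the k model points nearest each observation.

    lon and lat may be 1-D (unstructured) or 2-D (structured). Both are
    flattened, so the indices refer to the raveled arrays. Distances are in
    degrees (an approximation, assumed good enough for a nearest-point search).
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Build the tree once; it can be reused for any number of queries.
    tree = KDTree(np.column_stack([lon.ravel(), lat.ravel()]))
    obs = np.column_stack([np.atleast_1d(obs_lon), np.atleast_1d(obs_lat)])
    distances, indices = tree.query(obs, k=k)
    return distances, indices


# Made-up structured grid: lon -82..-78, lat 28..32, 1-degree spacing.
lon2d, lat2d = np.meshgrid(np.arange(-82.0, -77.0, 1.0),
                           np.arange(28.0, 33.0, 1.0))

# Two made-up observation positions.
dist, idx = nearest_grid_points(lon2d, lat2d, [-80.1, -78.9], [30.2, 31.8])

# Convert the flat indices back to (row, column) for the structured grid.
j, i = np.unravel_index(idx, lon2d.shape)
```

Because the coordinates are flattened before the tree is built, the same function accepts the 1-D node lists of an unstructured model; only the final `np.unravel_index` step is specific to structured grids.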

ocefpaf commented 10 years ago

@rsignell-usgs:

I decided to create a few generic functions using SciPy's KDTree to find the model data nearest to the station positions. The idea is to have something that is fast and easy to re-use. The tree helps with that. I have been testing this idea and trying to improve it. Here is a draft of what I have so far:

http://nbviewer.ipython.org/github/ioos/secoora/blob/master/notebooks/inundation/inundation_secoora.ipynb

The ultimate goal would be to have a get_model(at_station) function. It would take a station series and return a model series at the same place and time. This station series would carry time and space metadata with it.
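A rough sketch of what such a function could look like, under several assumptions: `StationSeries` is a hypothetical container standing in for whatever object ends up carrying the metadata, the extra model arguments are invented for the example (the real function might read them from a dataset), the model is stored as a `(time, npoints)` array, and time matching is done by linear interpolation.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd
from scipy.spatial import KDTree


@dataclass
class StationSeries:
    """Hypothetical container: an observed time series plus its position."""
    series: pd.Series
    lon: float
    lat: float


def get_model(at_station, model_times, model_lon, model_lat, model_data):
    """Return the model series at the point nearest the station.

    model_lon and model_lat have shape (npoints,); model_data has shape
    (time, npoints). The result is indexed by the station's own timestamps.
    """
    # Space: nearest model point (distance in degrees, an approximation).
    tree = KDTree(np.column_stack([model_lon, model_lat]))
    _, idx = tree.query([at_station.lon, at_station.lat])
    model = pd.Series(model_data[:, idx], index=model_times)

    # Time: interpolate linearly onto the station timestamps.
    union = model.index.union(at_station.series.index)
    model = model.reindex(union).interpolate(method="time")
    return model.reindex(at_station.series.index)


# Made-up model: two points, three daily time steps.
model_times = pd.date_range("2014-01-01", periods=3, freq="D")
model_lon = np.array([-80.0, -79.0])
model_lat = np.array([30.0, 31.0])
model_data = np.array([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]])

# Made-up station with observations at noon on two days.
obs_times = pd.to_datetime(["2014-01-01 12:00", "2014-01-02 12:00"])
station = StationSeries(pd.Series([0.4, 0.6], index=obs_times),
                        lon=-79.2, lat=30.9)

result = get_model(station, model_times, model_lon, model_lat, model_data)
```

In practice the tree would be built once per model and reused for every station, rather than rebuilt inside the function as it is here.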

Maybe I am investing too much time into this. But I think that we will be doing this a lot, as in the glide-model comparison, so maybe it is worth it... What do you think?

rsignell-usgs commented 10 years ago

@kwilcox, didn't you implement something like this for your paegan work?

jcothran commented 10 years ago

In the Python notebook example, input cell 13 might have a misspelling of 'NAVD' as 'NADV':

if row['datum'] == 'NADV':

ocefpaf commented 10 years ago

Yep. Thanks @jcothran, that will be fixed in the next push.

rsignell-usgs commented 10 years ago

@ocefpaf, when you say "This station series would carry time and space metadata with it," it starts to sound like a common data model object. That makes me wonder if you could use an existing common data model object instead. I just had a gchat with @kwilcox and he said paegan is too immature to look at. Can you take a look at the Iris data model and see if it would work?

Check out cell [7] in http://nbviewer.ipython.org/gist/rsignell-usgs/d48242d13d17f9360d49 to see what an Iris time series object looks like. -Rich

ocefpaf commented 10 years ago

@rsignell-usgs That is exactly the idea. I am not re-inventing the wheel. Iris time-series are pandas time-series with more metadata and that is what this get_model(at_station) would take as input.

I am just developing this slowly to avoid messing up.

ocefpaf commented 10 years ago

The prototype is ready. I am closing this issue so I can open new ones addressing what we discussed here and what I found with this notebook.

Here is the current view: http://nbviewer.ipython.org/github/ioos/secoora/blob/master/notebooks/inundation/inundation_secoora.ipynb