Create a workflow to find geolocations related to the reference data

Climate-Data-Science / Climate-Similarity-Metrics

Which similarity metrics are the most helpful to understand climate


Create a workflow to find geolocations related to the reference data #3

Closed pawelbielski closed 4 years ago

pawelbielski commented 4 years ago

Given a reference time series (first a chosen geopoint, later the QBO index), compute the point-wise similarities (correlation or mutual information) and show them on the map. The goal is to start with one similarity function and one fixed reference point.

Steps:

[x] Write code to compute the similarity between one chosen geopoint and all other points on a given level. Plot the results on the map (you can look at the Nature paper for inspiration).
[x] Move the logic to a separate file, calculations.py. Present the results in a Jupyter notebook.
[x] Once the QBO is known, create a similarity map between the QBO index and all other points on the level.
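
For illustration, the first step could be sketched roughly as below. This is not the code from calculations.py: the names `pearson` and `similarity_map`, the `(time, lat, lon)` array layout, and the random placeholder data are all assumptions made for the example.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D series."""
    return np.corrcoef(a, b)[0, 1]


def similarity_map(data, reference, sim_fun=pearson):
    """Apply sim_fun(reference, series) at every grid point.

    data:      array of shape (time, lat, lon); this layout is an assumption
    reference: array of shape (time,)
    Returns an array of shape (lat, lon).
    """
    _, n_lat, n_lon = data.shape
    result = np.empty((n_lat, n_lon))
    for i in range(n_lat):
        for j in range(n_lon):
            result[i, j] = sim_fun(reference, data[:, i, j])
    return result


# Random placeholder data standing in for one level of the real dataset.
rng = np.random.default_rng(0)
data = rng.normal(size=(48, 8, 16))

# Use one grid point as the reference series.
ref_lat, ref_lon = 3, 5
sim = similarity_map(data, data[:, ref_lat, ref_lon])
```

Because `sim_fun` is passed in as an argument, the same loop would also work with a different measure, or with an external index such as QBO as `reference`, as long as it has the same length as the time axis.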

pawelbielski commented 4 years ago

Comments to 2_point-wise_similarities.ipynb:

For the plot of similarity to a point, could you also mark the reference point on the map? That would make the plot easier to interpret.
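
One possible way to do this with matplotlib is sketched below. The similarity values are random placeholders, the grid size and the reference indices `ref_i`, `ref_j` are hypothetical, and the plain latitude/longitude axes are an assumption (the notebook may use a map projection instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder similarity map with shape (lat, lon); values are random.
rng = np.random.default_rng(0)
n_lat, n_lon = 8, 16
sim = rng.uniform(-1, 1, size=(n_lat, n_lon))

# Hypothetical grid indices of the reference point.
ref_i, ref_j = 3, 5

# Centre coordinates of the reference cell on a regular global grid.
ref_lat = -90 + (ref_i + 0.5) * 180 / n_lat
ref_lon = -180 + (ref_j + 0.5) * 360 / n_lon

fig, ax = plt.subplots(figsize=(8, 4))
image = ax.imshow(
    sim,
    origin="lower",
    extent=[-180, 180, -90, 90],
    cmap="RdBu_r",
    vmin=-1,
    vmax=1,
)
fig.colorbar(image, ax=ax, label="similarity")

# Mark the reference point so the map can be read relative to it.
(marker,) = ax.plot(
    ref_lon,
    ref_lat,
    marker="*",
    color="black",
    markersize=14,
    linestyle="none",
    label="reference point",
)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend(loc="lower left")
fig.savefig("similarity_map.png")
```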

pawelbielski commented 4 years ago

@pierretoussing The code you wrote is well structured and easily extendable. However, I think it might be possible to vectorize calculate_series_similarity(). Notice that sim_fun() is called 256 * 512 times, and all the calls are independent of each other.

[x] Investigate if it is possible to vectorize calculate_series_similarity(). If yes: vectorize it. If no: explain why not.

pierretoussing commented 4 years ago

I did some research and I do not think it is possible to vectorize it. I now use np.apply_along_axis(), which results in a minor speedup, but to avoid the 256 * 512 calls of sim_fun() I would have to vectorize the implementations of all the similarity measures, which are imported from external libraries.
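
To make the distinction concrete, here is a small sketch for the Pearson correlation only. The function names and the `(time, lat, lon)` layout are assumptions for the example, not the repository's code. `np.apply_along_axis()` still calls the similarity function once per grid point, whereas the second function rewrites the measure itself in array operations. A measure taken from an external library would need its own rewrite of this kind, which may not be practical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D series."""
    return np.corrcoef(a, b)[0, 1]


def similarity_apply(data, reference, sim_fun=pearson):
    """Generic version: still one Python-level call of sim_fun per grid point."""
    return np.apply_along_axis(lambda series: sim_fun(reference, series), 0, data)


def pearson_map_vectorized(data, reference):
    """Pearson correlation at all grid points at once, without a per-point call.

    data:      array of shape (time, lat, lon); this layout is an assumption
    reference: array of shape (time,)
    """
    ref_dev = reference - reference.mean()
    data_dev = data - data.mean(axis=0)
    # Sum over the time axis of ref_dev[t] * data_dev[t, i, j].
    covariance = np.tensordot(ref_dev, data_dev, axes=(0, 0))
    norm = np.sqrt((ref_dev ** 2).sum() * (data_dev ** 2).sum(axis=0))
    return covariance / norm


# Random placeholder data; the real grid in this issue is 256 x 512.
rng = np.random.default_rng(1)
data = rng.normal(size=(36, 6, 10))
reference = data[:, 2, 4]

generic = similarity_apply(data, reference)
vectorized = pearson_map_vectorized(data, reference)
```

Both functions return an array of shape `(lat, lon)` and should agree up to floating-point error.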

pawelbielski commented 4 years ago

Thanks for the explanation. What is the speedup you achieve with np.apply_along_axis()?

I've created a separate issue, #11, for this. Please post any further comments on the speedup there.

pierretoussing commented 4 years ago

I measured both versions with %%timeit; the difference was 2 seconds, which is 7% in this context.
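
Outside a notebook, a comparison of this kind could be reproduced with the standard `timeit` module, roughly as sketched below. The two functions are stand-ins for the loop-based and `np.apply_along_axis()` versions, the grid is a small random placeholder, and the timings will differ from the figures reported above.

```python
import timeit

import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D series."""
    return np.corrcoef(a, b)[0, 1]


def similarity_loop(data, reference):
    """Explicit double loop over the grid (stand-in for the original version)."""
    _, n_lat, n_lon = data.shape
    result = np.empty((n_lat, n_lon))
    for i in range(n_lat):
        for j in range(n_lon):
            result[i, j] = pearson(reference, data[:, i, j])
    return result


def similarity_apply(data, reference):
    """Same computation expressed with np.apply_along_axis()."""
    return np.apply_along_axis(lambda series: pearson(reference, series), 0, data)


# Small random placeholder grid with assumed layout (time, lat, lon).
rng = np.random.default_rng(2)
data = rng.normal(size=(24, 16, 32))
reference = data[:, 0, 0]

# Take the best of a few repeats to reduce noise from other processes.
t_loop = min(timeit.repeat(lambda: similarity_loop(data, reference), number=1, repeat=3))
t_apply = min(timeit.repeat(lambda: similarity_apply(data, reference), number=1, repeat=3))

relative_change = (t_loop - t_apply) / t_loop
print(f"loop: {t_loop:.3f} s, apply_along_axis: {t_apply:.3f} s, change: {relative_change:.1%}")
```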