lewtun closed this pull request 4 years ago.
@wreise will this notebook be automatically included in the sphinx table of contents when the docs are eventually generated or do i need to manually add it somewhere?
No, you could add it in doc/notebooks/examples.rst or doc/notebooks/tutorials.rst.
Thanks @wreise! I decided to add it to the tutorials, since the examples section is best served by a fully-fledged case study.
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:09Z ----------------------------------------------------------------
Two points:
This time-delay embedding technique is also called Takens’ embedding after Floris Takens, who demonstrated its significance with a celebrated theorem in the context of nonlinear dynamical systems.
Maybe we could communicate this in a similar way here?
It might also be worth connecting this to our TakensEmbedding transformer, by turning it into:
$$ SW_{d,\tau} f : \mathbb{R} \to \mathbb{R}^{d}\,, \qquad t \to \begin{bmatrix} f(t) \\ f(t + \tau) \\ f(t + 2\tau) \\ \vdots \\ f(t + (d - 1)\tau) \end{bmatrix} $$
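For concreteness, a minimal sketch of this discrete map in plain NumPy (just an illustration, not the giotto-tda implementation; the helper name is made up) could be:

```python
import numpy as np

def time_delay_embedding(f, dimension, time_delay):
    """Hypothetical helper: map a sampled signal f to the point cloud whose
    i-th row is [f(i), f(i + time_delay), ..., f(i + (dimension - 1) * time_delay)]."""
    f = np.asarray(f)
    n_points = len(f) - (dimension - 1) * time_delay
    return np.stack(
        [f[j * time_delay : j * time_delay + n_points] for j in range(dimension)],
        axis=1,
    )

# Embed a sampled sine wave in 3 dimensions with delay 5
signal = np.sin(np.linspace(0, 10 * np.pi, 1000))
cloud = time_delay_embedding(signal, dimension=3, time_delay=5)
print(cloud.shape)  # (990, 3)
```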
lewtun commented on 2020-08-23T09:02:23Z ----------------------------------------------------------------
Great points! I've also decided to replace the phrase "sliding window" with "time delay" for consistency with the terminology in our TakensEmbedding transformer.
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:10Z ----------------------------------------------------------------
We had a nice GIF we made for https://towardsdatascience.com/detecting-stock-market-crashes-with-topological-data-analysis-7d5dd98abe42 which I think really brings the point home very directly. @lewtun do you think you'd be happy to try to include it? I can provide generating code I think.
There might also be other parts of that blog post we could take inspiration from.
lewtun commented on 2020-08-23T09:04:08Z ----------------------------------------------------------------
Good idea! I will include the GIF in the introductory remarks to give people a taste of what's to come :)
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:11Z ----------------------------------------------------------------
Maybe "ellipticity" would be more correct than "circularity"?
lewtun commented on 2020-08-23T09:04:19Z ----------------------------------------------------------------
Yes, will fix!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:12Z ----------------------------------------------------------------
Should those comments be removed?
lewtun commented on 2020-08-23T09:07:48Z ----------------------------------------------------------------
good catch!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:13Z ----------------------------------------------------------------
Small point, but maybe "in a form suitable for the Vietoris-Rips construction" should be "in a form suitable for the VietorisRipsPersistence transformer" (mathematically, it is the original shape which would seem more natural, not the giotto-tda one).
And maybe we could cross-reference https://github.com/giotto-ai/giotto-tda/blob/add-timeseries-example/examples/vietoris_rips_quickstart.ipynb?
lewtun commented on 2020-08-23T09:10:13Z ----------------------------------------------------------------
Good idea! Will fix
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:15Z ----------------------------------------------------------------
There are a few occurrences of "persistent diagrams" here and throughout the notebook, but it should be "persistence diagrams".
I'm also concerned about the look of 50% of the plots here, I'll investigate.
lewtun commented on 2020-08-23T10:41:08Z ----------------------------------------------------------------
Good catch! I think the look of the plots is an artifact of reviewnb - in my notebook they are scaled nicely :)
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:15Z ----------------------------------------------------------------
Missing "we" in "What can conclude"
lewtun commented on 2020-08-23T10:41:15Z ----------------------------------------------------------------
Fixed
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:16Z ----------------------------------------------------------------
Very smart and cool!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:17Z ----------------------------------------------------------------
Nooo way! This is a total revelation to me :)
@lewtun I think this is a great, sweet notebook and it makes for a nice, self-contained read.
We might still have to produce another example notebook in which the Takens embedding is used in the context of producing features for time series analysis, in the spirit of e.g. https://towardsdatascience.com/detecting-stock-market-crashes-with-topological-data-analysis-7d5dd98abe42, where the pipeline is: scalar time series -> time series of point clouds (via SlidingWindow followed by TakensEmbedding) -> time series of TDA features aligned with the original time series. It might be enough to refactor the Lorenz attractor notebook for the purpose. @wreise is also quite the expert.
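Roughly, the kind of pipeline I have in mind could be sketched like this (the windowing and embedding are done by hand in NumPy here so as not to commit to exact transformer parameters, and PersistenceEntropy is just one choice of vectorisation):

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PersistenceEntropy

# Stand-in scalar time series (e.g. a price series)
ts = np.sin(np.linspace(0, 50, 2000)) + 0.1 * np.random.randn(2000)

# 1) Scalar time series -> sliding windows
size, stride = 200, 50
windows = np.stack([ts[i:i + size] for i in range(0, len(ts) - size + 1, stride)])

# 2) Each window -> point cloud via a time-delay embedding (dimension 3, delay 5)
dim, tau = 3, 5
n_pts = size - (dim - 1) * tau
clouds = np.stack(
    [np.stack([w[j * tau:j * tau + n_pts] for j in range(dim)], axis=1) for w in windows]
)  # shape: (n_windows, n_pts, dim)

# 3) Point clouds -> persistence diagrams -> one TDA feature vector per window,
#    aligned with the original time series
diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1)).fit_transform(clouds)
features = PersistenceEntropy().fit_transform(diagrams)
print(features.shape)  # (n_windows, 2)
```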
Thanks for the feedback @ulupo! I'll integrate it later this week 🚀
Amazing. It executes fast, it's beautiful, self-contained and has great explanations.
The only thing is potentially including classification, as @ulupo mentioned. This would have the downside of making it a bit heavier though.
@ulupo @wreise I've added the gravitational wave parts for time series classification and included Umbe's suggestions. Once Umbe's fancy new time_delay_embedding function is merged to master, I'll integrate it into this tutorial!
Is uploading gravitational-wave-signals.npy a deliberate choice? Would there be a way for this to be generated by the user instead?
These signals are generated from a simulation of two colliding black holes and I don't have the time / motivation to replicate that from scratch (they were provided to me by the author). An alternative would be to place them on some cloud storage or openml.
FWIW scikit-learn has datasets included in the library (https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/datasets/data) so we would not be violating too many conventions by including the gravitational waves here ...
Thanks @lewtun!
These signals are generated from a simulation of two colliding black holes and I don't have the time / motivation to replicate that from scratch (they were provided to me by the author).
I suspected this much.
An alternative would be to place them on some cloud storage or openml.
We did this (openml) for the Lorenz attractor notebook. Perhaps we could do it here too for consistency? If not, I agree there's probably no reason to despair!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-23T13:04:51Z ----------------------------------------------------------------
How do we feel about placing all imports (inc. the ones below) at the top of the notebook?
lewtun commented on 2020-08-28T08:15:26Z ----------------------------------------------------------------
My assumption was that people are likely to read / run the notebook in sequential order, so the learning experience is better when the imports appear as needed (it primes the reader where the modules live). I know we didn't do this in the other notebooks, but would be happy to take care of standardising them if you agree with my reasoning!
ulupo commented on 2020-08-28T08:22:29Z ----------------------------------------------------------------
I agree with your reasoning and it teaches about the library structure more effectively. More consistency would be good across notebooks, but it's not a big problem!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-23T13:04:51Z ----------------------------------------------------------------
This one is nasty, but what do you think about writing "The result is a set of time series of the form [...]" instead?
lewtun commented on 2020-08-28T08:17:19Z ----------------------------------------------------------------
sounds better - will fix!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-23T13:04:52Z ----------------------------------------------------------------
How about "we can gain a sense for what the best possible performance is for [...]"? (just to include a verb)
lewtun commented on 2020-08-28T08:18:43Z ----------------------------------------------------------------
good catch!
@lewtun while it helps to have cell outputs to review now, could you remember to flush them all out before finally merging this? Outputs are all shown in the API reference webpage, which is where it really matters.
do we do this to save space from all the plotly outputs or is there some other reason to clear the outputs?
@lewtun it's the general idea of avoiding binary content in git diffs unless it's absolutely necessary (say, a png that needs to be displayed and cannot be generated on the fly). The API reference page contains all outputs and they are also more beautifully displayed (I'd rather people end up there as there is so much more content too).
In fact, this makes me realize that we could link from all notebooks to the corresponding rendered versions (we already link from the rendered versions to the ipynb files)!
cool, makes sense and good idea about linking from the notebooks to the docs!
Thanks for the revision @lewtun! I think you could achieve an even better flush of the outputs by clicking on "Restart and clear all outputs" in Jupyter. At the moment I still see execution counts, which I think would not be there after flushing in this way. As far as I am concerned, we can merge this once #460 is merged.
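For reference, something along these lines (a hypothetical nbformat snippet; the notebook path is just a placeholder) should flush both outputs and execution counts programmatically:

```python
import nbformat

path = "examples/topology_time_series.ipynb"  # placeholder path

nb = nbformat.read(path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []            # drop the (plotly-heavy) outputs from the diff
        cell.execution_count = None  # drop the execution counts as well
nbformat.write(nb, path)
```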
@lewtun I just realised that pandas is needed for this notebook. But pandas is not a requirement of the library, so this should at least be highlighted. Are you using it just to have pretty/easy plots?
good catch! i've replaced pandas with plotly now.
@lewtun following the mega-merge of #460, this notebook can now be fixed with the new TakensEmbedding and the new takens_embedding_optimal_parameters function (there is also SingleTakensEmbedding, but I don't know if it's a good idea to suggest it should be used in a pipeline).
Cool stuff! Will update this with your other suggestions on Friday!
@ulupo I've pushed a couple of small changes:
TakensEmbedding -> SingleTakensEmbedding
I looked into putting the dataset on openML and it's a real nightmare because openML only supports the esoteric ARFF format which is designed for tabular data, not collections of time series (as far as I can tell). I'm not sure the overhead is worth it right now 😬
Also, because I need to run PCA on the output of the Takens embedding, I ended up not using the TakensEmbedding transformer after all, because it produces 3D output from the collection and PCA wants 2D 😢.
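Concretely, the shape issue looks like this (a rough sketch; I'm assuming SingleTakensEmbedding with parameters_type="fixed" and a toy signal here):

```python
import numpy as np
from sklearn.decomposition import PCA
from gtda.time_series import SingleTakensEmbedding

signal = np.sin(np.linspace(0, 30, 3000))  # stand-in for one detector signal

# Embedding a single series gives a 2D array of shape (n_points, dimension),
# which scikit-learn's PCA accepts directly for a 3D visualisation
embedder = SingleTakensEmbedding(parameters_type="fixed", dimension=10, time_delay=3)
cloud = embedder.fit_transform(signal)
cloud_3d = PCA(n_components=3).fit_transform(cloud)
print(cloud.shape, cloud_3d.shape)  # (2973, 10) (2973, 3)

# A collection-level embedding instead returns a 3D array of shape
# (n_series, n_points, dimension), which PCA rejects without a loop or reshape.
```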
Let me know what you think, but otherwise this PR can be merged IMO
@wreise OK to merge the current state, but I can't stress how unhappy I am that we can't reproduce this paper due to the need to apply PCA to collections. This is why I am proposing #490 and I now think something like this could be a very important addition to the library. Ideally, we add this functionality and then revisit this notebook accordingly. @lewtun does this sound reasonable to you?
This sounds perfectly fine to me. One thing to mention is that we can't really reproduce the paper anyway because they use TDA features + CNNs to get their results 😄.
We currently have a snake_case naming convention inside images. Do you think we should change time-delay-embedding.gif, or have all the other ones use dashes instead?
snake_case is good! i fixed the filename of the gif, so this should be good to go!
One thing to mention is that we can't really reproduce the paper anyway because they use TDA features + CNNs to get their results 😄.
Haha, giving the bad news one at a time. But I think it was a good excuse to open #495.
Description
This PR adds a Jupyter notebook example that showcases how giotto-tda can be used for time series analysis. It focuses on two main ideas:
I decided against showcasing the resampling tricks, since I felt this would distract from the core aim of "getting started".
Any other comments?
I have not yet verified whether the resulting docs from this notebook look good or not. Will do it once I sync with @wreise.
Checklist
- I have used flake8 to check my Python changes.
- I have used pytest to check this on Python tests.