lewtun closed this pull request 4 years ago.
@wreise will this notebook be automatically included in the sphinx table of contents when the docs are eventually generated or do i need to manually add it somewhere?
No, you could add it in doc/notebooks/examples.rst or doc/notebooks/tutorials.rst.
Thanks @wreise! I decided to add it to the tutorials, since the examples section is best served by a fully-fledged case study.
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:09Z ----------------------------------------------------------------
Two points:
This time-delay embedding technique is also called Takens’ embedding after Floris Takens, who demonstrated its significance with a celebrated theorem in the context of nonlinear dynamical systems.
Maybe we could communicate this in a similar way here?
It might also be worth connecting this to our TakensEmbedding transformer, by turning it into:
$$ SW_{d,\tau} f : \mathbb{R} \to \mathbb{R}^{d}\,, \qquad t \to \begin{bmatrix} f(t) \\ f(t + \tau) \\ f(t + 2\tau) \\ \vdots \\ f(t + (d - 1)\tau) \end{bmatrix} $$
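For concreteness, a minimal sketch of this discrete map in plain NumPy (just an illustration, not the giotto-tda implementation; the helper name is made up) could be:

```python
import numpy as np

def time_delay_embedding(f, dimension, time_delay):
    """Hypothetical helper: map a sampled signal f to the point cloud whose
    i-th row is [f(i), f(i + time_delay), ..., f(i + (dimension - 1) * time_delay)]."""
    f = np.asarray(f)
    n_points = len(f) - (dimension - 1) * time_delay
    return np.stack(
        [f[j * time_delay : j * time_delay + n_points] for j in range(dimension)],
        axis=1,
    )

# Embed a sampled sine wave in 3 dimensions with delay 5
signal = np.sin(np.linspace(0, 10 * np.pi, 1000))
cloud = time_delay_embedding(signal, dimension=3, time_delay=5)
print(cloud.shape)  # (990, 3)
```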
lewtun commented on 2020-08-23T09:02:23Z ----------------------------------------------------------------
Great points! I've also decided to replace the phrase "sliding window" with "time delay" for consistency with the terminology in our TakensEmbedding transformer.
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:10Z ----------------------------------------------------------------
We had a nice GIF we made for https://towardsdatascience.com/detecting-stock-market-crashes-with-topological-data-analysis-7d5dd98abe42 which I think really brings the point home very directly. @lewtun do you think you'd be happy to try to include it? I can provide generating code I think.
There might also be other parts of that blog post we could take inspiration from.
lewtun commented on 2020-08-23T09:04:08Z ----------------------------------------------------------------
Good idea! I will include the GIF in the introductory remarks to give people a taste of what's to come :)
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:11Z ----------------------------------------------------------------
Maybe "ellipticity" would be more correct than "circularity"?
lewtun commented on 2020-08-23T09:04:19Z ----------------------------------------------------------------
Yes, will fix!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:12Z ----------------------------------------------------------------
Should those comments be removed?
lewtun commented on 2020-08-23T09:07:48Z ----------------------------------------------------------------
good catch!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:13Z ----------------------------------------------------------------
Small point, but maybe "in a form suitable for the Vietoris-Rips construction" should be "in a form suitable for the VietorisRipsPersistence transformer" (mathematically, it is the original shape which would seem more natural, not the giotto-tda one).
And maybe we could cross-reference https://github.com/giotto-ai/giotto-tda/blob/add-timeseries-example/examples/vietoris_rips_quickstart.ipynb?
lewtun commented on 2020-08-23T09:10:13Z ----------------------------------------------------------------
Good idea! Will fix
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:15Z ----------------------------------------------------------------
There are a few occurrences of "persistent diagrams" here and throughout the notebook, but it should be "persistence diagrams".
I'm also concerned about the look of 50% of the plots here, I'll investigate.
lewtun commented on 2020-08-23T10:41:08Z ----------------------------------------------------------------
Good catch! I think the look of the plots is an artifact of reviewnb - in my notebook they are scaled nicely :)
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:15Z ----------------------------------------------------------------
Missing "we" in "What can conclude"
lewtun commented on 2020-08-23T10:41:15Z ----------------------------------------------------------------
Fixed
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:16Z ----------------------------------------------------------------
Very smart and cool!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-17T13:47:17Z ----------------------------------------------------------------
Nooo way! This is a total revelation to me :)
@lewtun I think this is a great, sweet notebook and it makes for a nice, self-contained read.
We might still have to produce another example notebook in which the Takens embedding is used in the context of producing features for time series analysis, in the spirit of e.g. https://towardsdatascience.com/detecting-stock-market-crashes-with-topological-data-analysis-7d5dd98abe42, where the pipeline is: scalar time series -> time series of point clouds (via SlidingWindow followed by TakensEmbedding) -> time series of TDA features aligned with the original time series. It might be enough to refactor the Lorenz attractor notebook for the purpose. @wreise is also quite the expert.
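Roughly, the kind of pipeline I have in mind could be sketched like this (the windowing and embedding are done by hand in NumPy here so as not to commit to exact transformer parameters, and PersistenceEntropy is just one choice of vectorisation):

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PersistenceEntropy

# Stand-in scalar time series (e.g. a price series)
ts = np.sin(np.linspace(0, 50, 2000)) + 0.1 * np.random.randn(2000)

# 1) Scalar time series -> sliding windows
size, stride = 200, 50
windows = np.stack([ts[i:i + size] for i in range(0, len(ts) - size + 1, stride)])

# 2) Each window -> point cloud via a time-delay embedding (dimension 3, delay 5)
dim, tau = 3, 5
n_pts = size - (dim - 1) * tau
clouds = np.stack(
    [np.stack([w[j * tau:j * tau + n_pts] for j in range(dim)], axis=1) for w in windows]
)  # shape: (n_windows, n_pts, dim)

# 3) Point clouds -> persistence diagrams -> one TDA feature vector per window,
#    aligned with the original time series
diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1)).fit_transform(clouds)
features = PersistenceEntropy().fit_transform(diagrams)
print(features.shape)  # (n_windows, 2)
```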
Thanks for the feedback @ulupo! I'll integrate it later this week 🚀
Amazing. It executes fast, it's beautiful, self-contained and has great explanations.
The only thing is potentially including classification, as @ulupo mentioned. This would have the downside of making it a bit heavier though.
@ulupo @wreise I've added the gravitational wave parts for time series classification and included Umbe's suggestions. Once Umbe's fancy new time_delay_embedding function is merged to master, I'll integrate it into this tutorial!
Is uploading gravitational-wave-signals.npy a deliberate choice? Would there be a way for this to be generated by the user instead?
These signals are generated from a simulation of two colliding black holes and I don't have the time / motivation to replicate that from scratch (they were provided to me by the author). An alternative would be to place them on some cloud storage or openml.
FWIW scikit-learn has datasets included in the library (https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/datasets/data) so we would not be violating too many conventions by including the gravitational waves here ...
Thanks @lewtun!
These signals are generated from a simulation of two colliding black holes and I don't have the time / motivation to replicate that from scratch (they were provided to me by the author).
I suspected this much.
An alternative would be to place them on some cloud storage or openml.
We did this (openml) for the Lorenz attractor notebook. Perhaps we could do it here too for consistency? If not, I agree there's probably no reason to despair!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-23T13:04:51Z ----------------------------------------------------------------
How do we feel about placing all imports (inc. the ones below) at the top of the notebook?
lewtun commented on 2020-08-28T08:15:26Z ----------------------------------------------------------------
My assumption was that people are likely to read / run the notebook in sequential order, so the learning experience is better when the imports appear as needed (it primes the reader where the modules live). I know we didn't do this in the other notebooks, but would be happy to take care of standardising them if you agree with my reasoning!
ulupo commented on 2020-08-28T08:22:29Z ----------------------------------------------------------------
I agree with your reasoning and it teaches about the library structure more effectively. More consistency would be good across notebooks, but it's not a big problem!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-23T13:04:51Z ----------------------------------------------------------------
This one is nasty, but what do you think about writing "The result is a set of time series of the form [...]" instead?
lewtun commented on 2020-08-28T08:17:19Z ----------------------------------------------------------------
sounds better - will fix!
View / edit / reply to this conversation on ReviewNB
ulupo commented on 2020-08-23T13:04:52Z ----------------------------------------------------------------
How about "we can gain a sense for what the best possible performance is for [...]"? (just to include a verb)
lewtun commented on 2020-08-28T08:18:43Z ----------------------------------------------------------------
good catch!
@lewtun while it helps to have cell outputs to review now, could you remember to flush them all out before finally merging this? Outputs are all shown in the API reference webpage, which is where it really matters.
do we do this to save space from all the plotly outputs or is there some other reason to clear the outputs?
@lewtun it's the general idea of avoiding binary content in git diffs unless it's absolutely necessary (say, a png that needs to be displayed and cannot be generated on the fly). The API reference page contains all outputs and they are also more beautifully displayed (I'd rather people end up there as there is so much more content too).
In fact, this makes me realize that we could link from all notebooks to the corresponding rendered versions (we already link from the rendered versions to the ipynb files)!
cool, makes sense and good idea about linking from the notebooks to the docs!
Thanks for the revision @lewtun! I think you could achieve an even better flush of the outputs by clicking on "Restart and clear all outputs" in Jupyter. At the moment I still see execution counts, which I think would not be there after flushing in this way. As far as I am concerned, we can merge this once #460 is merged.
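For reference, something along these lines (a hypothetical nbformat snippet; the notebook path is just a placeholder) should flush both outputs and execution counts programmatically:

```python
import nbformat

path = "examples/topology_time_series.ipynb"  # placeholder path

nb = nbformat.read(path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []            # drop the (plotly-heavy) outputs from the diff
        cell.execution_count = None  # drop the execution counts as well
nbformat.write(nb, path)
```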
@lewtun I just realised that pandas is needed for this notebook. But pandas is not a requirement of the library, so this should at least be highlighted. Are you using it just to have pretty/easy plots?
good catch! i've replaced pandas with plotly now.
@lewtun following the mega-merge of #460, this notebook can now be fixed with the new TakensEmbedding and the new takens_embedding_optimal_parameters function (there is also SingleTakensEmbedding, but I don't know if it's a good idea to suggest it should be used in a pipeline).
Cool stuff! Will update this with your other suggestions on Friday!
@ulupo I've pushed a couple of small changes:
TakensEmbedding -> SingleTakensEmbedding
I looked into putting the dataset on openML and it's a real nightmare because openML only supports the esoteric ARFF format which is designed for tabular data, not collections of time series (as far as I can tell). I'm not sure the overhead is worth it right now 😬
Also, because I need to run PCA on the output of the Takens embedding, I ended up not using the TakensEmbedding transformer after all, because it produces 3D output from the collection and PCA wants 2D 😢.
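Concretely, the shape issue looks like this (a rough sketch; I'm assuming SingleTakensEmbedding with parameters_type="fixed" and a toy signal here):

```python
import numpy as np
from sklearn.decomposition import PCA
from gtda.time_series import SingleTakensEmbedding

signal = np.sin(np.linspace(0, 30, 3000))  # stand-in for one detector signal

# Embedding a single series gives a 2D array of shape (n_points, dimension),
# which scikit-learn's PCA accepts directly for a 3D visualisation
embedder = SingleTakensEmbedding(parameters_type="fixed", dimension=10, time_delay=3)
cloud = embedder.fit_transform(signal)
cloud_3d = PCA(n_components=3).fit_transform(cloud)
print(cloud.shape, cloud_3d.shape)  # (2973, 10) (2973, 3)

# A collection-level embedding instead returns a 3D array of shape
# (n_series, n_points, dimension), which PCA rejects without a loop or reshape.
```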
Let me know what you think, but otherwise this PR can be merged IMO
@wreise OK to merge the current state, but I can't stress how unhappy I am that we can't reproduce this paper due to the need to apply PCA to collections. This is why I am proposing #490 and I now think something like this could be a very important addition to the library. Ideally, we add this functionality and then revisit this notebook accordingly. @lewtun does this sound reasonable to you?
This sounds perfectly fine to me. One thing to mention is that we can't really reproduce the paper anyway because they use TDA features + CNNs to get their results 😄.
We currently have a snake_case naming convention inside images. Do you think we should change time-delay-embedding.gif, or have all the other ones use dashes instead?
snake_case is good! i fixed the filename of the gif, so this should be good to go!
One thing to mention is that we can't really reproduce the paper anyway because they use TDA features + CNNs to get their results 😄.
Haha, giving the bad news one at a time. But I think it was a good excuse to open #495.
Description
This PR adds a Jupyter notebook example that showcases how giotto-tda can be used for time series analysis. It focuses on two main ideas:
I decided against showcasing the resampling tricks, since I felt this would distract from the core aim of "getting started".
Any other comments?
I have not yet verified whether the resulting docs from this notebook look good or not. Will do it once I sync with @wreise.
Checklist
- I have used flake8 to check my Python changes.
- I have used pytest to check this on Python tests.