giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
Other
845 stars 173 forks source link

:art: Clean up shape classification tutorial #523

Closed lewtun closed 3 years ago

lewtun commented 3 years ago

Reference issues/PRs

Types of changes

Description

Screenshots (if appropriate)

Any other comments?

Checklist

review-notebook-app[bot] commented 3 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



ulupo commented 3 years ago

Wow, that was unexpected! 👍 🥇

wreise commented 3 years ago

It's great, thanks! :heart_eyes:

I believe that data.generate_dataset.generate_point_clouds (now renamed to make_point_clouds) is also used in vietoris_rips_quickstart.ipynb. I am looking at the docs and will suggest some changes anyway.

Meanwhile, I would like to draw our attention to the fact that it has not been picked up by the CI.

EDIT: ah, ok - the notebook tests are disabled, so this was expected - my bad!

ulupo commented 3 years ago

> Meanwhile, I would like to draw our attention to the fact that it has not been picked up by the CI.
>
> EDIT: ah, ok - the notebook tests are disabled, so this was expected - my bad!

You are right, but it's good that you are reminding me to trigger notebooks before merging this ;)

lewtun commented 3 years ago

> It's great, thanks! 😍
>
> I believe that data.generate_dataset.generate_point_clouds (now renamed to make_point_clouds) is also used in vietoris_rips_quickstart.ipynb. I am looking at the docs and will suggest some changes anyway.
>
> Meanwhile, I would like to draw our attention to the fact that it has not been picked up by the CI.
>
> EDIT: ah, ok - the notebook tests are disabled, so this was expected - my bad!

Good catch! I did not realise that the quickstart overlaps so strongly with the shape classification example - would it make sense to delete the former in favour of minimising redundancy?

ulupo commented 3 years ago

> Would it make sense to delete the former in favour of minimising redundancy?

I would not be in favour of this at the moment. The shape classification tutorial assumes a lot from the reader, mathematically speaking. The quickstart wants to be just that, without any frills, just to establish some basics.

Of course, in an ideal world where Lewises abound, the shape classification tutorial would also be improved and then maybe things could be different.

lewtun commented 3 years ago

> I would not be in favour of this at the moment. The shape classification tutorial assumes a lot from the reader, mathematically speaking. The quickstart wants to be just that, without any frills, just to establish some basics.

OK, sounds good to me.

> Of course, in an ideal world where Lewises abound, the shape classification tutorial would also be improved and then maybe things could be different.

Do you mean that I would pick a different set of point clouds to warm the reader up on 😃?

ulupo commented 3 years ago

Oh no wait! Sorry @lewtun, everything I have been saying was based on my misunderstanding that you had changed the "2D voids in 2D" notebook, not the shape classification tutorial! So when I said that this tutorial assumes too much from the reader, mathematically speaking, after your hard work, that would have come across as quite crass! Sorry!

I take back what I said above and suspend judgement until I read what you actually did here. And ask @wreise what he thinks about your merging proposal.

wreise commented 3 years ago

@lewtun , I modified:

  1. how the images are inserted - raw html does not render well, but inserting them as they are now means that they are not centered in the docs :cry:
  2. the hierarchy of the headings in the notebook (everything is now at ## level). Otherwise, we could add a ##-level section for the synthetic data and then keep the ## level for the real dataset. What do you think?

Also, I fixed the bug in vietoris_rips_quickstart.

Otherwise, nothing to say and it's great!

lewtun commented 3 years ago

thanks for the changes @wreise - both look good to me! it's a pity about the centering of the images, but i can live with that 😄

review-notebook-app[bot] commented 3 years ago

View / edit / reply to this conversation on ReviewNB

ulupo commented on 2020-10-19T08:55:30Z

I would consider

"The effect of connecting points as we increase some radius is the creation"

instead of

"The effect of connecting points as we increase some radius ϵ results in the creation"

Furthermore, I would be careful with using "geometric simplicial complex" here. Geometric simplicial complexes (as opposed to abstract simplicial complexes) are typically taken to mean actual subsets of Euclidean space in which k-simplices really live as k-dimensional submanifolds (with corners or whatever). In giotto-tda, we never really compute the PH of geometric complexes built from data because we don't yet support alpha filtrations. Vietoris-Rips etc. only ever build abstract (i.e. combinatorial) simplicial complexes for data, which can't always be realised as clean triangulations of the actual point cloud, for instance.

Furthermore, there seems to be a little alignment/bullet point issue in the definition of the Betti numbers.

Finally, the note about the meaning of homology should terminate with a full stop instead of a comma.
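
To make the abstract-versus-geometric point above concrete, here is a minimal sketch in plain Python. It is not giotto-tda code: the function `vietoris_rips` and its parameters are hypothetical names chosen for illustration, and it assumes the convention that a set of points spans a simplex when all pairwise distances are at most `epsilon`. Simplices are stored purely as sets of point indices, which is what "abstract" (combinatorial) means here.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, epsilon, max_dim=2):
    """Return Vietoris-Rips simplices as frozensets of point indices.

    Assumed convention: a set of indices is a simplex when all of its
    points are pairwise within distance `epsilon` of each other.
    """
    n = len(points)
    # Every point is a vertex (0-simplex), whatever the radius.
    simplices = [frozenset([i]) for i in range(n)]
    for k in range(1, max_dim + 1):
        # A k-simplex has k + 1 vertices.
        for candidate in combinations(range(n), k + 1):
            if all(dist(points[i], points[j]) <= epsilon
                   for i, j in combinations(candidate, 2)):
                simplices.append(frozenset(candidate))
    return simplices


# Four corners of the unit square in the plane.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# At epsilon = 1 only the four sides are edges, so the complex is a loop.
loop = vietoris_rips(square, epsilon=1.0)

# At epsilon = 1.5 the diagonals (length sqrt(2)) are edges as well, and
# the complex contains a 3-simplex on four points that lie in a plane.
full = vietoris_rips(square, epsilon=1.5, max_dim=3)
```

In `full`, the 3-simplex `{0, 1, 2, 3}` is a perfectly valid abstract simplex, but it cannot sit as a genuine tetrahedron on the original four planar points. This is the sense in which a Vietoris-Rips complex is not, in general, a triangulation of the point cloud.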


wreise commented on 2020-10-19T20:07:52Z

I agree with @ulupo about the "geometric" part. Also, I realized that the definition of complex says that "a complex is a set of $n$ points, so a line"... We would need to add "the convex hull", and maybe talk about the standard "k-simplex".

To avoid both questions, what if we focused on the abstract simplicial complexes only?
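
For reference, the notions mentioned in this exchange can be stated as follows. These are the standard textbook definitions, written in notation chosen here ($V$, $K$, $\sigma$, $\tau$, $x_i$, $t_i$ are not taken from the notebook), so treat them as a sketch of what the notebook text could say rather than its actual wording.

```latex
% Abstract simplicial complex on a vertex set V:
% a collection K of finite non-empty subsets of V closed under taking subsets.
\sigma \in K \ \text{and}\ \emptyset \neq \tau \subseteq \sigma
\;\Longrightarrow\; \tau \in K

% Geometric k-simplex: the convex hull of k+1 affinely independent
% points x_0, \dots, x_k in Euclidean space.
\operatorname{conv}\{x_0, \dots, x_k\}
= \Bigl\{ \textstyle\sum_{i=0}^{k} t_i x_i \;:\; t_i \ge 0,\ \sum_{i=0}^{k} t_i = 1 \Bigr\}

% Standard k-simplex.
\Delta^k = \Bigl\{ (t_0, \dots, t_k) \in \mathbb{R}^{k+1} \;:\; t_i \ge 0,\ \textstyle\sum_{i=0}^{k} t_i = 1 \Bigr\}
```

Only the first definition is needed if the text restricts itself to abstract simplicial complexes: a $k$-simplex is then just a set of $k+1$ vertices, with no convex hull involved.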