:art: Clean up shape classification tutorial

Reference issues/PRs

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Description

Cleans up the shape classification tutorial to make it a) more pedagogical and b) more focused.
Some of the PD calculations can take O(minutes) which may be a concern if we eventually build the docs with every push.

Screenshots (if appropriate)

Any other comments?

Checklist

[x] I have read the guidelines for contributing.
[x] My code follows the code style of this project. I used flake8 to check my Python changes.
[x] My change requires a change to the documentation.
[ ] I have updated the documentation accordingly.
[ ] I have added tests to cover my changes.
[ ] All new and existing tests passed. I used pytest to check this on Python tests.

View / edit / reply to this conversation on ReviewNB

ulupo commented on 2020-10-19T08:55:30Z ----------------------------------------------------------------

I would consider

"The effect of connecting points as we increase some radius is the creation"

instead of

"The effect of connecting points as we increase some radius ϵ results in the creation"

Furthermore, I would be careful with using "geometric simplicial complex" here. Geometric simplicial complexes (as opposed to abstract simplicial complexes) are typically understood to be actual subsets of Euclidean space in which k-simplices really live as k-dimensional submanifolds (with corners or whatever). In giotto-tda, we never really compute the PH of geometric complexes built from data because we don't yet support alpha filtrations. Vietoris-Rips etc. only ever build abstract (i.e. combinatorial) simplicial complexes from data, which can't always be realised as clean triangulations of the actual point cloud, for instance.

Furthermore, there seems to be a little alignment/bullet point issue in the definition of the Betti numbers.

Finally, the note about the meaning of homology should terminate with a full stop instead of a comma.

wreise commented on 2020-10-19T20:07:52Z ----------------------------------------------------------------

I agree with @ulupo about the "geometric" wording. Also, I realized that the definition of complex says that "a complex is a set of $n$ point, so a line"... We would need to add "the convex hull", and maybe talk about the standard "k-simplex".

To avoid both questions, what if we focused on the abstract simplicial complexes only?
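
To make the abstract-versus-geometric distinction concrete, here is a minimal, stdlib-only sketch (not giotto-tda's implementation) of a Vietoris-Rips complex built purely as a combinatorial object. The function name `vietoris_rips`, the `max_dim` parameter, and the convention that a simplex enters once all pairwise distances are at most `epsilon` are assumptions made for illustration; some references use `2 * epsilon` instead.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, epsilon, max_dim=2):
    """Return the simplices of a Vietoris-Rips complex as tuples of point indices.

    Illustrative convention (an assumption): a set of points spans a simplex
    when every pairwise distance is <= epsilon.
    """
    n = len(points)
    # 0-simplices: every point is a vertex.
    simplices = [(i,) for i in range(n)]
    # k-simplices have k + 1 vertices, so subsets of size 2 .. max_dim + 1.
    for size in range(2, max_dim + 2):
        for candidate in combinations(range(n), size):
            if all(
                dist(points[i], points[j]) <= epsilon
                for i, j in combinations(candidate, 2)
            ):
                simplices.append(candidate)
    return simplices


# Corners of the unit square: sides have length 1, diagonals sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

fine = vietoris_rips(square, epsilon=1.0)    # vertices and the four sides only
coarse = vietoris_rips(square, epsilon=1.5)  # diagonals and all triangles appear
```

The output is just a list of index tuples, with no reference to where the simplices sit in the plane. At `epsilon=1.5` all four triangles on the corners of the square are present, and they overlap in the plane, so the result is a valid abstract complex but not a triangulation of the point cloud.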

ulupo commented on 2020-10-19T08:55:31Z ----------------------------------------------------------------

"From the persistence diagrams [...]" appears as a section title, which would seem weird to me. Furthermore, there might be a small issue with bullet points (at least there is one in ReviewNB).

wreise commented on 2020-10-19T19:50:21Z ----------------------------------------------------------------

Having it open in Jupyter notebook, I do not see any issue with the section title nor the bullet points.

ulupo commented on 2020-10-19T08:55:32Z ----------------------------------------------------------------

The link to the persistence entropy doc entry seems mangled here.

wreise commented on 2020-10-19T19:58:33Z ----------------------------------------------------------------

Clicking from the notebook or the docs, it works like a charm - it links to the glossary. https://giotto-ai.github.io/gtda-docs/latest/theory/glossary.html#persistence-entropy

Did you experience any particular issue?

ulupo commented on 2020-10-19T08:55:33Z ----------------------------------------------------------------

Again, "A more sophisticated feature is [...]" appears as a section title on ReviewNB, could you check?

wreise commented on 2020-10-19T19:58:59Z ----------------------------------------------------------------

All good :)

ulupo commented on 2020-10-19T08:55:33Z ----------------------------------------------------------------

It really disturbs me that Python gives precedence to "+" over the star operator. To me this makes the code less readable. Would this not work?

feature_union = make_union(
    PersistenceEntropy(normalize=True),
    NumberOfPoints(n_jobs=-1),
    *[Amplitude(**metric, n_jobs=-1) for metric in metrics]
)

wreise commented on 2020-10-19T20:00:18Z ----------------------------------------------------------------

I agree with that.
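
For reference, a small stdlib-only sketch of the precedence point raised above. The helper `collect` and the string placeholders are hypothetical stand-ins for `make_union` and the giotto-tda transformers, used only so the snippet runs without any dependencies.

```python
def collect(*args):
    """Hypothetical stand-in for make_union: return the positional arguments."""
    return args


fixed = ["entropy", "n_points"]
amplitudes = [f"amplitude_{name}" for name in ("bottleneck", "wasserstein")]

# In a call, the expression after "*" is evaluated in full before unpacking,
# so this is collect(*(fixed + amplitudes)), not collect(*fixed) + amplitudes.
combined = collect(*fixed + amplitudes)

# Listing the fixed items explicitly and unpacking only the comprehension
# passes the same arguments and avoids relying on that parsing rule.
explicit = collect("entropy", "n_points", *amplitudes)
```

Both calls receive the same four arguments, so the choice between them is purely about readability.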

ulupo commented 3 years ago

@lewtun this is amazing work, excellent! The Klein bottle (and projective space) are dead, long live the Klein bottle (and projective space).

Just left a couple of pedantic comments. Having read quite how good a pedagogical job you did, I would now not be in disfavour of merging this notebook with the quickstart. Somehow it would be nice to keep some of the immediacy of the quickstart, but maybe this is hard. In any case, this can be done in a separate PR to keep things easier to follow.

Once you or @wreise address my review I will run the CI with the notebook checks on.

lewtun commented 3 years ago

hey @ulupo and @wreise, i tweaked the text to remove any reference to "geometric simplicial complex", but decided to keep the pictures and description of $k$-simplices because i personally found this useful to understand intuitively what's going on under the hood.

i think wojciech already tackled the other remarks, so please go ahead with the CI test and if everything checks out feel free to clear the outputs and merge!

ulupo commented 3 years ago

@lewtun thanks! I'll look into the CI. From @wreise I'd like to know whether things would look good online in the current state. I'm particularly worried about the occasional use of HTML code.

ulupo commented 3 years ago

The CI failures in macOS are due to an ongoing screw-up in the way Azure pipelines and brew interact. Hoping they fix it soon...

wreise commented 3 years ago

@ulupo , the html looks good. The only trouble is with the tip - i haven't found a reasonable fix yet.

lewtun commented 3 years ago

> @ulupo , the html looks good. The only trouble is with the tip - i haven't found a reasonable fix yet.

thanks for checking @wreise! maybe we can just ignore the ">" syntax i used for the tip and just resort to normal text?

wreise commented 3 years ago

> thanks for checking @wreise! maybe we can just ignore the ">" syntax i used for the tip and just resort to normal text?

Yes, this works. @ulupo , imo, it's ready to be merged.

ulupo commented 3 years ago

Unfortunately the brew issue is quite severe and Azure are not being too proactive at fixing it, so we will have to merge this knowing some pipelines will fail. Additionally, the CI has become too heavy for Azure it seems, we will have to make more of the checks optional from now on.

lewtun commented 3 years ago

> Unfortunately the brew issue is quite severe and Azure are not being too proactive at fixing it, so we will have to merge this knowing some pipelines will fail. Additionally, the CI has become too heavy for Azure it seems, we will have to make more of the checks optional from now on.

Pardon my ignorance, but is there any reason we couldn't migrate all our CI to GitHub actions? Or is it because we support mac OS / Windows / Linux that we're forced to use Azure?

ulupo commented 3 years ago

> Pardon my ignorance, but is there any reason we couldn't migrate all our CI to GitHub actions? Or is it because we support mac OS / Windows / Linux that we're forced to use Azure?

I don't know of any reason for using Azure especially, and I am fundamentally ignorant about modern CI practices. One thing we like to use is the manylinux2010 platform (? I don't really understand what it means) for building linux wheels. But I also don't know why we should be migrating to a new system. Perhaps this brew issue is a good reason by itself if not resolved quickly, but otherwise, why bother with the presumably non-trivial job of migrating? What are the core benefits?

ulupo commented 3 years ago

Also relevant and includes support for Python 3.9: https://cibuildwheel.readthedocs.io/en/stable/

ulupo commented 3 years ago

Perhaps GitHub actions would make the job of deploying docs easier? @wreise in that case there would be yet another reason to migrate.

ulupo commented 3 years ago

I'd be happy to look into migrating as a team effort. On my own, I lack the time/motivation at the moment.

lewtun commented 3 years ago

> But I also don't know why we should be migrating to a new system.

Oh I only suggested this because of the problem with brew on Azure. But you're right, we should do some research to figure out what the alternatives are and decide whether it even makes sense to migrate. I am pretty busy right now, but would be happy to look into this in a few months' time.

giotto-ai / giotto-tda

:art: Clean up shape classification tutorial #523