Drop relabel parameter from SimilaritySurface methods

saraedum commented 2 years ago

I found that the relabel parameter is not implemented consistently in SimilaritySurface, namely:

copy honors it.
reposition_polygons honors it when in_place is False.
triangulate ignores it.
delaunay_triangulation only honors it when the surface is infinite.
delaunay_decomposition honors it when in_place is False.

I understand why in_place=True and relabel=True do not make much sense. But anyway, there are at least two bugs in the current implementation.

So my question is: do we really need this parameter in all but copy?

wphooper commented 2 years ago

Can you say where the bugs are?

I agree the current system is ad hoc, and it would be better for this to be handled in one place (like copy) to lessen the chance of errors.

saraedum commented 2 years ago

Sorry, if what I wrote was not clear. The bug is that there is a parameter and it is ignored, e.g.,

    def triangulate(self, in_place=False, label = None, relabel=False):
        r"""
        ...
        This can be a relabeled copy (indexed by the non-negative ints)
        or a label preserving copy. The copy is relabeled if relabel=True
        (default False).
        ...
        """

But relabel is never used in the implementation. So what the docstring claims cannot be true.

wphooper commented 2 years ago

Thanks for pointing this out. I agree that this is an error in the code for triangulate. I see that reposition_polygons just calls copy in the code and passes the relabel option on.

It has been a while since I thought about this. I guess your point is why not just do: ss.reposition_polygons().copy(relabel=True) instead of ss.reposition_polygons(relabel=True)?

I can think of a few reasons you might want to pass parameters to copy. The main one is that for an infinite surface, the copy is by reference, so the .reposition().copy(relabel=True) approach leads to the creation of two surfaces which persist in memory. Eventually I hope to add better support for mappings between related surfaces, and then you also get unnecessary intermediate objects. Still, this should be a minor cost, I think.

Another comment is that if relabel=False then copy uses the Surface_dict implementation. A bad thing about this is that when new labels are created, ExtraLabel is used (because in particular in a infinite surface surface being modified with a reference stored) it is hard to know if a given possible label is already in a surface or not. I tend to want to avoid ExtraLabel if possible, and I don't think it works yet with pickling (for instance). In principle, supporting relabel=True allows us to avoid this sort of issue. (And if .reposition_polygons().copy(relabel=True) is used, then ExtraLabels might persist in some intermediate object.)

I agree though that this makes code writing and maintenance an issue, as your comments clearly point out. So, I tend to agree we should simplify the code. Maybe we should also fix ExtraLabel?

saraedum commented 2 years ago

It has been a while since I thought about this. I guess your point is why not just do: ss.reposition_polygons().copy(relabel=True) instead of ss.reposition_polygons(relabel=True)?

Yes. Making this modular makes the code much easier to maintain I think.

I can think of a few reasons you might want to pass parameters to copy. The main one is that for an infinite surface, the copy is by reference, so the .reposition().copy(relabel=True) approach leads to the creation of two surfaces which persist in memory. Eventually I hope to add better support for mappings between related surfaces, and then you also get unnecessary intermediate objects. Still, this should be a minor cost, I think.

Yes, I also believe that a factor two in terms of RAM and a single additional redirection is a cost we could handle. (And the RAM cost could likely be reduced by using a different caching strategy.)

Another comment is that if relabel=False then copy uses the Surface_dict implementation. A bad thing about this is that when new labels are created, ExtraLabel is used [...] Maybe we should also fix ExtraLabel?

The labels ExtraLabel produces are only unique in the current session so there will be collisions when unpickling a surface from a previous session.

Should we just randomize labels instead?

wphooper commented 2 years ago

Randomizing labels would certainly be better, but collisions would still be a rare event. I was hoping for a strategy that would always work. The other issue is that sometime labels get printed like in figures. I'd like to avoid things like E(14512334575) if possible.

One possibility would be just to store a timestamp for when the extralabel was generated in addition to the integer.

Alternately, we could store a time in a module-level variable when generating the first ExtraLabel. When pickling an ExtraLabel, we'd store the timestamp in addition to the integer, and then when restoring if the timestamps don't match we'd map the label to a new integer (caching the identification).

saraedum commented 2 years ago

Randomizing labels would certainly be better, but collisions would still be a rare event. I was hoping for a strategy that would always work.

Well, using a sufficiently large domain, or even something semi-random like a GUID, collisions become (in practice) impossible.

The other issue is that sometime labels get printed like in figures. I'd like to avoid things like E(14512334575) if possible.

If you want to print a serialization of the label, then it will necessarily get somewhat ugly. I don't think you can get much better than a GUID here. One could of course print a much shorter representation and only print the full representation when the long representation is not unique in the current process anymore.

One possibility would be just to store a timestamp for when the extralabel was generated in addition to the integer.

Then labels are not unique if they get created in different processes at the same time. I run sage-flatsurf a lot in parallel, so I don't think that would work for me.

Alternately, we could store a time in a module-level variable when generating the first ExtraLabel. When pickling an ExtraLabel, we'd store the timestamp in addition to the integer, and then when restoring if the timestamps don't match we'd map the label to a new integer (caching the identification).

I don't fully understand what you are proposing here but again, I don't think that a timestamp is unique.

flatsurf / sage-flatsurf

Drop relabel parameter from SimilaritySurface methods #131