beacon-biosignals / StableHashTraits.jl

Compute hashes over any Julia object simply and reproducibly
MIT License
7 stars 1 forks source link

Unstable hashes in Pluto.jl notebooks #41

Closed tmistele closed 8 months ago

tmistele commented 9 months ago

I've been trying to use StableHashTraits in a Pluto.jl notebook, like this:

screenshot1

But that is not a good idea because the hash is not actually stable. For example, suppose I want to add a comment to the definition of MyStruct. To do this, I add the comment and then re-evaluate the cell. The hash is now different although the definition of MyStruct hasn't changed, like this:

screenshot2

I think the reason is that parentmodule(MyStruct) is not stable (see the screenshots) but enters the hash:

https://github.com/beacon-biosignals/StableHashTraits.jl/blob/14471608ee0c25a69fa6ba32256a3e0845bcc23b/src/StableHashTraits.jl#L437

I'm not sure how to best deal with this, but I think it is a major footgun for Pluto users so something should be done. It might be good enough to just detect this case and warn or throw an error? Like what seems to be done for other types here:

https://github.com/beacon-biosignals/StableHashTraits.jl/blob/14471608ee0c25a69fa6ba32256a3e0845bcc23b/src/StableHashTraits.jl#L430-L435

Then Pluto users are aware of the issue and can try to work around it.

Fwiw, adding the following code to Pluto notebooks seems to work fine for me. But that's just an ugly hack. Do you know of a better way? Do you think something should be changed on the Pluto side?

begin
    stable_pluto_workspace_name(str) = replace(
        str,
        r"Main\.var\"workspace\#([0-9]+)\"" => "Main.var\"my-pluto-workspace\""
    )

    StableHashTraits.qualified_name_(x::T) where {T} = let
        str = StableHashTraits.qname_(T <: DataType ? x : T, nameof)
        stable_pluto_workspace_name(str)
    end
    StableHashTraits.qualified_type_(x::T) where {T} = let
        str = StableHashTraits.qname_(T <: DataType ? x : T, string)
        stable_pluto_workspace_name(str)
    end
end

Edit: Probably related: https://github.com/fonsp/Pluto.jl/issues/1030 . I guess the problem here is more on the Pluto side. Still, I think it would be good to adjust validate_name to disallow types defined in Pluto by default. And maybe there could be a (nicer) way for Pluto users to hook into StableHashTraits to work around the problem?

haberdashPI commented 9 months ago

Hey there, sorry I've been slow to take a look at this. Had some tight deadlines before the holiday's and am just looking at this upon my return from vacation.

I'm not sure how to best deal with this, but I think it is a major footgun for Pluto users so something should be done. It might be good enough to just detect this case and warn or throw an error? Like what seems to be done for other types here:

I agree. It should be possible to check for this case and print a warning for the user, along with some advice for how to avoid it.

I would be wary of your fix:

StableHashTraits.qualified_name_(x::T) where {T} = let
        str = StableHashTraits.qname_(T <: DataType ? x : T, nameof)
        stable_pluto_workspace_name(str)
    end
    StableHashTraits.qualified_type_(x::T) where {T} = let
        str = StableHashTraits.qname_(T <: DataType ? x : T, string)
        stable_pluto_workspace_name(str)
    end

There are some weird things that can happen when redefining a function that is called inside of a @generated function which stable_type_id is.

I think a better work around would be define this right after defining your type:

StableHashTraits.hash_method(::MyType) = @ConstantHash("Pluto.MyType"), StructHash()

That should ensure that the type tag for your type is constant. It might even be possible to automated this: e.g. if I can verify that the module was created from a Pluto cell somehow.

haberdashPI commented 9 months ago

I like your stable_pluto_workspace_name but one concern I have with it is that we don't really know for certain that workspace comes from a pluto workspace. (Another package could automatically generate modules with this same name schema). I'd really want Pluto.jl to implement e.g. is_pluto_workspace_module or some such. Maybe it already does?

tmistele commented 9 months ago

Thanks for having a look. So I guess the easiest solution would be to warn/error by default and tell the user about a workaround along the lines you suggested, i.e.

StableHashTraits.hash_method(::MyType) = @ConstantHash("Pluto.MyType"), StructHash() 

I agree that doing better probably requires cooperation from Pluto.jl in some form.

cc @fonsp who might have some thoughts?

fonsp commented 9 months ago

@pangoraw understands this better! I'm up for implementing a method is_inside_pluto(::Module) in AbstractPlutoDingetjes.jl (which means that you can ask Pluto directly).

Though I would say that the workspace#<number> heuristic won't give false results. Searching public packages gives only Pluto results. One of our packages uses this heuristic currently: https://github.com/JuliaPluto/PlutoHooks.jl/blob/f6bc0a3962a700257641c3449db344cf0ddeae1d/src/notebook.jl#L89-L98

haberdashPI commented 9 months ago

Though I would say that the workspace# heuristic won't give false results. Searching public packages gives only Pluto results. One of our packages uses this heuristic currently: https://github.com/JuliaPluto/PlutoHooks.jl/blob/f6bc0a3962a700257641c3449db344cf0ddeae1d/src/notebook.jl#L89-L98

Cool! I like the long-term solution of adding a function to AbstractPlutoDingetjes.jl, and I can make use of this heuristic in the meantime.