JuliaServices / AutoHashEquals.jl

A Julia macro to add == and hash() to composite types.
Other
57 stars 12 forks source link

Generated inner constructor should have parameters constrained. #49

Open gafter opened 1 year ago

gafter commented 1 year ago

According to the Julia documentation at https://docs.julialang.org/en/v1/manual/constructors/#man-inner-constructor-methods

If any inner constructor method is defined, no default constructor method is provided: it is presumed that you have supplied yourself with all the inner constructors you need. The default constructor is equivalent to writing your own inner constructor method that takes all of the object's fields as parameters (constrained to be of the correct type, if the corresponding field has a type), and passes them to new, returning the resulting object:

But @auto_hash_equals generates the constructor with unconstrained parameter types. Those parameters should be constrained.

gafter commented 1 year ago

This is a slightly breaking change. The following no longer runs:

@auto_hash_equals_cached struct S268
    x::Int
end
@assert S268(2.0).x === 2 # MethodError: no method matching Main.A.S268(::Float64)

@auto_hash_equals_cached struct S269{T <: Any}
    x::T
end
@assert S269{Int}(2.0).x === 2 # MethodError: no method matching Main.A.S269{Int64}(::Float64)
@assert S269(2.0).x === 2.0

So we would have to bump the major version number.

mcmcgrath13 commented 1 year ago

I tried playing with this as it surprised me a bit and I don't see Julia enforcing that behavior with the default constructor. Are we sure we want to do this?

julia> struct Foo{T}
       a::T
       end

julia> Foo(2.0)
Foo{Float64}(2.0)

julia> Foo{Int}(2.0)
Foo{Int64}(2)
julia> struct Bar
       a::Int
       end

julia> Bar(2.0)
Bar(2)
gafter commented 1 year ago

@mcmcgrath13 The repl has different behavior than the language elsewhere. I think this is documented somewhere. To see the "proper" Julia behavior, you need to put the code in a source file and include it.

(which I "corrected" to)

Actually, the Julia documentation cited in the OP is not correct. Julia does NOT put type constraints on the parameters of the implicit constructor:

struct S1
    x::Int
end
S1(2.0) # OK

struct S2
    x::Int
    S2(x::Int) = new(x)
end
S2(2.0) # error!

This is the same in or out of the REPL.

After further research, I wrote https://github.com/JuliaLang/julia/pull/51253

gafter commented 1 year ago

I have offered to fix Julia's documentation at https://github.com/JuliaLang/julia/pull/51253.

Whatever the correct specification of the correct behavior of Julia is, this package should

  1. Imitate that behavior when no inner constructor is provided, and
  2. Imitate the default behavior of the language when an inner constructor is provided.

I suspect that fixing this issue may permit this package to interoperate with other macros better.

timholy commented 1 year ago

I don't see anything particularly wrong with the constructor:

julia> macroexpand(Main, quote
       @auto_hash_equals cache=true struct S269{T<:Integer}
           x::T
       end
       end)
quote
    #= REPL[9]:2 =#
    begin
        #= REPL[9]:2 =#
        struct S269{T <: Integer}
            #= REPL[9]:3 =#
            x::T
            _cached_hash::UInt
            function S269{T}(x) where T <: Integer
                #= /home/tim/.julia/packages/AutoHashEquals/8dq3O/src/impl.jl:256 =#
                #= /home/tim/.julia/packages/AutoHashEquals/8dq3O/src/impl.jl:257 =#
                new(x, (Base).hash(x, 0xa63963209f14699e))
            end
        end
...

Maybe it could be improved like this:

function S269{T}(x) where T <: Integer
    y = convert(T, x)
    new(y, (Base).hash(y, 0xa63963209f14699e))
end

That would eliminate a corner case: T(x1) === T(x2) but typeof(x1) !== typeof(x2) and the two hash differently (which would indeed be bad).

timholy commented 1 year ago

In fact, it doesn't even need to be two different types:

julia> @auto_hash_equals cache=true struct S{T}
           x::T
       end
S

julia> s1, s2 = S{Float32}(1.0), S{Float32}(1.0+eps())
(S{Float32}(1.0f0), S{Float32}(1.0f0))

julia> s1 == s2
false

julia> s1.x == s2.x
true

This happens because 1.0 and 1.0+eps() both round to 1.0f0, but the Float64-based hash value is different. If you performed the conversion first, and then hashed, things would work correctly. Likewise, omitting the cache=true causes things to work properly.

This is why a good mental model for default constructors is the following (where all constructors are explicit):

struct S{T}
    x::T

    S{T}(x::T) where T = new{T}(x)        # constructor 1
end
S{T}(x) where T = S{T}(convert(T, x)::T)  # constructor 2
S(x) = S{typeof(x)}(x)                    # constructor 3

It's not exactly what we do (we effectively inline constructor 1 into constructor 2) but it's the right way to think about it.

Moreover, this design would fix the bug here: compute the cached hash only in the inner constructor S{T}(x::T), and everything else is an outer constructor which calls this one.

Presumably this is essentially what you mean by "should have parameters constrained," but you also need the convert constructors to keep everything working. Or you can keep the current design but convert before hashing.

gafter commented 1 year ago

@timholy What is wrong with the current behavior is that it defines one constructor method rather than two. The default constructor has two methods. convert is already performed by the new invocation and doesn't need to appear in the expansion. Adding a convert invocation makes no difference whatsoever to the generated code.

If there is no source equivalent to the set of constructor methods automatically provided, the documentation should say that.

As far as I can tell, the default constructors are equivalent to this for a parametric type:

struct S{T}
    x::T

    S{T}(x::T) where T = new{T}(x)
end
S{T}(x) where T = S{T}(convert(T, x)::T)
S(x::T) where T = S{T}(x)

The hand-inlined type inference in your definition of S(x) would only be correct for the simplest cases of precisely one field being precisely of the type of a type parameter. The last method, above, seems to handle all cases correctly.

However, this only applies to parametric types. Non-parametric types have a much simpler expansion, documented in https://github.com/JuliaLang/julia/pull/51253 . That PR doesn't provide the expansion for parametric types, which is already (incorrectly) documented in a later section of the document modified by that PR.

I'll modify that PR to better handle the case of parametric types, including the required conversion on the second constructor method, above.

timholy commented 1 year ago

The bug illustrated in the first code-block of https://github.com/JuliaServices/AutoHashEquals.jl/issues/49#issuecomment-1719326946 is fixed by the suggestion in the second code-block of https://github.com/JuliaServices/AutoHashEquals.jl/issues/49#issuecomment-1718312192,

function S269{T}(x) where T <: Integer
    y = convert(T, x)
    new(y, (Base).hash(y, 0xa63963209f14699e))
end

Try it; you'll see that adding a convert indeed makes a difference. What's crucial is that the conversion happens before the hash computation, not after; it doesn't much matter whether you achieve this by arranging the order in which constructors get called (if you have multiple constructors), or just get the logic right in a single constructor. That is to say, either you could have one constructor (the one in the code block above this paragraph) or several (the one in the final code-block of https://github.com/JuliaServices/AutoHashEquals.jl/issues/49#issuecomment-1719326946).

Focusing on counting the number of methods doesn't seem so important:

f(x) = 1
f(x::Int) = 1
f(x::Float64) = 1

This is three methods, but one would be forgiven for suggesting that one could delete the last two (and it's probably better if one does).

timholy commented 1 year ago

For example, let's try the implementation I think you're advocating:

julia> struct CachedHash{T}
           x::T
           cachedhash::UInt

           CachedHash{T}(x::T) where T = new{T}(x, hash(x, 0xa63963209f14699e))
           CachedHash{T}(x) where T = new{T}(x, hash(x, 0xa63963209f14699e))
       end

julia> s1, s2 = CachedHash{Float32}(1.0), CachedHash{Float32}(1.0 + eps());

julia> dump(s1)
CachedHash{Float32}
  x: Float32 1.0f0
  cachedhash: UInt64 0x691e5307da57bbf4

julia> dump(s2)
CachedHash{Float32}
  x: Float32 1.0f0
  cachedhash: UInt64 0xc58b93505afe7c50

Oops. You have s1.x == s2.x but the cached hashes are different. Just having two methods and calling new from both of them doesn't magically fix a logic bug.

gafter commented 1 year ago

@timholy The current behavior of AutoHashEquals is incorrect because it doesn't provide a set of constructors that are semantically equivalent to the default ones. This issue asks that to be corrected. I do not expect to make the error you suggest above in computing the hash code in the implementation of AutoHashEquals.jl, though I appreciate having it called to my attention so I can be sure to add tests.