97jaz / gregor

Date and time library for Racket
45 stars 10 forks source link

serialization: provide deserialize-info from toplevel modules #59

Closed Bogdanp closed 4 months ago

Bogdanp commented 4 months ago

This change makes it so that the following program runs successfully after being converted to an executable using raco exe:

#lang racket/base

(require gregor
         racket/serialize)

(deserialize (serialize (current-time)))
(deserialize (serialize (today)))
(deserialize (serialize (now/moment)))

As it stands, the deserialization process fails to look up the deserialization-info submodules in an exe. On failure, that process then has a fallback that looks at the toplevel module where the struct is declared to get the info. So, this change fixes the issue by providing the info directly from the toplevel modules in addition to the deserialization-info submodules.

See racket/racket#4967 for more details.

rfindler commented 4 months ago

I don't actually know how all this works but if I'm reading the linked issue correctly, is the problem simply that there is no depenency from the gregor top-level module to the submodule that has the provide of the deserialize information and thus raco exe doesn't add the submodule?

If that's the case, would it work to make a module instead of a module+ and put a bunch of stuff into that submodule, but then have the top-level module reprovide it? That way, the top-level module's exports wouldn't have to change.

Bogdanp commented 4 months ago

I think moving the struct into a module submodule and reproviding that would probably resolve the problem for the same reason this change solves it. But, simply adding an explicit dependency (eg. requiring it or adding a define-runtime-module-path-index from somewhere else) on the submodule isn't enough to resolve the issue. The problem is that, in an exe, the submodule is embedded with a path like #%embedded:gregor/private/datetime>:(deserialize-info), but the racket/serialize library tries to look up the submodule by doing something like

(module-path-index-resolve
 (module-path-index-join
  '(submod "." deserialize-info)
  (module-path-index-join
   '(lib "gregor/private/datetime.rkt") #f)))

which, in a regular module resolves to

#<resolved-module-path:(submod "/Users/bogdan/sandbox/gregor/gregor-lib/gregor/private/datetime.rkt" deserialize-info)>

and can be dynamic-required, but in an exe resolves to

#<resolved-module-path:(submod '#%embedded:gregor/private/datetime: deserialize-info)>

which is different from the embedded path above, so the dynamic-require fails. Maybe the embedding process could do a better job of recording submodule paths such that they can be resolved at runtime correctly, but I haven't dug into it. Even if that improvement can be made to the embedding process, that still leaves the problem of needing an explicit dependency on the submodule.

rfindler commented 4 months ago

Okay! Thank you!

LiberalArtist commented 4 months ago

IIUC, the same problem will affect lots of other serializable structures, I think including those using serializable-struct. It seems like it would be better to figure out how to make this work out-of-the-box than to have to make manual changes in many places.

Bogdanp commented 4 months ago

Uses of serializable-struct are not affected because the serializable-struct macro expands to some code that includes a use of (runtime-require (submod "." deserialize-info)), and that seems to get the submodule recorded in the exe's module lookup table in such a way that its path can be resolved at runtime. I'll update this PR to use runtime-require instead of reproviding the bindings from the toplevel, since that seems better overall.

Bogdanp commented 4 months ago

Digging a little deeper into the embedding code, I think this section of code in the embedded exe module resolver is what makes the runtime-require work, because it specifically recognizes the kind of relative dependency from the parent (base) module to the child that the runtime-require form introduces (and that explains why an external dependency on the module, like I was trying here, wouldn't work).

97jaz commented 4 months ago

I know very little about the implementation of serialization in Racket, so maybe I'm off-base here, but this really sounds like a problem with the serialization code. Is that not the case?

Bogdanp commented 4 months ago

I don't think the serialization code can do much better than it currently does here. The issue is that, fundamentally, to deserialize a piece of serialized data, a module has to be looked up at runtime to discover the deserialization info. In an exe, the submodule by which the deserialize info is looked up is not going to be resolvable unless there is a dependency on it as recorded by something like runtime-require. In fact, without a runtime-require, the submodule won't even be included in the generated executable's module resolution table (unless user code does records a dependency on it), because nothing will statically depend on it.

I think this approach is in line with how other dependencies get recorded for runtime and for executable distribution (i.e. lazy-require, define-runtime-path and co.), and with how serializable-struct itself works, so it seems fine to me. Ideally, we would just use serializable-struct instead of worrying about any of this, but I don't think that's an option here since these structs have custom serialization behavior compared to what serializable-struct would generate.

97jaz commented 4 months ago

Ah, okay. I don't really understand why locating deserialization info should require looking up a module at runtime. When I wrote the serialization code, I was following examples (though I'm no longer sure which examples, since the ones I see now in the reference look different and don't use a submodule). But given that it does need to look up a module at runtime, I agree with you. Thanks for the patch!

LiberalArtist commented 4 months ago

Along those lines, runtime-require seems much better than the manual solution, and I think it would be worth mentioning in the serialization docs. I also wondered if something could be done more automatically in racket/serialize by using compiler/cm-accomplice, but it might end up causing more confusion to try to handle special cases.