"unexec/pdump" - VM memory serialization and loading

DavidDeSimone commented 1 year ago

Executive Summary: I propose that it would be worthwhile to have Rune dump its serialized state into a binary file that could be reloaded later to cut down on load times. For example, I would evaluate a large amount of elisp, dump the state to a file, and then initialize my VM from that dump to cut down on load time. Creating this binary file may require a special mode when creating the VM (depending on implementation), but loading the file would not require any special mode. The file would be loaded at VM initialization; loading it "mid-run" would not be expected.

For a long time, part of the Emacs build process was its famous "unexec" flow: you would load a minimal version of Emacs, evaluate a large amount of elisp, and then, if I recall correctly, dump part of your process heap into a binary that would be loaded into Emacs's BSS memory area. Eventually Emacs replaced unexec with the portable dumper (https://github.com/emacs-mirror/emacs/blob/master/src/pdumper.h), which isn't as fast but is much more maintainable.

V8 (Google's JavaScript engine) also has somewhat similar functionality for its Isolates; this is how Deno is able to load the TypeScript interpreter so quickly. They load the interpreter into V8, along with their hooks, at build time, and dump the binary state, which is then loaded at run time.

The advantage is a notable speedup for applications that load a large amount of elisp. The downside is complexity, but given Rust's strong serialization libraries, I think this could be done with moderate effort.

A step further (and closer to V8) would be, instead of seeding the entire VM with this file, to seed a thread with it, so that separate threads can start up very quickly with pre-seeded memory and minimal overhead.

CeleritasCelery commented 1 year ago

Thanks for writing this up. I never fully understood how pdumper works. It doesn't sound like something you could implement with serde; it sounds more like it takes a snapshot of the heap? Taking a snapshot of the heap seems easy enough, but how would you load it back into the runtime? You can't just mark the image as mutable, because then it would not be reusable. Do you have to copy all the objects from the image and update all the pointers?

> A step further (and more similar to v8) is that instead of seeding the entire VM with this file, we can seed a thread with this file containing binary state, so that I can have separate threads loaded up very quickly with pre-seeded memory content with minimal overhead.

Is the dump primarily to speed up Emacs startup, or is it to make it easier to start a new thread? Currently all threads share functions, but I could see an alternative where functions are thread-local and each thread loads an image instead.

DavidDeSimone commented 1 year ago

pdumper is more of a snapshot of the heap. From pdumper.c:

/* Format of an Emacs dump file.  All offsets are relative to
   the beginning of the file.  An Emacs dump file is coupled
   to exactly the Emacs binary that produced it, so details of
   alignment and endianness are unimportant.
   An Emacs dump file contains the contents of the Lisp heap.
   On startup, Emacs can start faster by mapping a dump file into
   memory and using the objects contained inside it instead of
   performing initialization from scratch.
   The dump file can be loaded at arbitrary locations in memory, so it
   includes a table of relocations that let Emacs adjust the pointers
   embedded in the dump file to account for the location where it was
   actually loaded.
   Dump files can contain pointers to other objects in the dump file
   or to parts of the Emacs binary.  */
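To make the relocation idea in that comment concrete, here is a toy Rust model. It is not how pdumper.c is implemented: it assumes the heap is a flat array of words, that pointer-valued words in the image are stored as offsets from the image start, and that a relocation table lists which words are pointers. `DumpImage`, `load_image`, and `NIL` are hypothetical names. Unlike pdumper, which maps the file into memory, this sketch copies the image into a mutable heap, so the image itself stays reusable:

```rust
/// Sentinel for nil; it is not a pointer, so it never appears in the relocation table.
const NIL: usize = usize::MAX;

/// Hypothetical dump image: raw heap words plus a relocation table.
/// Words that hold pointers store offsets from the start of the image.
struct DumpImage {
    words: Vec<usize>,
    /// Indices into `words` whose values are pointers needing adjustment.
    relocations: Vec<usize>,
}

/// Copy the image into `heap` at the first free index and rebase every
/// pointer listed in the relocation table. Returns the base index used.
fn load_image(heap: &mut Vec<usize>, image: &DumpImage) -> usize {
    let base = heap.len();
    heap.extend_from_slice(&image.words);
    for &idx in &image.relocations {
        // An image-relative offset becomes an index into the live heap.
        heap[base + idx] += base;
    }
    base
}

fn main() {
    // Two cons cells laid out as [car, cdr] word pairs:
    // cell at offset 0 = (42 . <cell at offset 2>), cell at offset 2 = (43 . nil).
    let image = DumpImage { words: vec![42, 2, 43, NIL], relocations: vec![1] };
    // Pretend some memory is already in use, so the image lands at index 5.
    let mut heap = vec![0usize; 5];
    let base = load_image(&mut heap, &image);
    let second = heap[base + 1];
    println!("base = {base}, second cell at {second}, its car = {}", heap[second]);
    // prints: base = 5, second cell at 7, its car = 43
}
```

Recording pointer-ness only in the relocation table, rather than tagging each word, just keeps the toy small; pointers into the Emacs binary itself, which pdumper also handles, are not modeled.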

My initial thought would be something a little slower but more portable: a two-pass solution that would look something like this:

Serialize:

All GC objects are resolved to a universal reference (possibly a GUID).
We serialize the object graph, replacing pointers with their assigned GUIDs.

Deserialize:

All GC objects are deserialized, resolving GUIDs recursively.
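The two passes above can be sketched in Rust as follows. This is an illustration under stated assumptions, not Rune's implementation: `Obj` is a hypothetical stand-in for GC-managed Lisp objects (using `Rc` instead of a real GC), addresses from `Rc::as_ptr` play the role of pointers, and dense integer IDs stand in for GUIDs. Because `Rc` without interior mutability cannot form cycles, this toy handles shared structure but not cyclic structure; a real heap would need placeholders that are patched after resolution.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for a GC-managed Lisp object (not Rune's types).
#[derive(Debug)]
enum Obj {
    Int(i64),
    Cons(Rc<Obj>, Rc<Obj>),
}

/// Serialized form: every pointer has been replaced by an object ID.
#[derive(Debug, Clone, PartialEq)]
enum Record {
    Int(i64),
    Cons(usize, usize),
}

/// Pass 1: walk the graph and give every distinct object an ID.
fn assign_ids(obj: &Rc<Obj>, ids: &mut HashMap<*const Obj, usize>, order: &mut Vec<Rc<Obj>>) {
    let key = Rc::as_ptr(obj);
    if ids.contains_key(&key) {
        return; // already visited: a shared object keeps its single ID
    }
    ids.insert(key, order.len());
    order.push(Rc::clone(obj));
    if let Obj::Cons(car, cdr) = &**obj {
        assign_ids(car, ids, order);
        assign_ids(cdr, ids, order);
    }
}

/// Pass 2: emit one record per object, with pointers replaced by IDs.
fn serialize(root: &Rc<Obj>) -> Vec<Record> {
    let mut ids = HashMap::new();
    let mut order = Vec::new();
    assign_ids(root, &mut ids, &mut order);
    order
        .iter()
        .map(|obj| match &**obj {
            Obj::Int(n) => Record::Int(*n),
            Obj::Cons(car, cdr) => Record::Cons(ids[&Rc::as_ptr(car)], ids[&Rc::as_ptr(cdr)]),
        })
        .collect()
}

/// Rebuild object `id`, resolving child IDs recursively; the memo table
/// ensures each ID becomes exactly one object, so sharing is preserved.
fn resolve(id: usize, records: &[Record], memo: &mut Vec<Option<Rc<Obj>>>) -> Rc<Obj> {
    if let Some(obj) = &memo[id] {
        return Rc::clone(obj);
    }
    let obj = match &records[id] {
        Record::Int(n) => Rc::new(Obj::Int(*n)),
        Record::Cons(a, d) => {
            let car = resolve(*a, records, memo);
            let cdr = resolve(*d, records, memo);
            Rc::new(Obj::Cons(car, cdr))
        }
    };
    memo[id] = Some(Rc::clone(&obj));
    obj
}

fn deserialize(records: &[Record]) -> Rc<Obj> {
    let mut memo = vec![None; records.len()];
    resolve(0, records, &mut memo) // ID 0 is always the root
}

fn main() {
    // (7 . (8 . 7)) where both 7s are the same shared object.
    let shared = Rc::new(Obj::Int(7));
    let inner = Rc::new(Obj::Cons(Rc::new(Obj::Int(8)), Rc::clone(&shared)));
    let root = Rc::new(Obj::Cons(Rc::clone(&shared), inner));
    let records = serialize(&root);
    println!("{records:?}"); // [Cons(1, 2), Int(7), Cons(3, 1), Int(8)]
    let rebuilt = deserialize(&records);
    println!("{}", serialize(&rebuilt) == records); // true
}
```

The memo table in `resolve` is what keeps a shared object shared after loading; without it, the recursive resolution would duplicate every object reachable by more than one path.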

Emacs itself mentions this kind of pattern in a TODO in pdumper.c:

/*
  TODO:
  - Two-pass dumping: first assemble object list, then write all.
    This way, we can perform arbitrary reordering or maybe use fancy
    graph algorithms to get better locality.
  - Don't emit relocations that happen to set Emacs memory locations
    to values they will already have.
  - Nullify frame_and_buffer_state.
  - Preferred base address for relocation-free non-PIC startup.
  - Compressed dump support.

The "two-pass" solution I proposed above gives us a portable dump that isn't coupled to the specific VM we dumped from, and we can use serde to implement it. We could even dump this kind of scheme to a human-readable format for debugging.
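In practice serde with a text format would handle this, but to stay dependency-free, here is a hypothetical hand-written line format for ID-based records like those a two-pass dump would produce. Because records refer to each other only by ID, the text contains no addresses, which is what decouples it from the VM instance that produced it. `Record`, `to_text`, and `from_text` are illustrative names only:

```rust
/// Hypothetical ID-based record from a two-pass dump:
/// pointers have already been replaced by object IDs.
#[derive(Debug, Clone, PartialEq)]
enum Record {
    Int(i64),
    Cons(usize, usize),
}

/// Write one record per line: `<id> int <n>` or `<id> cons <car-id> <cdr-id>`.
fn to_text(records: &[Record]) -> String {
    let mut out = String::new();
    for (id, rec) in records.iter().enumerate() {
        let line = match rec {
            Record::Int(n) => format!("{id} int {n}\n"),
            Record::Cons(a, d) => format!("{id} cons {a} {d}\n"),
        };
        out.push_str(&line);
    }
    out
}

/// Parse the text form back; returns None on malformed input.
fn from_text(text: &str) -> Option<Vec<Record>> {
    let mut records = Vec::new();
    for (expected_id, line) in text.lines().enumerate() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.first()?.parse::<usize>().ok()? != expected_id {
            return None; // IDs must be dense and in order
        }
        let rec = match fields.get(1..)? {
            ["int", n] => Record::Int(n.parse().ok()?),
            ["cons", a, d] => Record::Cons(a.parse().ok()?, d.parse().ok()?),
            _ => return None,
        };
        records.push(rec);
    }
    Some(records)
}

fn main() {
    let records = vec![Record::Cons(1, 2), Record::Int(7), Record::Cons(3, 1), Record::Int(8)];
    let text = to_text(&records);
    print!("{text}");
    // 0 cons 1 2
    // 1 int 7
    // 2 cons 3 1
    // 3 int 8
    assert_eq!(from_text(&text), Some(records));
}
```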

DavidDeSimone commented 1 year ago

> Is the dump primarily to speed up Emacs startup, or is it to make it easier to start a new thread? Currently all threads share functions, but I could see an alternative where functions are thread-local and each thread loads an image instead.

In Emacs, the dump is used to improve startup times.

The way I described threading in my previous post was incorrect. I was alluding to a scheme more like V8's Isolates, which allow separate instances of the VM to run in the same process. In that context, we would use the dump to seed a thread, which would be an isolated instance of the VM. I am working on another post to discuss that approach to threading, but I got a little ahead of myself.
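A minimal sketch of seeding isolated VM instances from one image, assuming each isolate owns its heap outright and the image is immutable bytes shared through `Arc`. `Isolate`, `dump`, and `run_isolates` are hypothetical, and the "heap" is just a vector of words; a real isolate would also need its own GC, symbol table, and so on. Because every isolate copies from the image rather than mutating it, the image stays reusable, which addresses the mutability concern raised earlier in the thread:

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical isolated VM instance: it owns its heap outright,
/// so nothing is shared with other isolates after startup.
struct Isolate {
    heap: Vec<u64>,
}

impl Isolate {
    /// Seed a fresh heap from the read-only image (little-endian u64 words).
    fn from_image(image: &[u8]) -> Isolate {
        let heap = image
            .chunks_exact(8)
            .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
            .collect();
        Isolate { heap }
    }
}

/// Produce the image once, e.g. at build time.
fn dump(heap: &[u64]) -> Vec<u8> {
    heap.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Start `n` threads, each seeded from the same shared image.
fn run_isolates(image: Arc<Vec<u8>>, n: usize) -> Vec<Vec<u64>> {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let image = Arc::clone(&image);
            thread::spawn(move || {
                let mut iso = Isolate::from_image(&image);
                // Each isolate mutates only its own copy of the heap.
                iso.heap[0] += i as u64;
                iso.heap
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let image = Arc::new(dump(&[100, 200, 300]));
    let heaps = run_isolates(Arc::clone(&image), 3);
    println!("{heaps:?}"); // [[100, 200, 300], [101, 200, 300], [102, 200, 300]]
    // The shared image itself is unchanged.
    assert_eq!(Isolate::from_image(&image).heap, vec![100, 200, 300]);
}
```

Whether functions would live in each isolate's heap or stay shared across threads is exactly the open design question above; this sketch assumes the former.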

CeleritasCelery / rune

"unexec/pdump" - VM memory serialization and loading #23