Open DavidDeSimone opened 1 year ago
Thanks for writing this up. I never fully understood how pdumper works. It doesn't sound like something you could implement with serde; is it more like taking a snapshot of the heap? Taking a snapshot of the heap seems easy enough, but how would you load that back into the runtime? You can't just mark the image as mutable because then it would not be reusable. Do you have to copy all the objects from the image and update all the pointers?
A step further (and more similar to v8) is that instead of seeding the entire VM with this file, we can seed a thread with this file containing binary state, so that I can have separate threads loaded up very quickly with pre-seeded memory content with minimal overhead.
Is the dump primarily to speed up Emacs startup, or is it to make it easier to start a new thread? Currently all threads share functions, but I could see an alternative where functions are thread-local and each thread loads an image instead.
pdumper is more of a snapshot of the heap. From pdumper.c:
/* Format of an Emacs dump file. All offsets are relative to
the beginning of the file. An Emacs dump file is coupled
to exactly the Emacs binary that produced it, so details of
alignment and endianness are unimportant.
An Emacs dump file contains the contents of the Lisp heap.
On startup, Emacs can start faster by mapping a dump file into
memory and using the objects contained inside it instead of
performing initialization from scratch.
The dump file can be loaded at arbitrary locations in memory, so it
includes a table of relocations that let Emacs adjust the pointers
embedded in the dump file to account for the location where it was
actually loaded.
Dump files can contain pointers to other objects in the dump file
or to parts of the Emacs binary. */
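The relocation idea in that comment can be illustrated with a minimal sketch. This is not pdumper's actual format: the `Image` struct, its word-array layout, and the `relocate` function are hypothetical names made up for illustration. It assumes the dump stores pointers as offsets from the start of the image, plus a table saying which words need adjusting once the load address is known.

```rust
/// Hypothetical dump image (not pdumper's real layout): a flat array of
/// 64-bit words plus a relocation table.
pub struct Image {
    /// Raw contents of the dumped heap.
    pub words: Vec<u64>,
    /// Indices into `words` whose values are offsets relative to the start
    /// of the image and must become absolute addresses after loading.
    pub relocs: Vec<usize>,
}

/// Apply the relocation table in place: each listed word becomes
/// `base + offset`, where `base` is the address the image was loaded at.
/// Returns an error if an entry points outside the image or overflows.
pub fn relocate(image: &mut Image, base: u64) -> Result<(), String> {
    for &idx in &image.relocs {
        let word = image
            .words
            .get_mut(idx)
            .ok_or_else(|| format!("relocation index {} out of bounds", idx))?;
        *word = base
            .checked_add(*word)
            .ok_or_else(|| format!("address overflow at word {}", idx))?;
    }
    Ok(())
}
```

Words that are not listed in the relocation table (plain integers, for example) are left untouched, which is why the table is needed at all: the loader cannot tell a pointer from data by looking at the bits.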
My initial thought would be something a little slower, but more portable: a two-pass solution that would look something like this:
Serialize:
Deserialize:
Emacs itself has a reference to this kind of pattern in pdumper.c in a TODO:
/*
TODO:
- Two-pass dumping: first assemble object list, then write all.
This way, we can perform arbitrary reordering or maybe use fancy
graph algorithms to get better locality.
- Don't emit relocations that happen to set Emacs memory locations
to values they will already have.
- Nullify frame_and_buffer_state.
- Preferred base address for relocation-free non-PIC startup.
- Compressed dump support.
The "two-pass" solution that I proposed above allows us to have a portable dump without coupling it to the specific VM that we dumped from, and we can use serde to achieve this scheme. We can even dump this kind of scheme to a human-readable format for debugging.
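As a rough sketch of what such a two-pass scheme could look like, assuming a toy object model: `Obj`, `Record`, `serialize`, and `deserialize` below are hypothetical and are not Rune's real types. The first pass assembles the list of reachable objects, and the second writes each one out with pointers replaced by indices into that list. The sketch uses only the standard library; with serde one would presumably derive `Serialize`/`Deserialize` on the record type to get the binary or human-readable encoding.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical, simplified Lisp object (not Rune's real representation).
pub enum Obj {
    Int(i64),
    Cons(Rc<Obj>, Rc<Obj>),
}

/// Portable record: child pointers are replaced by indices into the table.
#[derive(Debug, Clone, PartialEq)]
pub enum Record {
    Int(i64),
    Cons(usize, usize),
}

/// Pass 1: collect every reachable object exactly once, children first,
/// using pointer identity so shared objects get a single index.
fn collect(obj: &Rc<Obj>, seen: &mut HashMap<*const Obj, usize>, order: &mut Vec<Rc<Obj>>) {
    let key = Rc::as_ptr(obj);
    if seen.contains_key(&key) {
        return;
    }
    if let Obj::Cons(car, cdr) = &**obj {
        collect(car, seen, order);
        collect(cdr, seen, order);
    }
    seen.insert(key, order.len());
    order.push(Rc::clone(obj));
}

/// Pass 2: emit one record per collected object. The root is the last record.
pub fn serialize(root: &Rc<Obj>) -> Vec<Record> {
    let mut seen = HashMap::new();
    let mut order = Vec::new();
    collect(root, &mut seen, &mut order);
    order
        .iter()
        .map(|o| match &**o {
            Obj::Int(n) => Record::Int(*n),
            Obj::Cons(car, cdr) => {
                Record::Cons(seen[&Rc::as_ptr(car)], seen[&Rc::as_ptr(cdr)])
            }
        })
        .collect()
}

/// Rebuild the graph. Children always precede parents, so every index must
/// refer to an already-built object; otherwise the dump is rejected.
pub fn deserialize(records: &[Record]) -> Option<Rc<Obj>> {
    let mut objs: Vec<Rc<Obj>> = Vec::with_capacity(records.len());
    for rec in records {
        let obj = match rec {
            Record::Int(n) => Obj::Int(*n),
            Record::Cons(a, b) => Obj::Cons(Rc::clone(objs.get(*a)?), Rc::clone(objs.get(*b)?)),
        };
        objs.push(Rc::new(obj));
    }
    objs.pop()
}
```

Because the records hold indices rather than addresses, nothing in the output depends on where the original VM placed its objects, which is the portability property described above. The cost is that loading has to rebuild every object instead of mapping the file directly.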
Is the dump primarily to speed up Emacs startup, or is it to make it easier to start a new thread? Currently all threads share functions, but I could see an alternative where functions are thread-local and each thread loads an image instead.
In Emacs, the dump is to improve startup times.
The way I used threading was incorrect in my previous post. I was alluding to a scheme more like v8's Isolates, which allow separate instances of the VM to run in the same process. In that context, we would use the dump to seed a thread, which would be an isolated instance of the VM. I am working on another post to discuss that approach for threading, but I got a little ahead of myself.
Executive Summary: I propose that it would be worthwhile to have Rune dump its serialized state into a binary file that could be reloaded at a later time to cut down on load times. The intended usage is that I evaluate a large amount of elisp, dump to a file, and then load my VM from that dumped elisp to cut down on load time. Creating this binary file may require a special mode when creating the VM (depending on implementation), but loading the file would not require any special mode. Loading the file would be done at VM initialization and would not be expected to be done "mid run".
For a long time, part of Emacs's build process was its famous "unexec" flow, where you would load a minimal version of Emacs, evaluate a large amount of elisp, and, if I recall correctly, then dump part of your process heap into a binary that would be loaded into Emacs's BSS memory area. Eventually Emacs replaced unexec with the portable dumper (https://github.com/emacs-mirror/emacs/blob/master/src/pdumper.h), which isn't as fast, but is much more maintainable.
v8 (Google's JavaScript engine) also has somewhat similar functionality for its Isolates; this is how Deno is able to load the TypeScript interpreter so quickly. They actually load the interpreter in v8 with their hooks at build time, and dump the binary state, which is then loaded at run time.
The advantage is a notable speedup for targeted applications that load a large amount of elisp. The downside is complexity, but I think with Rust's great serialization libraries and support, this could be done with moderate effort.
A step further (and more similar to v8) is that instead of seeding the entire VM with this file, we can seed a thread with this file containing binary state, so that I can have separate threads loaded up very quickly with pre-seeded memory content with minimal overhead.
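To make the thread-seeding idea concrete, here is a minimal sketch using only the standard library. The `Vm` type, `from_image`, and `spawn_seeded` are hypothetical and the "heap" is just a list of integers; the point is only that one read-only image can be shared across threads while each thread copies it into a private heap that it is free to mutate.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical isolated VM instance; the heap is a stand-in for real objects.
pub struct Vm {
    pub heap: Vec<i64>,
}

impl Vm {
    /// Copy the shared, read-only image into a fresh private heap.
    pub fn from_image(image: &[i64]) -> Vm {
        Vm { heap: image.to_vec() }
    }
}

/// Spawn `n` threads, each seeding its own VM from the same image and then
/// mutating only its private copy. Returns each thread's final heap in order.
pub fn spawn_seeded(image: Arc<[i64]>, n: usize) -> Vec<Vec<i64>> {
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let image = Arc::clone(&image);
            thread::spawn(move || {
                let mut vm = Vm::from_image(&image);
                vm.heap.push(id as i64); // private mutation, invisible to other threads
                vm.heap
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Copying on load keeps the image itself reusable, at the cost of one copy per thread. A real implementation might instead map the image copy-on-write, but that is outside what this sketch shows.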