discussion about display engine and GUI model of emacs

CeleritasCelery / rune

Rust VM for Emacs

GNU General Public License v3.0

429 stars, 24 forks

discussion about display engine and GUI model of emacs #61

Open dwuggh opened 6 months ago

dwuggh commented 6 months ago

Hi! Recently I've been trying to understand the Emacs display engine and write a simple front-end, and then I found this project, which looks interesting. I want to share some of my thoughts on this, and learn about your ideas for rune's UI system.

the current state of emacs

Details can be found in dispextern.h and xdisp.c (which has decent documentation in its comments). To summarize, Emacs builds a glyph_matrix for each frame, and a sub-matrix for each window. How these glyphs should be displayed is defined by text properties, including face, display and invisible. Emacs constructs an iterator (struct it) over a buffer or string that finds these text properties and uses them to build the desired glyph_matrix. The major task of a display engine is /redisplay/. Emacs invokes the C redisplay code from Lisp, as described in xdisp.c:

At its highest level, redisplay can be divided into 3 distinct steps, all of which are visible in `redisplay_internal':

. decide which frames need their windows to be considered for redisplay
. for each window whose display might need to be updated, compute
  a structure, called "glyph matrix", which describes how it
  should look on display
. actually update the display of windows on the glass where the
  newly obtained glyph matrix differs from the one produced by the
  previous redisplay cycle
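The second step, computing a glyph matrix, can be sketched as a single pass over the text that advances through face runs the way Emacs's iterator advances to the next property-change position. This is a hypothetical simplification, not Emacs's or rune's code: Glyph, FaceRun and build_matrix are illustrative names, faces are plain integers (0 meaning the default face), runs are assumed sorted and non-overlapping, and long lines are truncated rather than wrapped.

```rust
/// Hypothetical glyph: one character plus the face it is drawn with.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Glyph {
    pub ch: char,
    pub face: u32,
}

/// Hypothetical face run: characters in `start..end` (char indices) use `face`.
/// Runs are assumed sorted by `start` and non-overlapping.
pub struct FaceRun {
    pub start: usize,
    pub end: usize,
    pub face: u32,
}

/// Walk the text once, advancing through the face runs like an iterator that
/// jumps to the next property change, and lay glyphs out into rows of at most
/// `width` columns (longer lines are truncated in this sketch).
pub fn build_matrix(text: &str, runs: &[FaceRun], width: usize) -> Vec<Vec<Glyph>> {
    let mut rows: Vec<Vec<Glyph>> = vec![Vec::new()];
    let mut run_idx = 0;
    for (i, ch) in text.chars().enumerate() {
        // Skip runs that end at or before this character.
        while run_idx < runs.len() && runs[run_idx].end <= i {
            run_idx += 1;
        }
        if ch == '\n' {
            rows.push(Vec::new());
            continue;
        }
        // Face 0 stands for the default face when no run covers `i`.
        let face = match runs.get(run_idx) {
            Some(run) if run.start <= i => run.face,
            _ => 0,
        };
        let row = rows.last_mut().expect("rows is never empty");
        if row.len() < width {
            row.push(Glyph { ch, face });
        }
    }
    rows
}
```

Advancing a run index instead of searching all runs for every character keeps the pass linear in the text length, which is the point of tracking a "next stop position".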

We can use something like the fontified text property to control redisplay, as font-lock-mode and jit-lock-mode do, or just call redisplay or sit-for.

my thoughts

The first part, building the glyph matrix, is efficient enough. The problem is that we cannot separate the UI part from Emacs, because redisplay is part of the Lisp VM itself. Moreover, user input is blocked during redisplay, which sometimes leads to significant lag on C-u or C-d. As you said in the TODOs, an MVP editor model is desired, so the display engine must be redesigned on both the C side and the elisp side. This has also been mentioned on the mailing list.

My questions are:

How should the display API be organized from Lisp? We already have a huge codebase that sets faces and display properties in elisp, which is powerful and flexible. For instance, tex-mode uses the ascent and descent properties to display subscripts, and prettify-symbols-mode is another great example. How can these operations be performed in a non-blocking way? My idea is to make put-text-property and other functions interact directly with the UI layer, as a Lisp subprocess. But I don't know whether this is possible, or the proper way to do it.
Should Emacs adopt a server-client model, i.e. a separate process like Neovim does? Neovim uses RPC to implement client-server communication, but I don't think this is the proper way for Emacs. Here the same question appears once more: what should the UI part do? Just give the UI thread a buffer and text-property maps and let it do the rest? Or should we compute the glyph matrix in the backend and send it to the UI?
The text buffer structure. I've learned from your articles that a gap buffer is fast, but indexing line numbers would be a problem. If I understand correctly, that means commands like consult-lines or swiper are easier to implement with a rope or piece table? Or can I say that the latter two data structures are more feature-rich than gap buffers? And what about incremental redisplay? Emacs implements incremental redisplay with a pretty complex approach; I don't know whether this would be better if we had more buffer APIs.

I've only just started learning about text editor design and implementation, so many of my thoughts may be naive or wrong. I'd be glad to hear your response!

CeleritasCelery commented 6 months ago

Thanks for your input! I will admit that I don't know that much about GUIs or UI in general. At one point I tried to integrate Druid into the project, but removed it because I had other priorities. Ultimately I would like to use as much of an existing GUI framework as I can, but I don't have enough knowledge to judge which one is best.

The first part, building the glyph matrix, is efficient enough. The problem is that we cannot separate the UI part from emacs, as the redisplay is part of the lisp VM itself. Moreover, user input will be blocked during redisplay, which leads to significant lag on C-u or C-d sometimes.

If computing the glyph matrix is not the expensive part, then what is? Is it the fact that redisplay has to call back into elisp (IIRC)? Do you know the most common reasons it would need to do that? Does most redisplay call elisp, or only in specialized cases?

how should the display API be organized from lisp? We already have a huge codebase of setting faces and display props in elisp, which is powerful and flexible. For instance, tex-mode makes use of ascent and descent properties to display subscripts, and prettify-symbols-mode is awesome too. How to perform these operations in a non-blocking way?

Rune already supports multi-threaded elisp. I wonder if we could move the redisplay elisp routines to another thread. However, this really depends on what they are actually doing. In the current threading model, a buffer can only be open in one thread at a time, so we would have to put display information into some structure and send it to the main thread that has the buffer open. I would like to do as much display work in a different thread as possible.

Should emacs adopt a server-client mode, i.e. a separate process like NeoVim does?

Emacs already has a server/client mode, but it doesn't use a standard protocol, so only Emacs frames can connect to it (unlike Neovim). We could implement this over RPC, but I don't think it is as compelling as it is for Neovim. It would kind of force us to harden the details of the display engine into the protocol. Emacs has richer display capabilities than Neovim (such as images and webkit widgets) that are harder to do efficiently over RPC. Neovim is basically emulating a terminal.

The text buffer structure. I've learnt from your articles that gap buffer is fast, but indexing line numbers would be a problem. If I understand correctly, that means commands like consult-lines or swiper are easier to implement in rope or piece table?

The metrics for the gap buffer are stored in a binary tree (essentially a rope without the chunks of text), so lookups of line endings, character counts, or other metrics are O(log n), just like a rope; there is no difference in lookup speed. But we need to make sure that other data structures, like text properties and overlays, are efficient as well.
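To make the O(log n) claim concrete, here is a minimal sketch under stated assumptions: the text is split into chunks, each chunk's newline count is kept in a Fenwick (binary indexed) tree, and finding the chunk that holds a given newline is a logarithmic descent of that tree. LineMetrics and its methods are hypothetical names; rune's actual metrics tree is a different structure, and a Fenwick tree is swapped in here only because it is compact.

```rust
/// Hypothetical sketch (not rune's code): newline counts per text chunk,
/// stored in a Fenwick tree so updates and line lookups are both O(log n).
pub struct LineMetrics {
    tree: Vec<i64>, // 1-based Fenwick array; tree[0] is unused
}

impl LineMetrics {
    pub fn new(chunk_newlines: &[i64]) -> Self {
        let mut m = LineMetrics { tree: vec![0; chunk_newlines.len() + 1] };
        for (i, &count) in chunk_newlines.iter().enumerate() {
            m.add(i, count);
        }
        m
    }

    /// Add `delta` newlines to chunk `i` (0-based), e.g. after an edit.
    pub fn add(&mut self, i: usize, delta: i64) {
        let mut j = i + 1;
        while j < self.tree.len() {
            self.tree[j] += delta;
            j += j & j.wrapping_neg(); // climb to the next node that covers j
        }
    }

    /// Total newlines in chunks 0..i (exclusive).
    pub fn newlines_before(&self, i: usize) -> i64 {
        let mut sum = 0;
        let mut j = i;
        while j > 0 {
            sum += self.tree[j];
            j -= j & j.wrapping_neg(); // drop the lowest set bit
        }
        sum
    }

    /// 0-based index of the chunk containing the k-th newline (k >= 1),
    /// or None if the text has fewer than k newlines.
    pub fn chunk_of_newline(&self, k: i64) -> Option<usize> {
        let n = self.tree.len() - 1;
        if k < 1 || k > self.newlines_before(n) {
            return None; // also covers n == 0, so the shift below is safe
        }
        let mut pos = 0;
        let mut remaining = k;
        // Largest power of two <= n.
        let mut step = 1usize << (usize::BITS - 1 - n.leading_zeros());
        while step > 0 {
            let next = pos + step;
            if next <= n && self.tree[next] < remaining {
                pos = next;
                remaining -= self.tree[next];
            }
            step >>= 1;
        }
        // `pos` is the longest prefix with fewer than k newlines,
        // so the k-th newline lives in chunk `pos` (0-based).
        Some(pos)
    }
}
```

Mapping a line number to a position then only requires scanning inside the single chunk returned, which is the same trade-off a rope makes.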

dwuggh commented 6 months ago

If computing the glyph matrix is not the expensive part, then what is?

The cost is unavoidable whenever the display engine runs: we have to go through every glyph and render it properly. The problem is reducing the overhead, i.e. not updating the whole matrix on every operation. Emacs uses a lot of tricks to do this, like keeping a glyph cache (we could also call it a glyph atlas), only generating the lines that need updating, only redrawing the portions that changed, etc. Besides, calling jit-lock-fontify or other elisp functions that set faces and display properties is sometimes more costly.
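The "only redraw the portions that changed" trick corresponds to the third redisplay step quoted from xdisp.c. A minimal sketch, assuming the previous and current matrices are available as plain vectors of glyph rows (Glyph, GlyphRow and rows_to_redraw are hypothetical names, not Emacs or rune APIs):

```rust
/// Hypothetical glyph: a character and an integer face id.
#[derive(Clone, Debug, PartialEq)]
pub struct Glyph {
    pub ch: char,
    pub face: u32,
}

pub type GlyphRow = Vec<Glyph>;

/// Indices of rows in `new` that differ from the previous frame's matrix
/// `old` (including rows that did not exist before). Only these rows need
/// to be redrawn. Rows past `new.len()` would need clearing; omitted here.
pub fn rows_to_redraw(old: &[GlyphRow], new: &[GlyphRow]) -> Vec<usize> {
    (0..new.len())
        .filter(|&i| old.get(i) != Some(&new[i]))
        .collect()
}
```

Emacs goes further, for example by recognizing rows that merely moved because of scrolling and reusing them; this sketch only compares rows at the same position.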

Is it the fact that redisplay has to call back into elisp (IIRC)? Do you know what are the most common reasons it would need to do that?

Yes, it calls some hooks, like window-configuration-change-hook. These seem to be blocked during asynchronous redisplay, such as on mouse movement.

Does most redisplay call elisp, or is it only in specialized cases?

Emacs normally tries to redisplay the screen whenever it waits for input (in the command loop). I think redisplay is invoked heavily from Lisp; check the references to redisplay and sit-for. Also, jit-lock-mode works by setting the fontified property to nil for regions that need updating, and the C redisplay code specifically looks for that property.
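The fontified mechanism can be illustrated with a small sketch. Here a Vec<bool> stands in for the fontified text property (one flag per character) and the fontify callback stands in for running fontification-functions; both are simplifying assumptions, since Emacs stores properties as intervals and jit-lock fontifies whole chunks (jit-lock-chunk-size) that may extend past the visible region. fontify_visible is a hypothetical name.

```rust
use std::ops::Range;

/// Before building glyphs for `visible`, find each run of characters still
/// marked unfontified, hand it to `fontify` (the stand-in for
/// fontification-functions), and then mark the run as fontified.
pub fn fontify_visible<F>(fontified: &mut [bool], visible: Range<usize>, mut fontify: F)
where
    F: FnMut(usize, usize),
{
    let mut i = visible.start;
    while i < visible.end {
        if fontified[i] {
            i += 1;
            continue;
        }
        // Extend the unfontified run as far as the visible region allows.
        let start = i;
        while i < visible.end && !fontified[i] {
            i += 1;
        }
        fontify(start, i);
        for flag in &mut fontified[start..i] {
            *flag = true;
        }
    }
}
```

A second redisplay over the same region finds no unfontified runs and calls nothing, which is what makes the scheme cheap once text has been fontified.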

Emacs already has a server/client mode, but it doesn't use a standard protocol so only Emacs frames can connect to it (unlike Neovim).

It simply connects to an Emacs instance, passing make-frame parameters.

It would kind of force us to harden the details of the display engine into the protocol. Emacs has richer display capabilities than Neovim (such as images and webkit widgets) that are harder to do over RPC efficiently. Neovim is basically emulating a terminal.

I agree; we don't need a subprocess for the Emacs GUI, at least for now.

In the current threading model, a buffer can only be open in one thread at a time. So we would have to put display information into some structure and send it to the main thread with the open buffer. I would like to do as much display work in a different thread as possible.

For display, only a read handle to the buffer is needed. Is that doable?
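One way read handles could look in Rust, assuming a hypothetical Buffer type behind a std::sync::RwLock: the display thread only ever takes the shared read lock, while the editing thread takes the exclusive write lock. This sketches the locking pattern only; it is not rune's current threading model, in which a buffer is open in only one thread at a time. display_pass and insert_at_end are illustrative names.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical buffer; a real one would be a gap buffer with metrics.
pub struct Buffer {
    pub text: String,
}

/// Run a "display" pass on another thread that only reads the buffer,
/// and return the number of lines it saw.
pub fn display_pass(buf: &Arc<RwLock<Buffer>>) -> usize {
    let buf = Arc::clone(buf);
    thread::spawn(move || {
        let guard = buf.read().expect("lock poisoned"); // shared access
        guard.text.lines().count()
    })
    .join()
    .expect("display thread panicked")
}

/// The editing thread takes the exclusive write lock to mutate the text.
pub fn insert_at_end(buf: &Arc<RwLock<Buffer>>, s: &str) {
    buf.write().expect("lock poisoned").text.push_str(s);
}
```

Many readers can hold the lock at once, but a reader still blocks writers while it holds the guard, so a real design would keep read sections short or copy out the data it needs.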

At one point I tried to integrate druid into the project, but removed it because I had other priorities. Ultimately I would like to use as much of an existing GUI framework as I can, but I don't have enough knowledge to judge which one is best.

Druid is said to be "passively maintained", as its authors have moved on to their new projects, vello and xilem. My toy model currently uses vello as its rendering backend. Emacs doesn't need an extensive GUI framework; after all, we have to render the buffer according to its text properties. There also aren't many choices in Rust either way, so I think winit plus a 2D rendering library is good enough. Skia (via its Rust bindings) is the gold standard, but I've heard it can lead to dependency hell. Other active libraries I found include vello (https://github.com/linebender/vello) and contrast_renderer (I found this through an AI assistant a few days ago; I couldn't even find it by searching). Text rendering in Rust is a big issue: many related crates seem to be inactive. Currently I use swash, which is already good enough. Its author is a member of the Google Fonts team, as well as a vello developer.

CeleritasCelery commented 6 months ago

You make a good point that we won't need most features from a GUI framework. We won't be creating custom widgets or animations. What we do need is fast and flexible text rendering. That is why I initially went for Druid, because Raph Levien is part of Google Fonts and is very focused on text.

So let me see if I understand the basic redisplay loop:

Whenever the elisp machine is idle or redisplay is called explicitly, Emacs attempts to redraw the display. It has to walk all the text properties for each buffer and look for ones with fontified = nil. Then it has to call the functions in fontification-functions to fontify that text. Once that is done, it can collect all properties and overlays and create a glyph matrix. Depending on what happened, we might also need to call window-configuration-change-hook, window-scroll-functions, menu-bar-update-hook, pre-redisplay-functions, and others I am sure I am missing.

for display, only a read handle for the buffer is needed. Is that doable?

Are you saying that we would want multiple threads with read access to the same buffer? That would probably be doable, but I'm curious what parts you think we could run in parallel. In my mind I picture one thread (the main thread) handling events from the user and text shaping, and then a background thread getting updates and rerunning redisplay elisp.

For a motivating scenario, consider scrolling. When we scroll, we need to call window-scroll-functions, and fontification-functions for any text that is not yet fontified. Ideally we have a main thread that takes input events from the user (scroll wheel), tells a redisplay thread which portions of the buffer are now visible, and calculates the text layout for them. The redisplay thread calls the associated elisp to fontify things. Meanwhile the main thread can keep scrolling and display text without font lock applied, so it is not blocked. Once the font-lock text properties are generated, they can be sent to the main thread and applied to the glyph matrix. This would hopefully allow smooth scrolling at over 60 fps, though it might look a little weird if text is fontified asynchronously.
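The scrolling scenario can be sketched with two std::sync::mpsc channels: the main thread sends the newly visible range and never waits, while a redisplay thread replies with face updates when fontification finishes. ToRedisplay, FaceUpdate, spawn_redisplay and drain_ready are hypothetical names, and the redisplay thread here just returns a fixed face instead of actually running window-scroll-functions or fontification-functions.

```rust
use std::sync::mpsc;
use std::thread;

/// Requests from the main (input) thread to the redisplay thread.
pub enum ToRedisplay {
    Visible { start: usize, end: usize },
    Quit,
}

/// Face results sent back to the main thread once fontification is done.
pub struct FaceUpdate {
    pub start: usize,
    pub end: usize,
    pub face: u32,
}

pub fn spawn_redisplay() -> (mpsc::Sender<ToRedisplay>, mpsc::Receiver<FaceUpdate>, thread::JoinHandle<()>) {
    let (req_tx, req_rx) = mpsc::channel();
    let (face_tx, face_rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        for msg in req_rx {
            match msg {
                ToRedisplay::Visible { start, end } => {
                    // Placeholder for running the fontification elisp.
                    let update = FaceUpdate { start, end, face: 1 };
                    if face_tx.send(update).is_err() {
                        break; // main thread has gone away
                    }
                }
                ToRedisplay::Quit => break,
            }
        }
    });
    (req_tx, face_rx, handle)
}

/// Called once per frame on the main thread: apply whatever faces are ready
/// without blocking. Text not yet fontified keeps the default face.
pub fn drain_ready(rx: &mpsc::Receiver<FaceUpdate>, ready: &mut Vec<FaceUpdate>) {
    while let Ok(update) = rx.try_recv() {
        ready.push(update);
    }
}
```

Because drain_ready uses try_recv, a slow fontifier only delays colors; it never stalls scrolling, which is the property the scenario is after.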

dwuggh commented 6 months ago

Whenever the elisp machine is idle or redisplay is called explicitly, Emacs attempts to redraw the display. It has to walk all the text properties for each buffer and look for ones with fontified = nil. Then it has to call the functions in fontification-functions to fontify that text. Once that is done, it can collect all properties and overlays and create a glyph matrix. Depending on what happened, we might also need to call window-configuration-change-hook, window-scroll-functions, menu-bar-update-hook, pre-redisplay-functions, and others I am sure I am missing.

Basically, this is the idea.

but curious what parts you think we could run in parallel.

Ideally we have a main thread that is taking input events from the user (scroll wheel) and then telling a redisplay thread about which portions of the buffer are now visible and calculating the text layout for them.

Maybe we can put all of the text display logic (like font-lock.el) into the redisplay thread, i.e. make variables and objects thread-local to the redisplay thread, handle scrolling and highlight updates directly in the frontend, and "send" editing events to the main thread. Other events, like split-buffer or company-mode, are then dispatched to the main server, which can call back to the display thread with things to display.

CeleritasCelery commented 6 months ago

I like this approach. Currently variables and objects are already thread-local. One tricky part will be merging changes from different threads into the main display thread.

rdaum commented 5 months ago

Pulling the discussion back to the original thread topic -- re: display/GUI layer, have you considered just starting with a curses/termio interface in order to get an MVP to test and iterate with? I suppose there's a chance it could back you into a corner from an architecture POV, but it could also be useful just to get the basics of displaying and editing buffers and processing keys in place and get some broader engagement.

CeleritasCelery commented 5 months ago

have you considered just starting with a curses/termio interface in order to get an MVP to test and iterate with?

I am starting to think that is not a bad idea. I have had several false starts on setting up the GUI, in part because I have not worked on GUIs before and there is a lot to learn. But I feel like the TUI model is easier to grok, and it may be easier to get started with and get the basics in place. Do you have any recommendations on what crates we should use? The only one I have heard much about is ratatui. Our requirements at this point are fairly basic (displaying buffers, scrolling, cursor, etc.).

rdaum commented 5 months ago

I don't have a lot of experience there, but I think Ratatui is probably too far up the stack for something like Emacs, as it provides widgets, line editors, etc., which I'd expect Emacs to manage itself.

Ratatui uses crossterm (https://github.com/crossterm-rs/crossterm) under the hood, as does the Helix editor, it seems (https://github.com/helix-editor/helix/blob/master/helix-tui/Cargo.toml#L24C1-L24C10).

The other actively developed project seems to be termion which comes out of the Redox project: https://gitlab.redox-os.org/redox-os/termion

All that said, I'm not a UI person either.

(Sent by email in reply to Troy Hinckley's comment of Thu, 21 Mar 2024 at 13:14.)

rdaum commented 5 months ago

One more thought on this general thread. I need to spend a chunk of time focusing on what you two have written here to absorb the threading model proposals, but I'll just put something out there as a thought experiment/half-baked proposal.

Which is that it might be worth looking at the process (not really threading) model used by Chrome/Chromium.

In that system each tab is a separate process (actually can be two). This provides parallelism like threads but also adds isolation (for security and stability reasons) and some observability even (from ps/top/task-manager, etc.). The coordination of the tabs with the surrounding shell, and other processes, is done using lightweight, fast IPC. (In chromium this is called "mojo"). Within each process it's possible to have threading, etc but the process model provides a less granular framework within which isolation of each tab can be provided.

In Chromium, in fact, this ends up being divided into 'renderer', 'browser', and 'utility' processes. Each tab gets its own 'renderer' process at a minimum. (Utility processes generally do network I/O, but I forget exactly how this is modeled, as it's been some years since I worked in that codebase.) And then everything is tied together by a 'shell'.

So zooming out and thinking of this in the context of emacs, I can imagine an architecture where:

Each buffer, or set of buffers maybe, gets its own Unix process. Within that Unix process would be a complete elisp VM with its own garbage collection space, etc.
There's a 'renderer' or 'buffer host' process which is responsible for dispatching keyboard / mouse events to buffer processes and actually rendering buffers and handling display events from buffer processes.
Communication between the two happens over IPC, and for this I'd recommend https://github.com/eclipse-iceoryx/iceoryx2 which is getting active development, excellent performance, and provides the right zero-copy/shared-memory abstractions to do this well.
Actually painting the buffer could be done by the renderer process, but as much as possible the instructions to do so should be provided by the buffer process, and hopefully the bulk of the code running inside the buffer process should be written in elisp, not Rust.

The biggest difficulty with this could be that the GNU emacs model as it exists assumes a unified monolithic namespace with no separation between running programs. There are global variables and functions, a global interpreter lock, and the assumption is that any program can mutate any global state. And that might not be really possible with the model I'm proposing here.

But I guess the question I have more broadly then is: is the intent to be compatible with existing GNU emacs packages? Or just more broadly with elisp and the emacs philosophy / conventions?

Honestly, I'd personally propose that you strongly consider the latter, because I think it will be a losing game to try to aim for 1:1 compatibility, and there's a lot of potential "win" by just keeping the (rather unique) emacs buffer & key / macro bindings concepts, elisp interpreter, and default emacs keybindings but not chasing after making existing modes run as-is. Making things easily portable, yes, I think that's a reasonable goal.

CeleritasCelery commented 5 months ago

But I guess the question I have more broadly then is: is the intent to be compatible with existing GNU emacs packages? Or just more broadly with elisp and the emacs philosophy / conventions?

Emacs ships with over a million lines of elisp built in. It has more Lisp than C by a good margin. Not to mention all the existing packages like org-mode, magit, lsp-mode, etc. There are a lot of efforts to create an "Emacs-like" editor not tied to elisp (climacs, pimacs, guileEmacs, etc.), but all those editors miss out on the elisp package ecosystem. To me, the body of existing elisp is one of Emacs' greatest strengths. The goal is to be compatible with (most of) that elisp and allow running it seamlessly. I think rewriting the core and rewriting (or at least porting) the millions of lines of existing code would be a much larger task. And even if you managed to do that, you would still have to keep pace with the ecosystem, or build a new one around the new editor.

But you are correct that elisp was not written with parallelism in mind. And since it is essentially a giant ball of mutable state, adding concurrency to it in a way that is both safe and useful is hard. I have written about this a little here.

I think that any parallel Emacs will need to consider how to share code and state. The process model makes that harder than threads (though iceoryx looks really compelling). For example, we would want a single thread/process to be able to access multiple buffers and not be tied to a single one. A lot of code switches to different buffers for different parts of its operation, and this becomes a lot harder if buffers each live in their own process. Browser tabs naturally lend themselves to isolation because they are typically different sites and have no need to observe each other, but that is not true of buffers.

That being said, I am open to ideas of how to approach this. My current line of thinking is just a proposal and not "the one true" concurrency model for Emacs. However I think any parallelism approach needs to consider these things:

doesn't break (most) existing code
is safe to use (i.e. users can't introduce data races or segfaults)
is powerful enough to actually do useful work. We would like to see code get significant benefits from using multiple cores, since core counts will only be increasing in the future.

dwuggh commented 5 months ago

I'm writing a demo GUI using WebGPU (with vello as the backend); currently it models face-based text rendering. Once the text-property-related elisp functions are complete, I can get my hands on real-world Emacs rendering.

Each buffer, or set of buffers maybe, gets its own Unix process. Within that Unix process would be a complete elisp VM with its own garbage collection space, etc.

Hooks would probably become a problem, like post-command-hook, window-configuration-change-hook, etc. Also, things like undo/redo or jump lists need to be global.

There's a 'renderer' or 'buffer host' process which is responsible for dispatching keyboard / mouse events to buffer processes and actually rendering buffers and handling display events from buffer processes.

This seems to be a popular approach for "modern" text editors; however, we still need to let elisp handle most of the events, even cursor movement, which I see as a big advantage of Emacs. Consider markup languages: if we want to toggle markup depending on whether the cursor is on the same line, then we need to modify the text properties on every cursor movement.
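As a small illustration of why such a feature touches text properties on every cursor movement, here is a hypothetical sketch that computes which '*' emphasis markers should be hidden (for example with an invisible property): every marker except those on the cursor's line. The function name and the single-character marker syntax are assumptions for illustration, not any existing mode's behavior.

```rust
/// Byte ranges of '*' markers to hide, skipping the line that contains
/// `cursor`. Assumes `cursor` is a char boundary and <= text.len().
pub fn hidden_markers(text: &str, cursor: usize) -> Vec<(usize, usize)> {
    // Bounds of the cursor's line, as byte offsets.
    let line_start = text[..cursor].rfind('\n').map_or(0, |i| i + 1);
    let line_end = text[cursor..].find('\n').map_or(text.len(), |i| cursor + i);
    text.match_indices('*')
        .map(|(i, _)| (i, i + 1))
        .filter(|&(i, _)| i < line_start || i >= line_end)
        .collect()
}
```

Moving the cursor to another line changes the result, so the corresponding properties have to be recomputed and pushed to the display on each such move.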

However, in some cases the Emacs rendering logic is not optimal. For example, tree-sitter builds its own parse tree, so converting it into faces stored in an interval tree is redundant (this seems to be how Emacs currently uses tree-sitter); we could use tree-sitter's parse results directly and skip most of the elisp part.

rdaum commented 5 months ago

Yeah, I definitely think it would be a challenge to do the Chrome-like process model if one is aiming for GNU Emacs compatibility. You could build an Emacs, in the classical definition of it, with different rules and a different memory model, but it would be very difficult to build a GNU Emacs-compatible system.

In a browser like Chrome each tab is a fully isolated V8 VM. There definitely are shared entities exposed by the browser API, but they "live" in a sort of separate world and are exported "into" the VM. The actual VM and GC are split up across tabs. By design. So yes, things like hooks, etc could get tricky. (Not impossible, just tricky. )

Because, yes, let's say you define a hook like you mentioned... in which VM, in which process, does it run when triggered? And which state does it mutate? It doesn't really make a lot of sense unless you have some kind of shared transactional memory that each VM writes to. (Which is the model I use in a system I work on, actually.)

dwuggh commented 5 months ago

The goal is to be compatible with (most of) that elisp and allow seamlessly running it.

Maybe consider writing a compatibility layer (with low performance) in elisp? The Emacs devel team has also agreed on the mailing lists that the display engine is outdated and should be rewritten; it's just that no one has done it.

rdaum commented 5 months ago

Maybe consider writing a compatibility layer (with low performance) in elisp? The Emacs devel team has also agreed on the mailing lists that the display engine is outdated and should be rewritten; it's just that no one has done it.

Is there a reason why it'd have to be slow? I don't see why. There are plenty of garbage collected interpreted Lisps that execute at excellent speeds. Expose crossterm and skia (or similar) to elisp via ffi and then do the rest in-VM.

dwuggh commented 5 months ago

Maybe consider writing a compatibility layer (with low performance) in elisp? The Emacs devel team has also agreed on the mailing lists that the display engine is outdated and should be rewritten; it's just that no one has done it.

Is there a reason why it'd have to be slow? I don't see why. There are plenty of garbage collected interpreted Lisps that execute at excellent speeds. Expose crossterm and skia (or similar) to elisp via ffi and then do the rest in-VM.

I mean a compatibility layer between the new and old display APIs, like translating from tree-sitter to the text-property tree, or from a multi-threaded or multi-process model to a single-threaded model. It may or may not be slow; I don't know for sure.

dwuggh commented 5 months ago

Expose crossterm and skia (or similar) to elisp via ffi and then do the rest in-VM.

You mean writing the GUI directly in elisp itself? I never thought about that... kind of post-modern to me.

rdaum commented 5 months ago

Sure, I mean, there are other Emacsen written 100% in Lisp, e.g. lem (https://github.com/lem-project/lem), which is 100% Common Lisp, and Zmacs on Lisp Machines (https://en.wikipedia.org/wiki/Zmacs).

appetrosyan commented 4 months ago

If you could make the implementation efficient enough, there would be significant advantages, because you would then be able to poke holes into it from the config. The main problem is that these would never be standard, and never accepted while there are two Emacsen. This essentially precludes it from happening unless you can do it efficiently in both C Emacs and this project.