dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses
Other
3.59k stars 112 forks source link

use kitty's direct file reference when applicable #2124

Open dankamongmen opened 3 years ago

dankamongmen commented 3 years ago

When sensible, we ought go ahead and use the Kitty protocol's direct file load. This requires

(a) that we have WX access to a directory the running kitty process can RX, (b) that the ncvisual be provided to us as a file (i don't see writing things out as being a win), (c) that no changes have been made to the ncvisual (technically, we could handle a permanent scaling by passing scaling parameters), (d) that the source be a format the running terminal can understand, and (e) that we receive errors, associate them with rendering requests, and do some kind of retry

technically any load can lead to an error, but it'll become much more important in this case (due to the increased potential for errors). the win would be a speedup on initial load.

since we are required to keep the image in kitty prior to reflective animations, this will only apply to kitty >= 0.20.0 or whatever version it is that gets KITTY_SELFREF. even in that case, we still need to read the image, to know which cells are transparent, mixed, and opaque.

we'd probably want a directory specified in the environment or notcurses_options or something.

this is low priority. it'll introduce a lot of alternative code paths, and doesn't allow us to eliminate any existing code. we'll need a whole feedback system for taking errors and retrying. however, without doing this, dumb little 2-line programs can beat us on meaningless benchmarks.

i don't see any advantage to e.g. spooling out files from a multiframe media (video); we ought be able to write through a pty just as quickly as we can write into page cache. the only advantage is when we can avoid the write entirely, i.e. when it already exists on disk.

kovidgoyal commented 3 years ago

You can avoid writing to disk by using shared memory instead, which will avoid an encode step on your side and a decode step on kitty's side.

dankamongmen commented 3 years ago

You can avoid writing to disk by using shared memory instead, which will avoid an encode step on your side and a decode step on kitty's side.

it's not the disk i/o that worries me -- as noted, we're almost certainly working exclusively with page cache. it's just that i can't write that much more quickly than i can write over the pty (though i do avoid base64 and chunking overhead).

dankamongmen commented 3 years ago

by the way @kovidgoyal , does the path need to support mmap() semantics, or can i give you a named pipe?

kovidgoyal commented 3 years ago

On Sun, Aug 29, 2021 at 11:05:57PM -0700, nick black wrote:

You can avoid writing to disk by using shared memory instead, which will avoid an encode step on your side and a decode step on kitty's side.

it's not the disk i/o that worries me -- as noted, we're almost certainly working exclusively with page cache. it's just that i can't write that much more quickly than i can write over the pty (though i do avoid base64 and chunking overhead).

I dont follow, writing to a file involves, at least, the following O(n) operations, to get the image to the screen:

1) Encoding to base64 2) Copying into kernel memory 3) Copying from kernel memory to user space memory 4) Decoding from base64

And you cant actually rely on the page cache being always used. It depends on what kind of pressure the system is under.

Compared to using SHM, which involves:

1) Copying to shared memory 2) Copying from shared memory

And then if your data is starting out in RAM and not in a disk file, it is actually simpler code wise, to use SHM than to use a disk file, because you dont have to worry about identifying an appropriate directory to write the file to.

kovidgoyal commented 3 years ago

On Sun, Aug 29, 2021 at 11:06:33PM -0700, nick black wrote:

by the way @kovidgoyal , does the path need to support mmap() semantics, or can i give you a named pipe?

The spec envisages using regular files, so best to stick to those. named pipe semantics are fairly different across platforms.

dankamongmen commented 3 years ago

The spec envisages using regular files, so best to stick to those. named pipe semantics are fairly different across platforms.

.nodnod. i would recommend that this be locked down explicitly, as a named pipe is quite attractive -- it would essentially give me the 8-bit channel for direct transmission i've long wanted, without having to worry about disk space or page cache contention.

dankamongmen commented 3 years ago

The spec envisages using regular files, so best to stick to those. named pipe semantics are fairly different across platforms.

.nodnod. i would recommend that this be locked down explicitly, as a named pipe is quite attractive -- it would essentially give me the 8-bit channel for direct transmission i've long wanted, without having to worry about disk space or page cache contention.

which i suppose shared memory would also provide

dankamongmen commented 3 years ago

~hrmmm, i see a potential problem here. let's say i'm processing a video, and writing out each frame as a file or shared memory (they both suffer from this problem). i'm writing these files out, 100K per file. kitty has a 200M image area. there are say 100K frames, and we have 400M available on disk (or for shared memory).~

~how do i know when i can delete/free a file? can i do so after getting the OK message? if so, why is there a kitty-side image area limit (suggesting that kitty needs to keep them around following the OK)?~

the t option solves this problem, removing the file after it's read:

oh, the s also gets it cleaned up by the terminal. ok, good good.

kovidgoyal commented 3 years ago

Yes, adding some language to the spec to exclude named pipes is a good idea, since SHM works just as well and is already there.

kovidgoyal commented 3 years ago

Note that the cleanup of t and s are not quite bulletproof, since if the application crashes after creating them but before transmitting the escape code notifying the terminal of their existence, they will leak. But, we cant really avoid that without some kind of multi-stage commit protocol.

dankamongmen commented 3 years ago

Note that the cleanup of t and s are not quite bulletproof, since if the application crashes after creating them but before transmitting the escape code notifying the terminal of their existence, they will leak. But, we cant really avoid that without some kind of multi-stage commit protocol.

yep was literally typing this out. named pipe has the same problem. only solution i can think of here would be creating something with O_TMPFILE and sending a file descriptor over AF_UNIX, which who knows if there's a Windows equivalent to that

kovidgoyal commented 3 years ago

On Sun, Aug 29, 2021 at 11:31:10PM -0700, nick black wrote:

~hrmmm, i see a potential problem here. let's say i'm processing a video, and writing out each frame as a file or shared memory (they both suffer from this problem). i'm writing these files out, 100K per file. kitty has a 200M image area. there are say 100K frames, and we have 400M available on disk (or for shared memory).~

This is not a video display protocol :) The answer is you throttle your video writing. Only write N frames in advance of the frame you are currently asking the terminal to display.

~how do i know when i can delete/free a file? can i do so after getting the OK message? if so, why is there a kitty-side image area limit (suggesting that kitty needs to keep them around following the OK)?~

kitty doesnt touch the file other than to read from it in f mode. In the other modes it reads from the files and deletes them after being read and stores the image data internally, as long as it is still referenced.

dankamongmen commented 3 years ago

Yes, adding some language to the spec to exclude named pipes is a good idea, since SHM works just as well and is already there.

IMHO the protocol ought also make explicit that symlinks must be followed. i don't intend to use this, but it ought be explicit one way or the other, and i'd prefer the freedom to do so (if for no other reason that otherwise i can't support a symlink if provided one without copying it to a regular file).

dankamongmen commented 3 years ago

IMHO the protocol ought also make explicit that symlinks must be followed. i don't intend to use this, but it ought be explicit one way or the other, and i'd prefer the freedom to do so (if for no other reason that otherwise i can't support a symlink if provided one without copying it to a regular file).

or just following it myself i guess. hrmmm, i guess go one way or the other, but make it explicit either way. not allowing them eliminates the possibility of link cycles.

kovidgoyal commented 3 years ago

IMHO the protocol ought also make explicit that symlinks must be followed. i don't intend to use this, but it ought be explicit one way or the other, and i'd prefer the freedom to do so (if for no other reason that otherwise i can't support a symlink if provided one without copying it to a regular file).

or just following it myself i guess. hrmmm, i guess go one way or the other, but make it explicit either way. not allowing them eliminates the possibility of link cycles.

Am fairly sure open() is robust against symlink cycles, in that it will simply fail. And so kitty will fail to read the data and respond with an error.

dankamongmen commented 3 years ago

Am fairly sure open() is robust against symlink cycles, in that it will simply fail. And so kitty will fail to read the data and respond with an error.

yep

   ELOOP  Too many symbolic links were encountered in resolving pathname.

excellent, then IMHO it would be best to explicitly require symlink resolution in the terminal (done by default in open(2), O_NOFOLLOW turns it off, don't know about windows).

kovidgoyal commented 3 years ago

done

dankamongmen commented 3 years ago

so let's see i can send a test load from shared memory at the beginning. what happens if i send a name that happens to exist in the remote (terminal-side) shared memory namespace as well, and is readable by the terminal process? i probably get back an "invalid data"-style error, though in the worst case someone else is sitting there with an image in that shared memory segment, in which case i get a random image, and believe i'm on the same machine as you.

not much to do against an adversarial remote process except challenge-response with preshared keys or a PKI. that's a ridiculous idea (is it? what if the image is air traffic control?). you could send along a nonce, but even that seems silly. hrmmmm. not a huge problem by any means, but something perhaps worth thinking about....

dankamongmen commented 3 years ago

i know this isn't the kitty issue tracker but

While as of May 2020, kitty is the only terminal emulator to support this graphics protocol, we intend that any terminal emulator that wishes to support it can.

you might want to update this to mention WezTerm's adoption? a second implementation carries a lot of weight imho.

kovidgoyal commented 3 years ago

On Sun, Aug 29, 2021 at 11:57:15PM -0700, nick black wrote:

so let's see i can send a test load from shared memory at the beginning. what happens if i send a name that happens to exist in the remote (terminal-side) shared memory namespace as well, and is readable by the terminal process? i probably get back an "invalid data"-style error, though in the worst case someone else is sitting there with an image in that shared memory segment, in which case i get a random image, and believe i'm on the same machine as you.

not much to do against an adversarial remote process except challenge-response with preshared keys or a PKI. that's a ridiculous idea (is it? what if the image is air traffic control?). you could send along a nonce, but even that seems silly. hrmmmm. not a huge problem by any means, but something perhaps worth thinking about....

Just use a uuid4 as the name. That takes care of collisions.

I dont follow your adversarial threat model. What is this remote process trying to do? Display a different image than the one the originating process sent?

kovidgoyal commented 3 years ago

On Sun, Aug 29, 2021 at 11:59:45PM -0700, nick black wrote:

i know this isn't the kitty issue tracker but

While as of May 2020, kitty is the only terminal emulator to support this graphics protocol, we intend that any terminal emulator that wishes to support it can.

you might want to update this to mention WezTerm's adoption? a second implementation carries a lot of weight imho.

I intend to once it is released.

dankamongmen commented 3 years ago

Just use a uuid4 as the name. That takes care of collisions. I dont follow your adversarial threat model. What is this remote process trying to do? Display a different image than the one the originating process sent?

yes, and it's worrying me more the more i think about it. from such small things do great CVEs spring.

dankamongmen commented 3 years ago

Just use a uuid4 as the name. That takes care of collisions. I dont follow your adversarial threat model. What is this remote process trying to do? Display a different image than the one the originating process sent?

yes, and it's worrying me more the more i think about it. from such small things do great CVEs spring.

the obvious big worry here would be providing a malicious image that exploits the media layer in the terminal.

dankamongmen commented 3 years ago

yeah. kitty is running on machine M as user A. attacker E has a shell on M. kitty user is ssh'd to machine Z. attacker E broadly consumes the shared memory namespace, or predicts the name to be used, and creates a local shared memory segment containing a malicious image (plenty of history here). E now runs code as A on M.

dankamongmen commented 3 years ago

yeah this seems fundamentally unsafe. you're breaking down the walls of the otherwise secure ssh connection (less secure transports can obviously be MitMd more easily). i think the idea earlier of sending a file descriptor across an AF_UNIX socket is better for this reason, in addition to the reliable cleanup via O_TMPFILE (obviously it has other problems, especially on windows). just thinking out loud here.

dankamongmen commented 3 years ago

btw no matter what i am in no way suggesting running challenge-response =] =] just trying to think of alternatives

dankamongmen commented 3 years ago

an ownership check would be a pretty decent solution, security-wise, but sucks otherwise as you (the client) can no longer operate on all images you (the client again) can read.

dankamongmen commented 3 years ago

ahhh, cryptographic-grade hash transmitted along with the load command would work, though that seems a bit grotesque overkill for this

dankamongmen commented 3 years ago

ahhh, cryptographic-grade hash transmitted along with the load command would work, though that seems a bit grotesque overkill for this

that also requires an operation on each load, rather than once at startup, lame

kovidgoyal commented 3 years ago

On Mon, Aug 30, 2021 at 12:15:29AM -0700, nick black wrote:

yeah this seems fundamentally unsafe. you're breaking down the walls of the otherwise secure ssh connection (less secure transports can obviously be MitMd more easily). i think the idea earlier of sending a file descriptor across an AF_UNIX socket is better for this reason, in addition to the reliable cleanup via O_TMPFILE (obviously it has other problems, especially on windows). just thinking out loud here.

The side band does not work over ssh to different machines. And a program running on the same machine can inject data directly into the pty device via /proc anyway. So what, exactly is Eve going to be able to do that she couldnt do before?

dankamongmen commented 3 years ago

The side band does not work over ssh to different machines. And a program running on the same machine can inject data directly into the pty device via /proc anyway.

you're claiming that another user on my machine can inject data into my pty via proc? i find that hard to believe. surely basic user permissions prevent this from happening.

dankamongmen commented 3 years ago

i mean if other users can freely inject data into my terminal, all is lost, it would seem (root is obviously a different deal, but root can do whatever).

the difference here is that another user can put arbitrary data somewhere they own and the terminal will go seek it out

dankamongmen commented 3 years ago

i.e. it's fine locally. i'm specifying a file which i know i own. but when i'm testing to see if i'm on the same machine, i may not own the specified path from the terminal's perspective. i.e. i grab with O_EXCL /tmp/chomp on machine N. terminal is on machine M. i try to load /tmp/chomp to see if we're local, or just to load it, whatever. attacker can predict this (by whatever means), and grabs /tmp/chomp on machine M. the terminal is now reading from data someone else controls. beforehand, to inject it, they'd have to somehow write to the pty, which presumably my kitty process owns and has set writeable only by myself.

kovidgoyal commented 3 years ago

On Mon, Aug 30, 2021 at 12:30:11AM -0700, nick black wrote:

the difference here is that another user can put arbitrary data somewhere they own and the terminal will go seek it out

Files are protected by user permissions as well. So if a process running as another user was to do this, it would need to: a) either guess a filename in advance or somehow induce the application to create a file b) have permissions to write to that file

And this applies to SHM as well

kovidgoyal commented 3 years ago

On Mon, Aug 30, 2021 at 12:33:49AM -0700, nick black wrote:

i.e. it's fine locally. i'm specifying a file which i know i own. but when i'm testing to see if i'm on the same machine, i may not own the specified path from the terminal's perspective. i.e. i grab with O_EXCL /tmp/chomp on machine N. terminal is on machine M. i try to load /tmp/chomp to see if we're local, or just to load it, whatever. attacker can predict this (by whatever means), and grabs /tmp/chomp on machine M. the terminal is now reading from data someone else controls. beforehand, to inject it, they'd have to somehow write to the pty, which presumably my kitty process owns and has set writeable only by myself.

So a process on machine M somehow has to know that an unrelated process on machine N is going to request the terminal emulator to read from some particular filename. And then, the terminal emulator has to have some kind of buffer overflow? while simply reading data from that file (query image data is never actually rendered/processed in any way other than being read/decoded).

At this point I think we need tin foil hats.

WSLUser commented 3 years ago

and sending a file descriptor over AF_UNIX, which who knows if there's a Windows equivalent

https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

Bare minimum implemented but it's there. Not sure what all uses it. I'm aware of one, possibly 2 applications that are making use of it, one of them being Docker (if on a Windows build that supports it).