contour-terminal / terminal-good-image-protocol

*Good* Image Protocol - a formalization of a proposal for a new image protocol for virtual terminal emulators

image position/extend addressing mode and metrics #13

Open jerch opened 3 years ago

jerch commented 3 years ago

For placing and sizing an image there are different ways/metrics possible.

preliminary considerations

As prior art we only have the SIXEL way, plus the newer SIXEL implementations (somewhat broken in this regard).

The SIXEL way:

Newer SIXEL implementations ignore these settings and instead map pixels 1:1 to screen pixels. While this enables pixel perfect output, it makes autoscaling based on certain TE settings impossible.

To not end up in the same trap, we need to find a good metrics abstraction, that tries to respect:

In a TE we basically have these dimensions given:

This means we have only 2 reliable metrics we can count on - a cell as base unit, or the viewport extent in terms of number of cells.

Proposal 1: relative positioning and sizing to cell size

A naive way to deal with the limited entry metrics we have is to base all positioning and extent calculations only on discrete cells:

To also allow subcell addressing, this approach could be extended in a way to also address fractions of a cell. If done as float values, the size notions will behave pretty much like rem in CSS, with the difference of having distinct bases for width (1 = one text cell width) and height (1 = one text cell height). But not sure yet, if we really need subcell addressing at all.
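To make the rem-like idea concrete, here is a minimal Python sketch of how a TE could resolve fractional cell units to device pixels. The names `CellMetrics` and `cell_units_to_pixels` are hypothetical and not part of any proposal text; the only thing taken from the description above is that width and height use distinct bases (one cell width, one cell height).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMetrics:
    """Current size of one text cell in device pixels (known only to the TE)."""
    width_px: float
    height_px: float


def cell_units_to_pixels(x, y, w, h, cell):
    """Resolve a position/extent given in (fractional) cell units to pixels.

    Horizontal values scale with the cell width, vertical values with the
    cell height, so 1.0 always means "one text cell" on that axis.
    """
    return (
        round(x * cell.width_px),
        round(y * cell.height_px),
        round(w * cell.width_px),
        round(h * cell.height_px),
    )


# Same request, two different font sizes: the image keeps its place in the grid.
small_font = cell_units_to_pixels(1.5, 2, 4, 2.25, CellMetrics(10, 20))
large_font = cell_units_to_pixels(1.5, 2, 4, 2.25, CellMetrics(18, 36))
```

The application never sees pixel values here; it only talks in cells, and the TE does the final mapping.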

Proposal 2: absolute positioning and pixel sizing in terms of a custom unit

Implement something similar to SIXEL's way with units and explicit pixel sizing:

Pros & Cons

There might be other proposals or variants of the two above, that make more sense in the end. Plz lets discuss the details (sorry for so much texting) :smile_cat:

j4james commented 2 years ago

For the record, the conclusion I reached on the subject of fitting images to cells (at least in the context of Sixel) was to just leave it up to the user to decide. If a user needs to emulate legacy software designed for the VT340, they can select a 10x20 cell size, and images will automatically scale to fit cells of that size (i.e. a 100x200 image will always occupy exactly 10x10 text cells, regardless of the actual font size). Similarly, for a VT382, they can select a 12x30 cell size. If they're using modern applications that query the cell size, and they want pixel-perfect images, then they can also just set it to "automatic", and the emulated cell size will match the real font size.
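A rough Python sketch of the arithmetic described above, for illustration only. The function names are made up, and rounding partially covered cells up is an assumption, not something stated in the comment.

```python
import math


def cells_occupied(image_w, image_h, emu_cell_w, emu_cell_h):
    """Number of text cells (cols, rows) an image covers for a given
    emulated cell size. Partially covered cells count as occupied
    (assumption)."""
    return math.ceil(image_w / emu_cell_w), math.ceil(image_h / emu_cell_h)


def on_screen_size(image_w, image_h, emu_cell, real_cell):
    """Pixel size the image is scaled to on screen, given the emulated
    cell size the user selected and the real font cell size."""
    emu_w, emu_h = emu_cell
    real_w, real_h = real_cell
    return image_w * real_w / emu_w, image_h * real_h / emu_h


# VT340-style 10x20 cells: a 100x200 image always covers 10x10 cells ...
vt340_cells = cells_occupied(100, 200, 10, 20)
# ... and is stretched to fit them when the real font is 18x36.
vt340_scaled = on_screen_size(100, 200, (10, 20), (18, 36))
# "Automatic": emulated size equals the real size, so pixels map 1:1.
auto_scaled = on_screen_size(100, 200, (18, 36), (18, 36))
```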

Once the user has that control, I don't see the need to allow apps to intervene. For example if I've configured a cell size of 10x20, and I'm happy with images being scaled to fit that size, I'm not sure I want applications overriding that and sending me high resolution images that chew up my bandwidth, just because my real font size is actually something like 18x36. As long as the terminal is reporting the emulated cell size in the CSI 16 t queries (and similarly scaling the response for CSI 14 t queries), then the app shouldn't care. Maybe it's not pixel perfect, and the aspect ratio isn't exactly right, but that's my choice to make.
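As a sketch of what "reporting the emulated cell size" could look like, the following Python builds the two replies. The reply formats (`CSI 6 ; height ; width t` and `CSI 4 ; height ; width t`) follow xterm's documented window operations; computing the text area as grid size times emulated cell size is my assumption about what "similarly scaling" means, and the function names are hypothetical.

```python
CSI = "\x1b["


def report_cell_size(emu_cell_w, emu_cell_h):
    """Reply to CSI 16 t: cell size in pixels, height first."""
    return f"{CSI}6;{emu_cell_h};{emu_cell_w}t"


def report_text_area_size(cols, rows, emu_cell_w, emu_cell_h):
    """Reply to CSI 14 t: text area in pixels, expressed in emulated
    pixels (assumption: grid size times emulated cell size)."""
    return f"{CSI}4;{rows * emu_cell_h};{cols * emu_cell_w}t"


# An 80x24 terminal with a user-selected 10x20 cell size reports the same
# values no matter what the real font size is.
cell_reply = report_cell_size(10, 20)
area_reply = report_text_area_size(80, 24, 10, 20)
```

An app that sizes its images from these replies stays consistent with the emulated grid, which is the point made above: it does not need to know the real font size.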

Also, from an implementation point of view, I didn't think it was practical to allow applications to change the cell size on the fly, because then you need to track the pixel scale associated with every image segment on the screen. And since images can overlap with transparency, you've potentially got weird overlapping shapes with each portion needing to be rendered with different scaling factors. Maybe that's feasible with the right architecture, but it sounded like a nightmare to me, and I really couldn't see a compelling need for it.

jerch commented 2 years ago

@j4james Thx for your thoughts. Hopefully I find more time to address ideas more in detail again once the monster PR got merged. If you are interested we also have some layering discussions started for a new image protocol in #11.

j4james commented 2 years ago

I've been avoiding the image protocol discussions, because I'm not really a fan of the idea. I know Sixel isn't perfect, but I find it's good enough for my needs. And in general I'm more interested in implementing existing protocols than inventing new ones. Not that I want to discourage you guys from going that route, but it's not something I want to get involved in.

I was just chiming in here because you mentioned it in that other thread, and I thought the experience I've been through recently with Sixel scaling was somewhat relevant to this topic. If nothing else, I want to stress how important it can be to actually attempt to implement the ideas you're proposing, because it often isn't as easy as you first imagine (that was certainly the case for me).

Ultimately the success of a new protocol like this will depend on how many people you can persuade to implement it. And many will already have existing image storage and rendering architectures with which they'll need to integrate it. If it's not reasonably straightforward to get working, you may struggle to get it widely adopted.

jerch commented 2 years ago

@j4james We both know that we have somewhat different viewpoints on sixel as such. I second you in one point - it is good enough for many things ppl want to do with pixel graphics in terminals.

My main driving force for terminal graphics is the REPL idea - some cmdline repl wants to spit out a visualisation of data, so just cat some format the TE understands, and the user is happy. That's a very cheap goal, and indeed sixel can do that just fine (even interactively, if the app cares enough, by diff overwriting with blending). No need for high color space, tricky alpha stuff - just give me some visualisation feedback to make sense of the data. The REPL case is the reason why I left my "SIXEL should have died 30ys ago" stance. Yes, it's good enough for that. (Sidenote - unlike SIXEL, I think ReGIS really is dead, and should be replaced by some SVG-T thingy. :smile_cat:)

But the needs don't stop there - the "industry" moved on with better image formats, which creates demand. That's the point where I think that the terminal world should evolve as well. The least that could be done is to establish some alternative data payload (read it as "allow PNG/JPG transports similar to the SIXEL sequence"). But we got SIXEL wrong in the second place (mostly because of the lack of correct docs or access to test devices), so we somewhat have to re-iterate the basics again. Which involves nasty things like the pixel to col/row translation. In the meantime other sequences popped up, which have fundamental flaws regarding basic terminal interface principles. Ideally we manage to avoid those flaws (like pixel notions from sequence params). Furthermore, reshaping things from ground zero gives us the chance to enhance/extend the capabilities, as the layering discussion tries to figure out what might be wanted.

Ofc this process is very tedious, and there are no guarantees that ppl will adopt things in the end. Regarding an actual implementation - most things discussed here so far are at least backed up on my side with playground implementations, so I am pretty aware of what is doable and what should be avoided as an anti-pattern. But I am not an egomaniac who tries to push his own ideas without seeking some consensus first. We have at least a small group of TE devs that share most ideas, which gives at least me the impression that things will go in the right direction.