dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses

add an additional mixed-blitter? #1223

Closed joseluis closed 3 years ago

joseluis commented 3 years ago

This C project seems to have an optimized algorithm to choose the most appropriate symbols https://github.com/csdvrx/derasterize maybe we could use something like that?

Examples: lemur, snake

PS found out about it on the interesting sixel support thread for microsoft terminal

csdvrx commented 3 years ago

If you can wait a bit, I'm going to release a new version that's much faster. It was done to support videos with mpv in tmux-sixel

dankamongmen commented 3 years ago

Hey there, @csdvrx ! I've seen your work, and thought it good. It's nice to get all we terminal hackers 'round the fire.

I'll take a closer look at this soon. @joseluis , does image quality beat us? Like your pretty snake, I move more quickly when threatened =].

dankamongmen commented 3 years ago

@joseluis , running derasterize.c and notcurses-view on sample/snake.jpg seems to generate almost indistinguishable images, except notcurses-view takes .074s, while derasterize.c takes .303s, about 4x the notcurses-view time.

[schwarzgerat](127) $ time  notcurses-view -t0 samples/snake.jpg
Term: 45x80 vte-256color (VTE with xterm 256-colors)

 notcurses 2.1.4 by nick black et al 
  45 rows 80 cols (56.25KiB) 16B cells 256 colors+RGB
  compiled with gcc-10.2.1 20201224, little-endian
  terminfo from ncurses 6.2.20201114
  avformat 58.45.100 avutil 56.51.100 swscale 5.7.100
1 render, 80.53µs total (80.53µs min, 80.53µs max, 80.53µs avg)
1 write, 15.64ms total (15.64ms min, 15.64ms max, 15.64ms avg)
132.05KiB total (132.05KiB min, 132.05KiB max, 132KiB avg)
0 failed renders, 0 failed writes, 0 refreshes
RGB emits:elides: def 4:37 fg 3543:55 bg 3511:50
Cell emits:elides: 3600/0 (0.00%) 90.24% 1.53% 1.40%
real 0m0.074s
user    0m0.033s
sys 0m0.008s
[schwarzgerat](0) $ 

vs

[schwarzgerat](0) $ time ./derasterize.c samples/snake.jpg > /dev/null
real    0m0.303s
user    0m0.318s
sys 0m0.014s
[schwarzgerat](0) $

2021-01-08-052321_1619x925_scrot

if derasterize.c looked better when you filed this bug, it's probably because we weren't yet using sextants by default. now, derasterize.c does use things like THREE EIGHTHS BLOCK, which we don't, and maybe ought consider. but i suspect that sexblitter is roughly equivalent. even where we can't use sexblitter (e.g. alacritty), quadblitter seems about as good:

2021-01-08-053018_1611x1439_scrot

so if you find a case where we're noticeably worse, i'm very interested, but for now i'm feeling like the fastest terminal cowboy in the west [blows smoke from guns]. =]

@csdvrx , please don't interpret this as a knock on your superb work! just proud of my own =].

joseluis commented 3 years ago

Although sexblitter has been an improvement, it still has less resolution due to the lack of vertical and horizontal eighth blocks, diagonal blocks, diagonal lines, crosses, etc. The raccoon image is the one that best shows the extended shapes palette.

I'd still add this new mixed blitter at some point, once Charlotte releases the new version. No need to rush in any case.

dankamongmen commented 3 years ago

I'll take a look at raccoon, thanks for the heads-up. @csdvrx , do you intend to expose your blitter as a library? It looks like it's only built as a binary at the moment.

it does have less resolution due to the lack of vertical and horizontal eighth blocks

well, sexblitter gives you 3x vertical and 2x horizontal. they don't seem to be using the 1/8 horizontal blocks, though i'm sure it'd be trivial to add them. a bigger concern of mine is how to deal with the mixed mode. they're doing, so far as i can tell with a quick look at adjudicate(), a comparison of how similar the glyphs are to the desired output. does this work at different scales? hrmmm, i guess it does, probably

....i guess i'm thinking: we have a number of Y rows and X columns to which we're going to output, right? so we then resize the input to be Y {height of blitter} and X {width of blitter}, trusting ffmpeg/oiio to do a resize way better than we will, and then have a multiple of our comparison geometries. there's no guarantee of that here. which maybe isn't a problem? i'm unsure, will ponder it. this is why, for instance, we don't have a superblitter that utilizes both quads and sextants -- if we're mapping 3x2 pixels to a single cell, there's no way we ever use an UPPER HALF BLOCK nor LOWER HALF BLOCK, because we'd need 1.5-pixel matches. but hrmmm, that might actually be a better solution -- imagine you have input pixels 0x000000, 0x888888, and 0xffffff. the best blit of that may well be 0x444444 + 0xbbbbbb using half blocks, rather than what you'd get right now (not better than 0x444444 + 0x444444 + 0xffffff). HRMM. yeah, i think i've been assuming some false constraints in my sexblitter. HRMMMM. hrm!
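
(The 0x000000 / 0x888888 / 0xffffff example can be checked with a tiny sketch -- grayscale only, squared error, a hypothetical helper rather than notcurses code: try each contiguous two-color split of the three-pixel column and keep the cheaper one.)

```c
#include <assert.h>

/* Toy 2-color quantization of a 3-pixel grayscale column (illustration
 * only, not notcurses code). Try both contiguous splits of the column
 * into a "foreground" and "background" group, average each group, and
 * return the split (0: {p0,p1}|{p2}, 1: {p0}|{p1,p2}) with the lower
 * squared error, along with the two colors it implies. */
static int best_split(const double p[3], double *fg, double *bg) {
    /* split 0: top two pixels share one color, bottom pixel the other */
    double a0 = (p[0] + p[1]) / 2.0, b0 = p[2];
    double e0 = (p[0] - a0) * (p[0] - a0) + (p[1] - a0) * (p[1] - a0);
    /* split 1: top pixel alone, bottom two pixels share */
    double a1 = p[0], b1 = (p[1] + p[2]) / 2.0;
    double e1 = (p[1] - b1) * (p[1] - b1) + (p[2] - b1) * (p[2] - b1);
    if (e0 <= e1) { *fg = a0; *bg = b0; return 0; }
    *fg = a1; *bg = b1; return 1;
}
```

Amusingly, for 0x00/0x88/0xff this toy prefers grouping the two brighter pixels, which is exactly the sort of per-cell decision a fixed "every row gets its own sextant cell" assumption never gets to make.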

oh hey, it looks like some of this code is due to my girl @jart. [wave!] (we worked together at Google NYC)

another worry is that derasterize approach doesn't seem easily composed with other planes, which it needn't worry about, but we do. see e.g. #1068 which would be severely exacerbated.

jart commented 3 years ago

I think @csdvrx has textmode supremacy here. Look at that lemur.

dankamongmen commented 3 years ago

2021-01-08-104108_1610x925_scrot

i personally think the undetailed area on the left looks better, color-wise, in y'all's, and mine is better with the shadow on the eye and less weird banding above the eye / bottom of the top rim of the ear. they're certainly close.

jart commented 3 years ago

If we judge better by how closely the terminal version resembles the original, then derasterize has the advantage.

image

What are you doing to the color? There appears to be significantly more green with notcurses. Are you applying the sRGB <-> Linear exp2.4 companding formula? What's your scaling algorithm?

For what it's worth, here's what printimage.com produces. You'll notice it creates lines of details around edges since magic kernel sharp (magikarp) does sharpening, which can help things look a little less washed out at this scale.

image

jart commented 3 years ago

Also here's some useful torture images.

maxwell100 yellow and blue make pink (nearly all monitors fail this test due to subpixel layout)
maxwell100

scaling uses gamma correction test:
gamma-1 0-or-2 2

die welle: scaling blurs frequencies outside nyquist limit test:
diewelle1920x1080

csdvrx commented 3 years ago

Oops I only see that now!

Hi everybody!! Sorry I'm late to the party!

@joseluis , running derasterize.c and notcurses-view on sample/snake.jpg seems to generate almost indistinguishable images, except notcurses-view takes .074s, while derasterize.c takes .303s, about 4x the notcurses-view time.

Hmm, so you mean your code is currently faster than something we haven't touched in, like, a year? And where I only noticed a horrible bug 1 month ago? (oops)

Challenge accepted!

so if you find a case where we're noticeably worse, i'm very interested, but for now i'm feeling like the fastest terminal cowboy in the west [blows smoke from guns]. =]

Hmm, so Google cowboys are picking fights? I say let's meet in 2 days under the city clock :)

@csdvrx , please don't interpret this as a knock on your superb work! just proud of my own =].

Well, give me a day or two, and our work will crush yours :)

I think @csdvrx has textmode supremacy here. Look at that lemur.

@jart yes, visually, I agree: we do have textmode supremacy with derasterize.

But also, we could do so much better!! I've had a few other ideas since last year, like focusing on specific details that need improvement, in separate passes. And how to do that.

First, I think we should also think about the low-hanging fruit: for example a Hough transform pass to find circles suitable for replacement with . ⴰ ⸰ ° ᴼ o ᴑ ○ O O ◯ ⚪ which may be cheap and fast, and could give us a better idea of how to handle multipass.

Another simple thing: combining characters. Go to https://onlineunicodetools.com/add-combining-characters and use the derasterize chars on the left: █▄▀ ▐▌▝▙▗▛▖▜▘▟▞▚ ▔▁▁▂▃▄▅▆▇ █▉ ▊▋▌▍▎ ◢◣◥◤/╱\╲⋰⋱⤫xX╳⌿⍀ ⑊﹨◺◿◸◹øØ⍉ ⍁⍂━┉┃╋╹╺╻╸┏┛┓┗═⎻⎼__|¦

You will see that most of them don’t combine to anything helpful, at least with the font I’m using.... but some do!

And this depends on the font, so imagine a code generator that tries these combinations against the actual font and keeps the useful ones.

This way, interesting combinations could be automatically identified, on a per-font basis, at compile time.

Take for example █͞▄ : the first one is a dud, but the second one would contribute to a denser mapping along the Y axis, so we would keep it.

If we keep the current approach of "throw stuff at the wall and keep what sticks", this could be used for finding the combining characters that "work well" and could be part of our set: if we don't want to do a compile-time optimization on a per-font basis, a statistical analysis could tell us which combining chars are the most likely to be supported by a font -- and thus worth including, as their positive contribution to a denser mapping outweighs their negative contribution to the curse of dimensionality.

But this is just a first step. We could be more clever and do something better, a sort of reverse memoization: the goal would be to improve accuracy while keeping speed mostly the same (or at least control and limit the slowness). After the first pass has selected which unicode glyphs work best (ex: ▄), a second pass could try to improve the bad fits (say the bottom b=20% of the fitness scores) by sprinkling a little magic dust made of unicode combining characters - the ones we already know in advance will play well with that glyph, so we save time and don't test everything and the kitchen sink.

This would not be very daring, but at least it would guarantee a non-regression of the fit (only improvements!) for a time cost that could be grossly estimated in advance by the size of the set we want to try.

This gives me ideas: like, if it's applied to a video, the b=20% could be a function of the "time left" to the next frame given the frame rate - so if the frame is too complicated, don't bother, b=0%

But that's just a very specific use case. Now think about the bigger picture (quite literally!): combining characters are generally thin, which means they could help us with the high frequencies (fine lines etc) where it's easy to see we are currently underperforming.

Do you know Zalgo?

I mean stuff like กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ (check https://www.reddit.com/r/Unicode/comments/jd2d04/_/ for inspiration)

Some people want to sanitize Zalgo, like https://stackoverflow.com/questions/10414864/whats-up-with-these-unicode-combining-characters-and-how-can-we-filter-them

I say let's use this wonderful possibility instead! Zalgo could be super helpful with fine details, like the lemur hair, that we currently don't do right. Imagine a band pass, isolating the zones of fine details, and applying a hefty dose of stacked up (and down, and sideways...) diacritics - that could give us the fine lines that are missing from the picture. And using the reverse memoization (like above, one accent is for vertical lines, another for 45 degrees at a given width, etc) there's no need to compute each combination to select the best: just use whatever computer vision technique you like to isolate fine lines, and add some Zalgo to approximate them the best we can, piecewise.

Maybe it's a lame idea and it will crash and burn, but at least we will know why. And it could be fun :)

For what it's worth, here's what printimage.com produces. You'll notice it creates lines of details around edges since magic kernel sharp (magikarp)

Yeah I remember our discussions of the magic kernel :) It's cool you implemented it :)

Please consider backporting some stuff to derasterize, as it may then find its way into tmux!! (more on that below)

@csdvrx , do you intend to expose your blitter as a library? It looks like it's only built as a binary at the moment.

@dankamongmen no, but the license is super permissive (thanks to @jart!) so feel free to contribute that!

Personally, instead of doing a library, I have integrated derasterize into tmux. This gives a much improved version of tmux-sixel: the idea is to intercept and recode sixel images on the fly, with derasterize, for all the lame terminals (like, cough cough, a certain Microsoft Terminal) that don't support sixels, so people can still enjoy pictures in their terminals!

It's very early, but at the moment, it does great in tests, and not just on static pictures: you can play a video to a sixel output, and on a terminal that doesn't support sixels, you get moving ascii art instead :)

I've used that to display the output of xorg-sixel into Microsoft Terminal (yeah I know I'm weird lolol)

It's taking longer than expected, because I want to release a full set of things, and use tmux-sixel to make a statement.

Long story short: its creation was inspired by https://bugzilla.gnome.org/show_bug.cgi?id=729204

What I read there was very concerning: some people think it's ok to strategize, play politics, and delay or block a feature that people want (and that good people have already submitted patches for, like sixels), just because adding it may reduce the chances of success of their "new" pet format - something that doesn't exist yet, that won't exist for a while, and that's currently depriving people of sixels.

I think that's super lame and that they need an attitude adjustment, John Cena style.

So tmux-sixel was made to bring the libvte maintainers back to planet earth: they refuse to integrate patches? They don't want the VTE terminals to support sixels? Ok, I'll take that as a given, as I certainly can't fork it all and maintain patches.

But instead, I can give their users what they want: graphics in the terminal, through a tool that they can't stop, because it makes any terminal "somehow support" sixels (or at least a derasterize version of said sixels :) even if they really REALLY don't want to. That's the deviousness of tmux-sixel :)

They think "adding sixel support is straight harmful"?

Challenge accepted! tmux-sixel will deliver harm to them :)

[EDIT: typos]

jart commented 3 years ago

Concerning performance, printvideo.com has some permissively licensed code worth borrowing. It can play hd opera videos in the terminal at 60fps.

turandot

It can also play 4k videos in the terminal from a single cpu in a single process without threads (no gpu) without breaking a sweat.

love4k

But the code that makes it possible isn't exactly simple:

dankamongmen commented 3 years ago

Hmm, so you mean your code is currently faster than something we haven't touched in, like, a year? And where I only noticed a horrible bug 1 month ago? (oops)

sorry, i wasn't trying to pick a fight at all, just saying i saw no reason to go replacing our code wholesale. as noted, it's impressive work!

jart commented 3 years ago

Notcurses and Derasterize are both good projects written by some of the smartest terminal hackers on earth. I think competition between friends is fun. Notcurses is already perfect just the way it is. @csdvrx has some interesting ideas for improvement. I am convinced the color discrepancy is worth further attention. Please take that offline with me if there's factors re: color that I'm not considering here. I am always yours and at your service.

dankamongmen commented 3 years ago

one question i have for you two ladies: you're embracing a wide diversity of glyphs, whereas i'm working strictly with box-drawing characters (halfblocks, quadblocks, sextants). my main motivations here are:

(1) a simple, completely regular mapping, and (2) (more importantly) equivalence among fonts (indeed, several terminals draw the box drawing characters directly, rather than use the font, and indeed this is what i do for the linux console where we're unlikely to have them available in the font)

how are you modeling your glyphs, when you don't know what font is being used? or do you claim to know this? sure, there's only so many ways to draw say WHITE CIRCLE WITH LOWER LEFT QUADRANT, but there are a few ways, and while they're superficially similar, the small differences seem (to me) like they would eliminate much of the benefit of such fine glyph-matching. am i missing something?

dankamongmen commented 3 years ago

(also, @csdvrx , i'd be sad if you couldn't beat notcurses-view on speed -- even in that simple case, i've still got to drag around a bunch of code for implementing the full TUI thing -- multiplanes, efficient redraws, all that kind of garbage. ncdirect operates a bit faster (see ncls, for example), but yeah, no way am i going to beat a purpose-optimized single-pass image blitter. if you run perf on notcurses-view or ncls, about a third of the time for a typical image view is FFmpeg playing with itself anyway =].)

but i welcome the competitive pressure!

once you're competitive, anyway =] =] =].

hack on!

jart commented 3 years ago

With derasterize the modeling is based on an 8x8 bitboard. It's biased towards PragmataPro.

      /* ▁ *\
    251b▕┛▎box drawings heavy up and left
      \* ▔ */
    G(0,1,1,0,
      0,1,1,0,
      0,1,1,0,
      1,1,1,0,
      1,1,1,0,
      0,0,0,0,
      0,0,0,0,
      0,0,0,0),

The code is generic enough that the array could easily be shortened to only those characters in IBM CP-437. However, the extremely extended set of characters is specifically what makes derasterize so cool. It takes it further than any other project. It's also the most computationally expensive, since all you need for IBM CP-437 is a 4x4 board.

The Linux framebuffer console is certainly missing a lot of characters. I wouldn't be too concerned about supporting that since (a) I'm not even sure how maintained it is anymore and (b) there's a relatively straightforward dma pixel api for it. I made an attempt with printvideo.com to support it not too long ago.

Windows Consolas is one font that I feel really ought to support more characters. With printvideo.com what I did was just have flags and keymappings for choosing between CP-437 and the extended UNICODE set. Since there really is no sane way to know, across terminals, what the underlying font is. For example, on MacOS Terminal.App, the user is generally required to go a step further and tune the kerning in its settings in order to get blocks to not have thin spaces between them!

csdvrx commented 3 years ago

Hmm, so you mean your code is currently faster than something we haven't touched in, like, a year? And where I only noticed a horrible bug 1 month ago? (oops)

sorry, i wasn't trying to pick a fight at all, just saying i saw no reason to go replacing our code wholesale. as noted, it's impressive work!

@dankamongmen it was a joke :) It's rare to find other people with my interests, so I was just trying to tickle your competitive spirit, to have good reasons for us all to release cool code!!

Notcurses and Derasterize are both good projects written by some of the smartest terminal hackers on earth. I think competition between friends is fun.

@jart that's totally the spirit of what I meant! Some linux terminal hackers give me the creeps, but here all I feel is good vibes!

one question i have for you two ladies: you're embracing a wide diversity of glyphs, whereas i'm working strictly with box-drawing characters (halfblocks, quadblocks, sextants). how are you modeling your glyphs, when you don't know what font is being used? or do you claim to know this?

@jart already explained:

With derasterize the modeling is based on an 8x8 bitboard. It's biased towards PragmataPro.

For derasterize's latest version, it's currently an 8x16 bitboard (which caused a few issues with 128-bit stuff, but it was fun to learn about bitmasks and bit logic!).

For tmux-sixel, it's both: it has a fallback to a lower-res bitboard for characters where it doesn't matter (ex: half blocks)

sure, there's only so many ways to draw say WHITE CIRCLE WITH LOWER LEFT QUADRANT, but there are a few ways, and while they're superficially similar, the small differences seem (to me) like they would eliminate much of the benefit of such fine glyph-matching. am i missing something?

Consider what the differences will be: in the worst case, they will be off by a few pixels, but you will still have a circle. What matters more: to be off by a few pixels, or to lose a shape?

I postulate it won't matter, as the proportionality of the diameter of the circle will be kept inside the font (ex: o will always be about half of O). It even offers a good excuse to use less computationally intensive code: instead of trying each of your say 4 circles in each block, run 4 Hough transforms at the picture level, one for each diameter. Use the results to optimally "align" the chunks (if most of your circle coverage area would fall on the border in between 2 blocks, for a one-off picture, just introduce a margin). Then put the correct circle in the chunks where most of the circle coverage area falls.

(also, @csdvrx , i'd be sad if you couldn't beat notcurses-view on speed

I'll try! derasterize is my fun Christmas time project, mostly to learn stuff. It's not super optimized either. tmux-sixel will be different, meaner, a software with a purpose :)

but i welcome the competitive pressure!

This, yey!

once you're competitive, anyway =] =] =]. hack on!

Damn, with that I've got to release today, not tomorrow!!

csdvrx commented 3 years ago

However the extreme extended set of characters is specifically what makes derasterize so cool. It takes it further than any other project. It's also the most computationally expensive, since all you need for IBM CP-437 is a 4x4 board.

The latest version uses 60 glyphs and an 8x16 board. It got a few optimizations to remain usable, so your cool math stuff is temporarily gone :(

I've stopped working on it to focus on tmux-sixel.

Still, I think it could beat notcurses. I've just got to clean up and release. My public gloating will force me to act on that and stop procrastinating :)

@csdvrx has some interesting ideas for improvement.

I started discussing them offline with @hpjansson recently, who has similar interests to ours and wrote the Chafa library. We're mostly talking about making a tool that takes a font and dumps glyphs (and combined glyphs), so we can optimize the font-variant part. You could benefit from that too, given what you said about fonts. Would you like to join in the discussion?

I am convinced the color discrepancy is worth further attention.

To me it looks like a math issue, possibly due to a rounding error when changing colorspace.

Please take that offline with me if there's factors re: color that I'm not considering here. I am always yours and at your service.

Thanks, likewise!!!

I really like what you did, if you need help on some part involving stats, just ask!

dankamongmen commented 3 years ago

i should look back into this now that 2.4.0 is out and see what all can be improved

cloud11665 commented 2 years ago

Hello, and first of all I want to apologize for necroposting on this thread. I'm in the middle of rewriting the mpv tct backend, and this thread looks like the perfect place to share my knowledge and ask some questions.

Currently, only the quadblitter is implemented, and it's quite fast due to copious amounts of optimizations on the rendering side, but that's not the thing I'm here for. I'm here because I wanted to get some feedback on the rasterizer side of things.

I've come up with a fast and accurate solution for planar (every pixel combination supported), but color-bound representations (quadrants and sextants): for starters, an implementation of the K-means algorithm is O(n^k), but since there are only 2 colors per rune, k=2, meaning that we don't have to fuck with any randomized heuristics. The kmeans I'm using is modified a bit, because after finding the clusters, I compute the nearest neighbours and make them the color of the cluster, meaning that we don't get washed-out colors due to linear interpolation. (I'm also working on an SSE4 implementation, but there are still some errors, and the speedup is negligible even for sextants)
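
(A rough sketch of that modified 2-means, grayscale for brevity and not the actual fork code: cluster, then snap each centroid to its nearest input pixel so the cell colors are real pixel values rather than interpolated ones.)

```c
#include <assert.h>
#include <stdlib.h>

/* 2-means on the grayscale pixels of one cell, then snap each centroid
 * to the nearest actual input pixel so colors aren't washed out by
 * averaging. Illustrative sketch, not the real mpv-tct code. */
static void kmeans2_snap(const int *px, int n, int *fg, int *bg) {
    /* init the two centroids at the extremes */
    int lo = px[0], hi = px[0];
    for (int i = 1; i < n; i++) {
        if (px[i] < lo) lo = px[i];
        if (px[i] > hi) hi = px[i];
    }
    double c0 = lo, c1 = hi;
    for (int iter = 0; iter < 8; iter++) {      /* fixed iteration cap */
        long s0 = 0, s1 = 0; int n0 = 0, n1 = 0;
        for (int i = 0; i < n; i++) {
            if (abs(px[i] - (int)c0) <= abs(px[i] - (int)c1)) { s0 += px[i]; n0++; }
            else { s1 += px[i]; n1++; }
        }
        if (n0) c0 = (double)s0 / n0;
        if (n1) c1 = (double)s1 / n1;
    }
    /* snap each centroid to its nearest input pixel */
    *fg = px[0]; *bg = px[0];
    for (int i = 1; i < n; i++) {
        if (abs(px[i] - (int)c0) < abs(*fg - (int)c0)) *fg = px[i];
        if (abs(px[i] - (int)c1) < abs(*bg - (int)c1)) *bg = px[i];
    }
}
```

The snap step is what avoids the washed-out look: the two cell colors are guaranteed to be colors that actually occur in the cell.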

Anyways, here is my version of the lemur: image

As for non-planar blitters, I was running with a 2x6 bit matrix per rune, supporting both quadrants and sextants and other glyphs. Since there are only 12 bits in the matrix, I have a 4096-element LUT that returns the masks defined in the runeset directly, and defaults to the nearest sextant for the ones that are not defined. (I only have some "old" pictures, because the code is on my second computer and I still haven't pushed it) image Forgive the hideous artifacts, they are probably some off-by-1 error in the rasterizer.

jart commented 2 years ago

For an apples to apples comparison, here's @cloud11665's renderer cropped:

mpv-2022

Here's @csdvrx's derasterize at that size, which in this case appears to be configured to use the STB scaling algorithm:

derasterize-2022

Here's @jart's printimage.com at that size, which is using the Gyrados (Generalized Magic Kernel) scaling algorithm:

printimage-2022

Here's @jart's printvideo.com at that size, which is the same as printimage but it uses the Magikarp (John Costella's Magic Kernel) scaling algorithm:

magikarp-2022

Yours is certainly one of the best we've seen so far, aside from derasterize. How fast have you made it? For instance, derasterize achieves the highest quality by being the slowest. On the other hand, the printvideo magikarp renderer is designed to play 60fps high-def opera videos in the terminal without the assistance of a gpu.

Those measurements are for scaling/blitting/printing. In other words, (1) how long it takes to decimate the image to the terminal's size, plus (2) how long it takes to turn its pixels into ANSI+UNICODE text, plus (3) how long it takes to write to /dev/null.

In any case, very exciting stuff. Thank you for telling us you're working on it. I'll look forward to reading the code when you feel comfortable sharing it.

cloud11665 commented 2 years ago

Thanks for the quick response @jart !

In terms of performance in the current implementation, everything is up to the user, but I get the best results with --profile=sw-fast (cpu only). You can try it out for yourself by compiling my fork of mpv and running it with --vo=tct --vo-tct-algo=quad --profile=sw-fast. As far as improvements go I'm currently focusing on the quantizer / blitter, because my encoder is already very fast. I have also tried to approach the problem of blitting from a different, more heuristic angle, and I think I have found some very important information that I wanted to share.

  1. Matrices of size (k, 2k) are the optimal choice, because they make the scaling constant for both axes.

  2. We can give up the 2x6 matrix kernels for smaller, faster (Ad.1) 2x4 kernels by introducing some nonuniformity.

matrix_kernels

2x4 kernels mean only 2^8 possible combinations, which is both small enough to be easily memoizable (at 8 bytes per bitmask) and fine-tunable by hand. Sextants give us 64 glyphs and quadrants give us 16, meaning that even with those alone, we have a ~31% coverage of the whole matrix, which in and of itself is a great score that can be improved just by adding more glyphs. And just like last time, when a bitmask is neither a sextant nor a quadrant, we approximate it with the nearest sextant.
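
(A sketch of how that 256-entry table could be precomputed; the supported mask set in the test is a made-up stand-in for the real quadrant/sextant kernels pictured above.)

```c
#include <assert.h>
#include <stdint.h>

/* Precompute a 256-entry LUT mapping every 2x4 (8-bit) coverage mask to
 * the nearest supported glyph mask by hamming distance. Which masks are
 * "supported" is up to the caller; in the real thing they would be the
 * quadrant and sextant bit patterns. Illustrative sketch. */
static uint8_t lut[256];

static void build_lut(const uint8_t *supported, int nsup) {
    for (int m = 0; m < 256; m++) {
        int best = 0, bestd = 9;                  /* max distance is 8 bits */
        for (int s = 0; s < nsup; s++) {
            int d = __builtin_popcount((unsigned)(m ^ supported[s]));
            if (d < bestd) { bestd = d; best = supported[s]; }
        }
        lut[m] = (uint8_t)best;
    }
}
```

After this one-time pass, blitting a cell is a single table load per bitmask, which is where the "2^8 is memoizable" observation pays off.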

Another benefit of 8-piece matrices is that they can fit in an AVX2 256-bit int vector, but it is important not to get distracted with premature optimizations, because it may just be that the scalar quantizer is fast enough.

PS. When compiling the fork, there is a possibility of failure, because it uses an old version of libplacebo. You just have to fetch the upstream, there won't be any merge conflicts.

hpjansson commented 2 years ago

Nice work. If you want another point of comparison, you may want to look at Chafa. It uses bigger matrices (8x8, and 16x8 for wide characters) and is intended to be very general. You can have hundreds of glyphs and still be fast enough for animation by filtering with the popcount builtin to get the hamming distances.

cloud11665 commented 2 years ago

@hpjansson Nice, very nice. For starters, thank you for commenting. Thanks to chafa, today I learned that VTE codes are chainable.

While the idea of hamming distances using popcnt came to my mind, doesn't it require iterating over every glyph in the glyphset to find out which one has the lowest distance? Hdist also lacks spatial coherence (please correct me if I am wrong), and that could pose some issues.

Another thing that may come off as controversial is my personal dislike for non-rectangular shapes. While other glyphs help increase the fidelity, after just a couple extra glyphs it starts to come off as libcaca 2.0 (when running at a low resolution), which I try to avoid at all cost.

While it looks like chafa is focusing on the single-frame picture side of things, I am aiming for the smoothest video playback experience, which makes sense, because mpv is a video player. I have also promised to make 720p (as in the number of character pixels) playback in the terminal feasible. It's a pipe dream, but a pipe dream that I want to work towards.

image People on the dev irc seem to disagree... But I don't care :smiley:

hpjansson commented 2 years ago

@hpjansson Nice, very nice. For starters, thank you for commenting. Thanks to chafa, today I learned that VTE codes are chainable.

While the idea of hamming distances using popcnt came to my mind, doesn't it require iterating over every glyph in the glyphset to find out which one has the lowest distance? Hdist also lacks spatial coherence (please correct me if I am wrong), and that could pose some issues.

It does, but if you pack the glyph bitmaps as an array of u64 (in the 8x8 case), you can xor them all with the shape you're looking for using SIMD and calculate the popcounts pretty quickly. When I've benchmarked this, the receiving terminal has always been the limiting factor. Not sure what you mean by spatial coherence; you do get the right shapes this way, not just coverage amount. E.g. you can use it to pick the correct sextants in a sextant-only scenario.
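
(That xor + popcount scan, in scalar form -- a sketch; the SIMD version would xor and popcount several packed glyph bitmaps per step, and the function name is made up.)

```c
#include <assert.h>
#include <stdint.h>

/* Pick the glyph whose 8x8 bitmap (packed into a u64) has the smallest
 * hamming distance to the thresholded cell bitmap. Scalar sketch of the
 * xor+popcount filter described above. */
static int nearest_glyph(uint64_t cell, const uint64_t *glyphs, int n) {
    int best = 0, bestd = 65;                 /* max distance is 64 bits */
    for (int i = 0; i < n; i++) {
        int d = __builtin_popcountll(cell ^ glyphs[i]);
        if (d < bestd) { bestd = d; best = i; }
    }
    return best;
}
```

Because the whole inner loop is one xor and one popcount per glyph, even a few hundred glyphs cost on the order of a thousand cheap ops per cell, which is why the terminal, not this scan, tends to be the bottleneck.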

Anyway, it's true that more specialized approaches such as your sextant one (or Notcurses' specialized blitters) are always going to be faster, and it looks pretty good too.

Another thing that may come off as controversial, is my personal dislike for non-rectengular shapes. While other glyphs help increase the fidelity, after just a couple extra glyphs, it starts to come off as libcaca 2.0 (when running with a low resolution), which i try to avoid at all cost.

It's a matter of taste and use case. I think at a minimum, the 1/8 block symbols improve the output, but I also want to be able to print different flavors of retro art and support less capable terminals/printers. E.g. old-school ASCII/Baudot code teletypes should work with the same pipeline by just throwing a few switches. Or how about a C64, or a ZX81? Obviously it's not for everyone, and in some cases it's stepping out of the realm of the practical, but it feels more complete this way. Maybe it doubles as a homage to the character art genre(s)...

While it looks like chafa is focusing on the single frame picture side of things, I am aiming for smoothest video playback experience, which makes sense, because mpv is a video player. I have also promised to make 720p (as in the number of character pixels) playback in terminal feasible. It's a pipe dream, but a pipe dream that i want work towards.

image People on the dev IRC seem to disagree... but I don't care :smiley:

Sounds good to me, and probably achievable with the right terminal. Happy hacking.

cloud11665 commented 2 years ago

By spatial coherence, I meant this situation:

given an input of

```
input:        glyph a:        glyph b:
X             XX
   X          X  X               X
  XX            XX              XX
 XXX           XXX            XXXX
```
a has a higher Hdist than b, but it's clear that a is the better choice, because it preserves more detail.

An alternative approach would be the minimum sum of absolute differences (the PNG spec suggests this heuristic for choosing the best filter for a given row) between the glyph and the input, but that would be VERY slow.

edited to make it more of an edge case

hpjansson commented 2 years ago

Oh, I see. I deal with that by using the hamming distance to pick a set of candidates with low distance (size of set adjustable with a performance switch), and then deciding between those using sum of squared differences on RGB or DIN99 pixels.

cloud11665 commented 2 years ago

Damn, that's nice. Thanks for your insight!

Now I am left wondering what approach to choose... 2^8 is small enough to fit everything in the cache and a cache lookup is like 20 cycles, so I will stick with it, but now I know how I would handle bigger matrices.

cloud11665 commented 2 years ago

@hpjansson One more question. What color quantization algorithm are you using?

hpjansson commented 2 years ago

Nothing special, just median cut with a shellsort. The sixel encoder has an interesting palette mapping approach, but for character cells it's much simpler, at least if you're using the universal 216-color cube + grays.

cloud11665 commented 2 years ago

Ok, I have implemented the special quad + sextant kernel, and here are the results:

without linear interpolation on a per-centroid basis: image

with lerp: image

Here is also the code for rounding to the nearest representable sextant:

```c
struct { char data[5]; uint16_t width; } charmap[256] = {
    [0b00000000] = {" ", 1},
    /* ... */
    [0b01010101] = {"▐", 3}, // U+2590  RIGHT HALF BLOCK
    [0b10100000] = {"▘", 3}, // U+2598  QUADRANT-1
    /* ... */
    [0b00001010] = {"▖", 3}, // U+2596  QUADRANT-4
    [0b10000000] = {"🬀", 4}, // U+1FB00 BLOCK SEXTANT-1
    /* ... */
    [0b01111111] = {"🬻", 4}, // U+1FB3B BLOCK SEXTANT-23456
};
// fill the unmapped masks: merge the two middle rows of each column,
// since the quad/sextant grid cannot distinguish them
for (int i = 0; i < 256; i++) {
    if (charmap[i].width) continue;
    int newmask = i;
    if (i & 0b00101000) newmask |= 0b00101000;
    if (i & 0b00010100) newmask |= 0b00010100;
    charmap[i] = charmap[newmask];
}
```
cloud11665 commented 2 years ago

Now, the only thing left in terms of image quality is the scaling algorithm, because as far as I know, mpv uses Lanczos by default, which isn't the best at suppressing ringing artifacts. For now, I'll experiment with the different downscalers (provided by libswscale) and their parameters.

The code is long overdue for a refactor, and some standardization is very much needed. After the refactor I'll port the current drawing optimizations to the new "algo" (mpv's term for a blitter) and implement the additional optimizations I have thought up over the last month.

csdvrx commented 2 years ago

(late to the party as usual)

It's a very interesting direction, and totally worth the necroposting :) @cloud11665 I'll be eager to try your mpv when you're done!

Now, for the glyph difference in https://github.com/dankamongmen/notcurses/issues/1223#issuecomment-1105829392, I think you should look beyond simple approaches, while still considering the computational cost: just fancy it up a little bit!

Looking at your example, glyph A is a better option because it keeps a piece of information: even if it's badly approximated, that's better than losing the detail. The lemur example is nice because you can quickly see that effect at play on the chin hairs: when they're gone, it's like running a band-pass over the picture, as only the big details remain. Please forgive this bad analogy, but hopefully it's good enough to drive the point home that you want to keep the lemur chin hair!

As for how, I would try to keep using Hdist but with a twist: segment the input (here we have 3 regions) and compare its number of segments to that of each candidate glyph (which can be precomputed to save time): here, glyph A has 3 regions, while glyph B only has 2.

This should act as a prefilter: outside of pathological cases, no glyph with only 2 regions should be used to approximate something that has 3, at the risk of losing details such as the lemur chin hair.

This would also have the nice side benefit of limiting the input set: the best way to optimize is to avoid computing what you know won't be needed!

In a way, this would be akin to a Bloom filter on the set of candidate glyphs, something you could easily improve beyond the number of regions by adding other metrics, like border angularity (here 45 degrees), to exclude even more glyphs (say, the square blocks that have vertical or horizontal borders).

If you use the angles, you may need to make that a set (as U+2598 QUADRANT UPPER LEFT has both a vertical and a horizontal border), but you get the idea: it has 2 regions and horizontal+vertical borders, meaning it is going to be a poor candidate anyway, so you might as well exclude it from the set you'll run computations on. It may not be a problem with a limited number of glyphs, but it will scale non-linearly the more glyphs you add, since every glyph in the set needs to be compared to every chunk of the image!

This simple approach is not going to be perfect, but it might help with other things like emergent artifacts ("scaling blurs frequencies outside the Nyquist limit", as @jart reminded us) by preserving the differences as long as possible given the set of available glyphs, which you will be able to grow without impacting the computation time too much thanks to this Bloom-filter-like approach (and maybe even use combining characters?)