Closed dankamongmen closed 3 years ago
we know we can print over a sixel graphic, but how we do that in the current rendering system is not at all obvious, since the whole thing is based around bringing down a single egc. hrmmm. that's gonna be...tough. erp.
for instance, right now you must supply -q
if not using -k
with ncplayer -b pixel
, or else the frame count results in it not being printed at all, due to not being selected in paint()
. hrmmmm hrmmm hrmmm hrmmmm.
how the hell do you erase one of these things?
i think a good acceptance criteria for this bug would be, say, ncneofetch
can use the NCBLIT_PIXEL
support when available.
for quantizing, unless we can find code to do it in oiio/ffmpeg, we could just try building a table, and if we overflow, make a nonlinear or even linear one, and restart the scan with closest match. that succccccccccccccks, but it would work for now.
resolved the extreme vertical space issue by restricting ncvisual_options->rows
to 1 for NCBLIT_PIXEL
in ncdirect
, yay!
I've implemented a simple, stupid quantization algorithm that at least lets us load things. We can now properly display everything I throw at it, including video =] =] =].
(make no mistake--we still want a better quantizer than this baby solution)
https://user-images.githubusercontent.com/143473/109427008-5f9ffe00-79be-11eb-94f2-298463ba09ad.mp4
slow as shit, and the quantizer sucks, but we're way past where we were last night =]
here's the current quantization vs adding back half the mask
top is run through gimp reduction to 255 colors with normal floyd-steinberg dithering, allowing a maximum mask of 0xff. bottom is our current solution.
scaling now works in rendered mode
scaling now works in direct mode
ncneofetch
I think we're at a point where it's worth committing to some optimization. If we speed up what we have, it's pretty much ready for release.
I bet these three steps will substantially speed things up. A further Tip: try to loosen on the actual squarespace result rather than the rgb source, for a more even distribution.
I've thus far been able to get a ~70% reduction by overlapping color and data table extraction, not bad at all. It's currently got a bug, and I'm going to go to sleep rather than try to force this tonight. Tomorrow i expect to pick this up, figure out what's going on along the top here, and reap the big perf get.
i'd like to get down to about 5% max of the original running time. this is a good start.
Hi, I'm the author of foot, a sixel capable Wayland terminal emulator. I'm really looking forward to this!
Is there anything I can do to help? Testing things work in foot is obviously a given; but is there anything in particular I should be running? Anything I should look out for, focus more on etc?
I haven't (yet) used notcurses in any applications, but really liking the tech :)
(If you are interested in running notcurses in foot, I'd recommend running the git master branch)
Hi, I'm the author of foot, a sixel capable Wayland terminal emulator. I'm really looking forward to this!
awesome, i will add it to my terminal matrix!
Is there anything I can do to help? Testing things work in foot is obviously a given; but is there anything in particular I should be running? Anything I should look out for, focus more on etc?
don't even look at the sixel stuff just yet; it's going to be a great deal faster in a few hours =]. ummm you can run different images through it, but rendering a sixel image is trivial; what's difficult is integrating it with a dynamic TUI, which involes writing a bunch of TUI code =]. umm just bang on it! if you'd like to test both your terminal and notcurses, write a script around xdotool
that resizes a bunch. or submit code =]. but yeah, any hammering you can do is appreciated!
one thing i was thinking earlier today was that it might be interesting to build a shell around notcurses. i don't see any real applications within a terminal emulator itself for it, though.
but just commenting alone made me feel good, and me feeling good means code gets written =D. so thanks!
(If you are interested in running notcurses in foot, I'd recommend running the git master branch)
i will definitely check it out, if i get my wayland configuration working on my nvidia box again! which i ought do.
w00t w00t.
before: real 0m1.377s real 0m1.372s real 0m1.373s real 0m1.369s real 0m1.380s
after: real 0m0.311s real 0m0.340s real 0m0.339s real 0m0.343s real 0m0.346s
that's about a 75% cut by uniting the first two passes
moved to binary search in find_colors()
:
before: real 0m0.320s real 0m0.327s real 0m0.309s real 0m0.315s real 0m0.345s
after: real 0m0.216s real 0m0.194s real 0m0.183s real 0m0.210s real 0m0.185s
that's about a 40% further decrease, or about 85% down from the original =].
with this second iteration of optimization, write_sixel_data()
is now taking more time than extract_ctable()
, so time to work on that for a bit, methinks.
I did do a couple of quick tests before your optimization work, and the difference compared to then is already very noticable..
I've looked through the sixel code (very quickly, mind you), and just a couple of comments:
\EPq
and the terminating \E\\
) gets its own set of color registers.\EPq
); very few terminals support 8-bit control sequences (CC1) as they can cause decoding problems in UTF-8 mode, depending on how the terminal implements decoding + VT parsing. Using the 7-bit equivalents (CC0) is encouraged.Also seeing this (the black area before, and under the mech), with ./ncplayer -b pixel ~/src/notcurses/data/warmech.bmp -k -q -s none
(in foot)
This is slightly different from XTerm, though there are some weird glitches there too.
I haven't looked into why this is happening. Do you know why the mech ends up all the way to the right?
I'd like to add that the Windows console and Windows Terminal (which uses the open source version of the console as the backend for VT) support both the C1 and C0 controls. However discussions there from experts suggests that 7 bits is actually the norm. The comments in this WIndows Terminal PR (and the relevant references including the issues associated and the xterm docs) should help understand more about the support for C0 and C1 controls: https://github.com/microsoft/terminal/pull/7340 (Note, you could probably still support 8 bits if you wanted and do something similar as was done in that PR to default to 7 bits first and then use 8 bits if the 7 bit support isn't there for some reason ex. you're using some legacy DEC terminal, emulated or otherwise, but still want to run notcurses with sixel support on it)
I did do a couple of quick tests before your optimization work, and the difference compared to then is already very noticable..
I've looked through the sixel code (very quickly, mind you), and just a couple of comments:
- I see you use private mode 80 (cursor positioning), which I think is appropriate here. Have you looked into private mode 1070? It switches between shared and private color tables. The default is private, where each sixel (i.e. everything between the
\EPq
and the terminating\E\\
) gets its own set of color registers.- Saw your note about 8-bit control sequences (regarding
\EPq
); very few terminals support 8-bit control sequences (CC1) as they can cause decoding problems in UTF-8 mode, depending on how the terminal implements decoding + VT parsing. Using the 7-bit equivalents (CC0) is encouraged.Also seeing this (the black area before, and under the mech), with
./ncplayer -b pixel ~/src/notcurses/data/warmech.bmp -k -q -s none
(in foot)This is slightly different from XTerm, though there are some weird glitches there too.
I haven't looked into why this is happening. Do you know why the mech ends up all the way to the right?
great feedback; i will respond at length later. for now, the mech being on the far right is intended behavior with -k
. if you use some other blitter (you can just leave off -b
) you ought see the same behavior, ala:
I see you use private mode 80 (cursor positioning), which I think is appropriate here
Sorry, was mixing up private mode 80 and 8452... 80 is for scrolling.
You might want to take a look at private mode 8452 too; when enabled, the cursor is left to the right, on the same line as the sixel, instead of on the next line. This is assuming mode 80 is enabled (the default, by the way).
or now, the mech being on the far right is intended behavior with -k
Ah, ok. I assume the black area isn't intended? XTerm looked better, but still not "clean". Should I spend some time looking at the escapes your emitting and see if there's something that needs to be changed?
optimization notes:
sixel_blit_inner()
: the actual output is only about 1/4 of the total time of this function, but it can likely be completely eliminated iff we can cut out the libc *printf()
ery entirely. i replaced the memstream with a huge buffer for experimentation, and wrote directly into it except for %d
. the remaining stdio just for the %d
s still took about the entirety of that 1/4. take a look at Alexandrescu's uint64_to_ascii()
from a few years back (we can't just reuse the method of term_esc_rgb()
--direct index into a pregenerated table of stringified integers--because we might have arbitrarily much RLE). i suspect that anything done here will end up without proportional returns.memcmp()
. i would have thought memcmp()
would have been inlined away given the known comparison size (3 bytes), but i guess not? quite possible we could handroll the comparison (hell, if we align it, it's two 32-bit values (the unused byte would be set to 0 to avoid a masking requirement)) to force the issue. if we're seriously taking a function call for those comparisons...yuck. probably ought align each ctable entry at 8 bytes: zero byte, red, green, blue, four bytes for dtable index (keeping this fixed at the current 2 means we can't support more than 256 color registers). then do a four-byte aligned sub for compare, requiring at most a single mask of the other 4 bytes. this might be a win. memcmp()
is roughly 20% of our cycles; memmove()
barely shows up; the extra moveweight may well be worth aligned comparisons...covid19.jpg
all failed attempts added up would be on the order of the successful attempt (mask of 0xe0 iirc). so on one hand, cut them to 0 time (by e.g. starting with a loose mask), and cut half the ctable construction time. on the other hand, all you cut is half the ctable construction time, and you lose color detail. ooooh what if we started with a high mask, locked in the used color range, and then backfilled the lower bits (and unused entries)? that's a very exciting idea!!our sixel rendering remains pretty much flat-out unacceptable for rendering even double-digit FPS, which i consider personally embarrassing. here's our current lbr
profile (./ncplayer -b pixel ../data/fm6.mkv -q
, 80x37 xterm 1124x1055):
- 99.62% 0.00% ncplayer libnotcurses.so.2.2.2 [.] ffmpeg_stream
+ 35.79% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit▒
+ 18.73% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;memcmp@plt
+ 10.33% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;sixel_blit_inner
+ 10.02% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sws_scale;0x7f644742337e;0x7f6447435da1;0x7f6447411890
+ 4.15% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;0x7f6449272098;0xffffffff9cc00ac0
+ 1.49% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;sixel_blit_inner;fprintf;__vfprintf_internal
+ 1.44% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;0x7f6449272098
+ 1.38% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;0x7f6449272098;0xffffffff9cc00ade;0xffffffff9ca6e251
+ 1.36% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;0x7f6449272098;0xffffffff9cc00ade;0xffffffff9ca6dcc7;0xffffffff9c07836f;
+ 0.74% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sws_getCachedContext;sws_init_context;0x7f64473e0fb5
+ 0.64% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;0x7f6449272098;0xffffffff9cc00ade;0xffffffff9ca6dcc7;0xffffffff9c07836f;
+ 0.62% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;cfree@GLIBC_2.2.5;munmap_chunk;0xffffffff9cc0008c;0xffffffff9ca6a1e3;0xf
+ 0.62% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;0x7f6449272098;0xffffffff9cc01382;0xffffffff9ca6b031
+ 0.52% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;sixel_blit_inner;write_rle.isra.0
+ 0.51% ffmpeg_stream;ncvisual_render;ncvisual_render_pixels;ncvisual_blit;ffmpeg_blit;sixel_blit;sixel_blit_inner;fprintf;__vfprintf_internal;_IO_default_xsputn
Sorry, was mixing up private mode 80 and 8452... 80 is for scrolling.
which i haven't even grokked the semantics of yet. i'm taking notes as i build this up at https://nick-black.com/dankwiki/index.php?title=Sixel, since there seems to be no authoritative study anywhere else.
You might want to take a look at private mode 8452 too; when enabled, the cursor is left to the right, on the same line as the sixel, instead of on the next line. This is assuming mode 80 is enabled (the default, by the way).
i think this is irrelevant to us, since we do a hard cursor positioning operation after any sprixel emission. though, we will most often admittedly want to be immediately to the (top) right of the sixel (not sure if you mean bottom or top right), so that could potentially save us a cup
. with that said, cup
is cheap (relative to e.g. RGB escapes), and i'm not terribly confident in pixel-to-cell conversions, and thus i might continue to cup
unconditionally to ensure i'm always where i want to be.
but yes, this seems a saner mode of operation. if it can save me some cup
s, i'll use it, but i suspect that i'll continue cup
ping up, in which case this would just be dead byte baggage in the already grotesque sprixel control sequence.
Ah, ok. I assume the black area isn't intended? XTerm looked better, but still not "clean". Should I spend some time looking at the escapes your emitting and see if there's something that needs to be changed?
i'm not sure precisely what you mean here -- i probably need reread your earlier comment. my head is full of optimization thoughts at the moment (also dayjob thoughts, not to mention standard dankhaze), and i was not yet going to open up that can of worms =]. with that said, you clearly know what you're doing, and i fully encourage you to study my code and output, and suggest or even implement fixes, and appreciate all help and wisdom you might provide =].
It's not so much about saving some cursor escapes, but more to avoid unwanted terminal content scrolling when the sixel is at the bottom of the screen.
With ?80h
(which you're already doing, and should be doing), terminal content scrolls when the sixel continues past the screen edge. I.e. sixels behave like any other terminal content. With 8452, the cursor is left just outside the sixel, in its bottom right corner. Think of sixels as being printed on a line printer.
Without 8452, it's as if you hit enter after printing the sixel - the cursor ends up in column 0 on the next line, potentially scrolling all output.
Btw, nice performance work! Keeping my fingers crossed and hoping for something that outperforms libsixel... :)
Also seeing this (the black area before, and under the mech),
Turns out the black area is caused by ncplayer
setting the background color to (0,0,0)
(i.e. black) during its initialization(?)
When the prompt is at the bottom of the screen, emitting the sixel requires scrolling the terminal. Scrolled in lines are erased with the current background color (assuming the terminal has the bce
terminfo capability) - hence the black area.
XTerm is doing this as well, but for some reason, the color it's using when scrolling a sixel is different. I haven't figured out why yet.
This isn't really specific to the sixel/pixel blitter. But the effect is probably hidden by the fact that other blitters then change, and most importantly, reset, the background color while blitting.
I tried playing a video file (with ncplayer). Mostly just for kicks. It's pretty slow (see attached screen recording).
One thing I noticed was that foot uses much more CPU when decoding the ncplayer output, compared to mpv's --vo=sixel
output. With the same window size, foot uses ~10-20% CPU when decoding mpv's output, and ~60-70% when decoding ncplayer's.
A high level comparison of the encoded sixel data shows that both mpv and ncplayer configures roughly the same number of color registers for each frame, but ncplayer emits longer sixel sequences. Looking at the DCS strings sent to the terminal, ncplayer's output isn't that much longer that mpv's. But, after RLE expansion (i.e. the !
encoded sixel characters), the number of sixel characters emitted from ncplayer is often more than 10x compared to mpv (for same-sized frames).
A high level comparison of the encoded sixel data shows that both mpv and ncplayer configures roughly the same number of color registers for each frame, but ncplayer emits longer sixel sequences. Looking at the DCS strings sent to the terminal, ncplayer's output isn't that much longer that mpv's. But, after RLE expansion (i.e. the
!
encoded sixel characters), the number of sixel characters emitted from ncplayer is often more than 10x compared to mpv (for same-sized frames).
fascinating! ok let me run some mpv -vo sixel
and do the differential analysis and get to the bottom of this. great find! if i'm emitting more sixels, am i not emitting a larger image?
lol, mpv -vo sixel
just assert()
s out for me, hrmm:
[schwarzgerat](0) $ mpv -vo sixel ../data/fm6.mkv
(+) Video --vid=1 (*) (vp9 640x480 29.970fps)
File tags:
Title: output
[autoconvert] Converting yuv420p -> rgb24
mpv: ../video/zimg.c:242: repack_entrypoint: Assertion `!(i & (mp_repack_get_align_y(r->repack) - 1))' failed.
Aborted (core dumped)
[schwarzgerat](134) $
lol,
mpv -vo sixel
justassert()
s out for me.
i was able to get it working on my Arch laptop. jeezum crow, yeah, this whips our ass. alright, i've broken out #1391 to address this. at one time, mpv -vo tct
was better than ncplayer
, too :D.
I'm closing this issue up; I've expanded it into several other issues to address particular problems.
With the completion of #1378, our Sixel encoder seems to be working properly. It's now time to better integrate pixel graphics with the render and raster pipelines, so it's useful in Notcurses as something other than a novelty.
cup
after emitting pixel graphics -- i think we're already doing this for rendered modewidth
field correctly (we currently always set it to 1)...though i'm not sure how useful this is,with regards to scaling, currently if i run
./ncplayer -b pixel ../data/worldmap.png -k -q
in a 78x80
xterm
, i get no output. if i run instead./ncplayer -b pixel ../data/worldmap.png -k -q -s none
i get about two empty pages, with the image in the middle, with part of the right chopped off.
stretch
,none
, et. al. ought work forpixel
like they do anything else, and this excess vertical space has got to go. i think the latter comes from creation of a n/6-row plane for n pixels going into the sixel.