dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses

xterm: switching from pixel to cell blitting in ncplayer results in slowdown #1481

Open dankamongmen opened 3 years ago

dankamongmen commented 3 years ago

With sixel available, ncplayer can start with any cell blitter and switch happily among the cell blitters. Likewise, switching to pixel, or starting with pixel, proceeds quickly enough. Downswitching from pixel to cells, however, results in aggravating and unacceptable slowness. Perhaps leaving the sixel up results in slowdown? Unsure.

dankamongmen commented 3 years ago

Interestingly, this would seem to persist across application invocations. Do (using the alternate screen, i.e. no -k):

dankamongmen commented 3 years ago

this isn't as academic a concern as one might think: notcurses-demo gets hit by this when running its default itinerary, since view follows xray.

dankamongmen commented 3 years ago

This is very noticeable. I'd like to see it fixed for 2.2.5 if at all possible.

dankamongmen commented 3 years ago

Let's characterize it. Does it happen without the alternate screen? Does it last only for the duration of the running program? Does it require video, or does it happen just from blitting a single sixel?

dankamongmen commented 3 years ago

the slowdown is definitely persistent across processes, and running reset undoes it, which seems very promising.

dankamongmen commented 3 years ago

  41 rows (23px) 80 cols (11px) (51.25KiB) 48B crend 256 colors+RGB
  compiled with gcc-10.2.1 20210110, little-endian 16B cells
  terminfo from ncurses 6.2.20201114
  avformat 58.35.100 avutil 56.35.101 swscale 5.6.100

330 renders, 83.59ms (98.82µs min, 253.31µs avg, 361.61µs max)
330 rasters, 20.82ms (15.96µs min, 63.08µs avg, 107.93µs max)
330 writes, 5.56s (30.89µs min, 16.86ms avg, 605.92ms max)
9.37MiB (21B min, 29.08KiB avg, 115.83KiB max)
0 failed renders, 0 failed writes, 0 refreshes
RGB emits:elides: def 0:0 fg 254452:25645 bg 272848:7249
Cell emits:elides: 280097/1884703 (87.06%) 0.00% 9.16% 2.59%

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
 1│   eagle│   9.62s│    329│   9.37Mi│   34.2│ 0│ 0│57│  58.05║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
               9.62s│    329│   9.37Mi│
[killermike](0) $ ./notcurses-demo -p ../data/ ie

 notcurses 2.2.4 by nick black et al on vte-256color
  41 rows (23px) 80 cols (11px) (51.25KiB) 48B crend 256 colors+RGB
  compiled with gcc-10.2.1 20210110, little-endian 16B cells
  terminfo from ncurses 6.2.20201114
  avformat 58.35.100 avutil 56.35.101 swscale 5.6.100

617 renders, 152.93ms (96.73µs min, 247.85µs avg, 453.72µs max)
617 rasters, 40.53ms (15.88µs min, 65.68µs avg, 140.14µs max)
617 writes, 18.91s (15.14µs min, 30.65ms avg, 1.91s max)
13.02MiB (22B min, 21.62KiB avg, 517.63KiB max)
0 failed renders, 0 failed writes, 0 refreshes
RGB emits:elides: def 0:0 fg 285679:95542 bg 304899:76322
Cell emits:elides: 381221/3666187 (90.58%) 0.00% 25.06% 20.02%

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
 1│   intro│   4.37s│    287│   3.59Mi│   65.6│ 1│ 0│31│ 191.78║
 2│   eagle│  21.59s│    329│   9.43Mi│   15.2│ 0│ 0│81│  18.68║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
              25.96s│    616│  13.02Mi│
[killermike](0) $

so even running intro (with bitmaps enabled) is enough to kill performance, and that lasts across further invocations until we reset.
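
To put a number on the difference between the two eagle runs above, here is a minimal Python sketch. It uses only the runtime and frame counts copied from the two tables; the variable and function names are made up for illustration.

```python
# Figures copied from the "eagle" rows of the two notcurses-demo tables above.
# Names here are illustrative only; nothing in this block is part of Notcurses.
clean_run = {"runtime_s": 9.62, "frames": 329}    # eagle run on its own
after_intro = {"runtime_s": 21.59, "frames": 329}  # eagle run after intro with bitmaps


def fps(run: dict) -> float:
    """Frames per second, as frames divided by wall-clock runtime."""
    return run["frames"] / run["runtime_s"]


def slowdown(baseline: dict, degraded: dict) -> float:
    """How many times longer the degraded run took than the baseline."""
    return degraded["runtime_s"] / baseline["runtime_s"]


print(f"clean FPS:       {fps(clean_run):.1f}")
print(f"after intro FPS: {fps(after_intro):.1f}")
print(f"slowdown factor: {slowdown(clean_run, after_intro):.2f}x")
```

The same frame count takes a bit more than twice as long once intro has emitted bitmaps, which matches the FPS column dropping from 34.2 to 15.2.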

dankamongmen commented 3 years ago

running ncplayer -bpixel -k to do an inline emission does not result in slowdown. running without -k to emit to the alternate screen does result in slowdown, and more slowdown the more times it's done, up to a ceiling. this definitely points to sixels staying in terminal memory and...something happening, i'm not sure what.
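
For anyone wanting to poke at this without ncplayer, here is a rough Python sketch of the two cases: a tiny Sixel emitted inline, and the same Sixel wrapped in alternate-screen enter/leave sequences. This is not what ncplayer emits. It assumes `CSI ? 1049 h` and `CSI ? 1049 l` as the alternate-screen switches (what xterm's smcup/rmcup are commonly built on), and the function names are hypothetical.

```python
ESC = b"\x1b"

# Assumption: xterm private mode 1049 enters/leaves the alternate screen.
ENTER_ALT = ESC + b"[?1049h"
LEAVE_ALT = ESC + b"[?1049l"


def tiny_sixel() -> bytes:
    """A very small Sixel: DCS q, one color register, one band, then ST."""
    return (
        ESC + b"Pq"        # DCS introducer followed by 'q' starts Sixel data
        + b"#0;2;100;0;0"  # define color register 0 as RGB 100%/0%/0% (red)
        + b"#0"            # select color register 0
        + b"!10~"          # repeat '~' (all six pixels set) ten times
        + ESC + b"\\"      # ST terminates the DCS string
    )


def payload(alternate_screen: bool) -> bytes:
    """Bytes for one emission, inline or bracketed by the alternate screen."""
    body = tiny_sixel()
    if alternate_screen:
        return ENTER_ALT + body + LEAVE_ALT
    return body


# Print the escaped form rather than writing raw bytes to the terminal.
print(repr(payload(alternate_screen=False)))
print(repr(payload(alternate_screen=True)))
```

Writing `payload(True)` to the terminal with `sys.stdout.buffer.write` a number of times, then timing ordinary text output, would be one way to check whether the slowdown tracks the alternate screen rather than anything Notcurses-specific.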

dankamongmen commented 3 years ago

i've written this up and submitted it as an XTerm bug:

I've found that following emission of a Sixel graphic when using                             
the alternate screen (via smcup), further work in the alternate                              
screen is slowed down roughly in proportion to the total amount                              
of Sixel graphics written since the last terminal reset. This                                
persists even if the alternate screen is left with rmcup. There                              
is a ceiling on the slowdown.                                                                

A reset restores the terminal to its original speed. I can detect                            
no slowdown when outside the alternate screen, even if Sixels are                            
emitted outside the alternate screen.                                                        

This seems to happen even when TrueType fonts are not used,                                  
though it is much less pronounced in that case (possibly only                                
because there are fewer total pixels in my configuration when                                
TrueType fonts are disabled).                                                                

My programs are `ncplayer` and `notcurses-demo` from the tip of                              
Notcurses master. The former is invoked with the `-bpixel` command                           
line argument. When run on a single-frame image with this argument,                          
it scales the input to the terminal size, displays it as a Sixel                             
(having emitted "CSI ? 80 ; 8452h" beforehand), and waits for a                              
keypress. When run with `-snone`, no scaling takes place.                                    
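
For reference, the "CSI ? 80 ; 8452h" mentioned above is a DEC private mode set with two parameters. A minimal Python sketch of the bytes involved follows; the helper names `decset` and `decrst` are hypothetical. As I recall XTerm's control sequence documentation, mode 80 concerns Sixel scrolling/display mode and 8452 concerns where the cursor is left after a graphic, but that is from memory and should be checked there.

```python
def _private_mode(modes: tuple, final: bytes) -> bytes:
    """Build CSI ? Pm ; Pm ... <final> for one or more private mode numbers."""
    params = b";".join(str(mode).encode("ascii") for mode in modes)
    return b"\x1b[?" + params + final


def decset(*modes: int) -> bytes:
    """DEC private mode set: final byte 'h'."""
    return _private_mode(modes, b"h")


def decrst(*modes: int) -> bytes:
    """DEC private mode reset: final byte 'l'."""
    return _private_mode(modes, b"l")


# The sequence quoted in the report, shown in escaped form.
print(repr(decset(80, 8452)))
```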

If I run `reset` prior to `notcurses-demo -p ../data eee`, each run                          
takes between 9 and 10s:                                                                     

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║                             
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣                             
 1│   eagle│   9.68s│    329│   9.50Mi│   34.0│ 0│ 0│57│  58.00║                             
 2│   eagle│   9.94s│    329│   9.37Mi│   33.1│ 0│ 0│58│  55.24║                             
 3│   eagle│   9.52s│    329│   9.34Mi│   34.6│ 0│ 0│56│  59.57║                             
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝                             
              29.14s│    987│  28.21Mi│                                                      

If I display a single full-screen Sixel beforehand, it runs                                  
between 13 and 14s:                                                                          

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║                             
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣                             
 1│   eagle│  13.72s│    329│   9.46Mi│   24.0│ 0│ 0│69│  34.23║                             
 2│   eagle│  13.61s│    329│   9.41Mi│   24.2│ 0│ 0│69│  34.44║                             
 3│   eagle│  13.23s│    329│   9.40Mi│   24.9│ 0│ 0│69│  35.47║                             
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝                             
              40.56s│    987│  28.28Mi│                                                      

If I display two full-screen Sixels beforehand, in different                                 
processes (and across returns to the normal screen), the time                                
almost doubles from the reset case, requiring eighteen seconds:                              

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║                             
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣                             
 1│   eagle│  18.27s│    329│   9.49Mi│   18.0│ 0│ 0│77│  23.04║                             
 2│   eagle│  19.21s│    328│   9.45Mi│   17.1│ 0│ 0│79│  21.46║                             
 3│   eagle│  19.10s│    329│   9.43Mi│   17.2│ 0│ 0│78│  21.71║                             
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝                             
              56.58s│    986│  28.37Mi│                                                      

With enough cumulative Sixels (none remaining on-screen), the                                
terminal becomes effectively unable to update text:                                          

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║                             
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣                             
 1│   eagle│  32.60s│    328│   9.47Mi│   10.1│ 0│ 0│87│  11.44║                             
 2│   eagle│  39.41s│    329│   9.44Mi│    8.3│ 0│ 0│89│   9.27║                             
 3│   eagle│  37.39s│    329│   9.45Mi│    8.8│ 0│ 0│89│   9.82║                             
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝                             
             109.40s│    986│  28.36Mi│                                                      

The small increases in total output (less than a percent) are                                
due to more per-time-interval output events, and can be safely                               
disregarded.                                                                                 
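
The totals from the four tables above can be compared with a few lines of Python. This is only arithmetic on the figures already shown; the labels and names are made up for illustration.

```python
# (label, total runtime in seconds, total output in MiB), copied from the
# summary line under each of the four tables above.
runs = [
    ("after reset", 29.14, 28.21),
    ("one prior Sixel", 40.56, 28.28),
    ("two prior Sixels", 56.58, 28.37),
    ("many prior Sixels", 109.40, 28.36),
]

_, base_runtime, base_output = runs[0]


def slowdown(runtime_s: float) -> float:
    """Total runtime relative to the post-reset baseline."""
    return runtime_s / base_runtime


def output_growth_pct(output_mib: float) -> float:
    """Percentage increase in bytes written relative to the baseline."""
    return (output_mib / base_output - 1.0) * 100.0


for label, runtime_s, output_mib in runs:
    print(
        f"{label:>18}: {slowdown(runtime_s):.2f}x runtime, "
        f"{output_growth_pct(output_mib):+.2f}% output"
    )
```

Runtime grows by well over a factor of three in the worst case while output stays within one percent of the baseline, so the extra time is not explained by extra bytes.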

I'm going to look into this further by looking at differential                               
profiles of XTerm. Please let me know anything else you need, or                             
if this is due to a mistake I'm making. Thanks!                                              

XTerm version: 366 as packaged in Debian Unstable 366-1                                      
TERM: xterm-256colors    geom: 80x41  truetype                                               
Xorg: 1.20.10 (DU 2:1.20.10-3) using nvidia 460.67                                           
Kernel: Linux, custom 5.11.11                                                                
Xresources:                                                                                  
  URxvt*termName: rxvt-256color                                                              
  URxvt*scrollBar: false                                                                     
  URxvt*scrollBar_right: false                                                               
  URxvt*scrollBar_floating: false                                                            
  URxvt*background: Black                                                                    
  URxvt*foreground: White                                                                    

  xterm*background: black                                                                    
  xterm*foreground: lightgray                                                                
  xterm*bidi.enabled: 0                                                                      
  XTerm*faceNameDoublesize: WenQuanYi WenQuanYi Bitmap Song                                  
  XTerm*faceName: Hack                                                                       
  XTerm*faceSize: 12                                                                         
  XTerm*renderFont: true                                                                     
  !xterm*font: *-fixed-*-*-*-18-*                                                            
  xterm*directColor: true                                                                    
  XTerm*decGraphicsID: 340
  XTerm*decTerminalID: 420
  XTerm*numColorRegisters: 256                                                               
  XTerm*sixelScrolling: true                                                                 
  XTerm*maxGraphicSize: 1980x1440                                                            
  XTerm*utf8: 1                                                                              
  XTerm*eightBitInput: false                                                                 
  XTerm*allowWindowOps: False                                                                
  XTerm*disallowedWindowOps: 1,2,3,4,5,6,7,8,9,11,13,18,19,20,21,GetSelection,SetSelection,SetWinLines,SetXprop
  XTerm*minBufSize: 65536                                                                    
  XTerm*maxBufSize: 1048576

dankamongmen commented 3 years ago

see also #1740