Closed — dankamongmen closed this issue 2 years ago
even more oddly, i only see 11 colors in chunli23.png, fewer than we count in quantanal. ahhh, perhaps because of the initial reduction from rgb to sixel space? we really oughtn't be doing that so early--it loses precision!
yep that's why
[0->10] hi: 18 6 0 lo: 18 6 0
[10->8] hi: 94 75 62 lo: 94 75 62
[10->8] hi: 94 87 69 lo: 94 75 62
[11->4] hi: 94 94 94 lo: 94 94 94
[12->12] hi: 18 40 72 lo: 18 40 72
[13->13] hi: 43 18 9 lo: 43 18 9
[14->14] hi: 94 87 69 lo: 94 87 69
[15->15] hi: 31 12 3 lo: 31 12 3
[1->9] hi: 0 21 53 lo: 0 21 53
[2->0] hi: 0 31 56 lo: 0 31 56
[2->0] hi: 18 40 72 lo: 0 31 56
[3->3] hi: 25 6 0 lo: 25 6 0
[3->3] hi: 31 12 3 lo: 25 6 0
[3->3] hi: 43 18 9 lo: 25 6 0
[4->11] hi: 43 28 9 lo: 43 28 9
[5->2] hi: 37 56 81 lo: 37 56 81
[6->5] hi: 56 28 15 lo: 56 28 15
[7->6] hi: 69 40 28 lo: 69 40 28
[8->1] hi: 59 72 90 lo: 59 72 90
[9->7] hi: 84 53 43 lo: 84 53 43
FOUND COLOR 0 0 25 50
FOUND COLOR 10 0 0 0
FOUND COLOR 11 25 25 0
FOUND COLOR 1 50 50 75
FOUND COLOR 2 25 50 75
FOUND COLOR 3 25 0 0
FOUND COLOR 4 75 75 75
FOUND COLOR 5 50 25 0
FOUND COLOR 6 50 25 25
FOUND COLOR 7 75 50 25
FOUND COLOR 8 75 75 50
FOUND COLOR 9 0 0 50
those deets entries look a bit askance, though maybe i'm forgetting something...
yeah these hi/lo values are fubar:
FOUND COLOR 11 25 25 0
FOUND COLOR 11 25 25 0
FOUND COLOR 11 25 25 0
FOUND COLOR 11 25 25 0
000 206 hi 18/40/72 lo 0/31/56
001 69 hi 59/72/90 lo 59/72/90
002 78 hi 37/56/81 lo 37/56/81
003 343 hi 43/18/9 lo 25/6/0
004 72 hi 94/94/94 lo 94/94/94
005 187 hi 56/28/15 lo 56/28/15
006 155 hi 69/40/28 lo 69/40/28
007 124 hi 84/53/43 lo 84/53/43
008 178 hi 94/87/69 lo 94/75/62
009 83 hi 0/21/53 lo 0/21/53
010 35 hi 18/6/0 lo 18/6/0
011 226 hi 43/28/9 lo 43/28/9
ok the 56 is floor(0x90 * 100 / 255). so that's sixel space. likewise floor(0x50 * 100 / 255) == 31...though how we're ending up with different hi vs lo remains confusing.
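for reference, a minimal sketch of that rgb→sixel-space mapping (the function name is mine, not the notcurses API): sixel color registers take components on [0..100], so an 8-bit channel c maps via floor(c * 100 / 255).

```c
#include <stdint.h>
#include <assert.h>

// map an 8-bit RGB component [0..255] into sixel space [0..100];
// integer division gives the floor. illustrative name, not notcurses API.
static inline unsigned rgb_to_sixelspace(uint8_t c){
  return c * 100u / 255u;
}
```

this reproduces the values above: 0x90 → 56 and 0x50 → 31.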
so is this all due to our preclumping, where we're applying that large mask?
FOUND COLOR 0 0 25 50 (orig 0xff905000)
000 4 hi 0/31/56 lo 0/31/56
FOUND COLOR 0 0 25 50 (orig 0xffb86830)
000 5 hi 18/40/72 lo 0/31/56
differences of 40, 24, and 48. we start with a mask of 0xc0, aka two bits, aka enclumpenings of 64, so yeah, that's wide enough to take them. still, if we're storing with full precision, we ought be able to get them out losslessly. so it looks like that's where our breakage is.
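a tiny illustration of why that 0xc0 mask swallows these two colors (function name mine): keeping only the top two bits of each channel makes clumps 64 wide, so the red channels 0x90 and 0xb8, though 40 apart, land in the same bucket.

```c
#include <stdint.h>
#include <assert.h>

// precluster a channel by keeping only its top two bits (mask 0xc0),
// i.e. clumps of width 64. illustrative sketch of the mask described above.
static inline uint8_t preclump(uint8_t c){
  return c & 0xc0;
}
```

each channel pair from 0xff905000 vs 0xffb86830 (deltas 40, 24, 48) collides under this mask, which is why the two colors merged.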
yeah we're always averaging in refine_color(), stupid stupid stupid.
definite bug in extract_color_table(), argh!
alright, i think i've found one trick that wins us back a good bit of quality, testing for performance regression...
yeah that seems to improve the visual a decent bit, though our average error changes very little...
alright, bold new idea for the relaxation algorithm: initially, we take a 323 MSB liftup from the channels. use that as a key into a 256-element table. now the full extraction is O(1) per pixel (currently O(lgn)), at the cost of a possibly sparse color table. but use that same sparsity as a topological guide to our relaxation, relying on some locality. if need be at the end, compress the table down -- this can be delayed until encoding time, and done on the fly. this ought be a very substantial improvement, i think.
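a sketch of one reading of that "323 MSB liftup" (the bit assignment per channel is my assumption — the comment only says "323"): pack the top 3 bits of red, 2 of green, and 3 of blue into an 8-bit key for the 256-element table, making extraction O(1) per pixel.

```c
#include <stdint.h>
#include <assert.h>

// pack the 3 MSBs of red, 2 of green, and 3 of blue into an 8-bit table
// key: rrr gg bbb. one possible reading of the "323 liftup"; the actual
// per-channel split is not specified above.
static inline uint8_t key323(uint8_t r, uint8_t g, uint8_t b){
  return (uint8_t)((r & 0xe0) | ((g & 0xc0) >> 3) | (b >> 5));
}
```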
I would really like to port your encoder sometime to Java for the vastly superior speed and quality over other sixel encoders. Do you have a planned milestone spot in the future that would be a good place to start from?
progress thus far indicates about a 50% cut in sixel quantization runtime, fuckin' sweet
> I would really like to port your encoder sometime to Java for the vastly superior speed and quality over other sixel encoders. Do you have a planned milestone spot in the future that would be a good place to start from?
right now the encoder's no good =] i'm bringing it up to at least libsixel territory now. libsixel uses straight-up maxmedian cut (as you do, iirc), of which i'm not a tremendous fan. i think octrees are likely the way to go, though i've been playing around with kohonen neural networks. my deep dislike for statistical/ML methods means i probably won't go that way, either. right now i'm experimenting with static octree preclustering + my own "relaxation" approach, and it is showing very positive results. #1452 then suggests getting rid of the final iteration of relaxation, which i think is a great idea, but we'll see.
i'm down to a bit under 50% of my old rendering time with just the octree preclustering, but i've got a bug in there for images that are fewer than 256 colors. i should have it knocked out tonight, hopefully.
on the left is current algorithm, running in .9s±.2s when coming from page cache. on the right is the new algorithm, running in .5s±.1s when coming from page cache.
(of that .5s, about .35s is due to writing out to the terminal, and the terminal rendering, so we're talking about .4s vs .1s for actual quantization, so a bit more speedup than advertised)
ok, i think i just got that bug resolved. things are looking very good...
so we're seeing equal quality on most images now, while preserving nice speed pickups....except for notcursesIII.mov, on which we seem noticeably slower and crappier.
likewise we're definitely slower and crappier on samoa.avi. very strange that this correlates with video, though probably not informative. huh.
new algorithm:
[schwarzgerat](0) $ time ./ncplayer ../data/chunli* -t0 -q
real 0m7.650s
user 0m5.349s
sys 0m0.228s
[schwarzgerat](0) $
old:
[schwarzgerat](0) $ time ncplayer ../data/chunli* -t0 -q
real 0m10.964s
user 0m5.285s
sys 0m0.325s
[schwarzgerat](0) $
but then on worldmap.png, old beats new (and looks better):
[schwarzgerat](0) $ time ncplayer ../data/worldmap.png -t0 -q
real 0m0.241s
user 0m0.180s
sys 0m0.012s
[schwarzgerat](0) $ time ncplayer ../data/worldmap.png -t0 -q
real 0m0.244s
user 0m0.175s
sys 0m0.020s
[schwarzgerat](0) $
new:
[schwarzgerat](0) $ time ./ncplayer ../data/worldmap.png -t0 -q
real 0m0.267s
user 0m0.197s
sys 0m0.012s
[schwarzgerat](0) $ time ./ncplayer ../data/worldmap.png -t0 -q
real 0m0.269s
user 0m0.203s
sys 0m0.008s
[schwarzgerat](0) $
new pastes old on AllRGB images:
[schwarzgerat](0) $ time ./ncplayer /media/chungus/images/allrgb/* -t0 -q
real 0m6.671s
user 0m5.923s
sys 0m0.392s
[schwarzgerat](0) $ time ./ncplayer /media/chungus/images/allrgb/* -t0 -q
real 0m6.681s
user 0m5.955s
sys 0m0.348s
[schwarzgerat](0) $ time ./ncplayer /media/chungus/images/allrgb/* -t0 -q
real 0m6.712s
user 0m5.945s
sys 0m0.379s
[schwarzgerat](0) $
old:
[schwarzgerat](0) $ time ncplayer /media/chungus/images/allrgb/* -t0 -q
real 0m7.575s
user 0m6.826s
sys 0m0.372s
[schwarzgerat](0) $ time ncplayer /media/chungus/images/allrgb/* -t0 -q
real 0m7.507s
user 0m6.821s
sys 0m0.391s
[schwarzgerat](0) $ time ncplayer /media/chungus/images/allrgb/* -t0 -q
real 0m7.502s
user 0m6.827s
sys 0m0.398s
[schwarzgerat](0) $
soooooooooo AllRGB is going to fill up the color table on the initial pass, causing the second pass to abort immediately (it's also going to be dominated by writing/processing time, as they're 64MB RGB). meanwhile, very few colors are going to likewise cause very little second phase, since most will get slotted out to different octets in the first phase. so that points to our first phase being good, and our second phase being ass. it doesn't explain why we look shittier in that case, though. maybe a remaining bug in the second phase?
well, we ought be able to cut a chunk off that ass with the #1452 proposal. let's apply that and see where we get. if that doesn't win in all cases, i think we move to an all-octree solution, and call it a day.
pure octree is, what, 2NlgN? you're doing a lgn insert for each of N, then a lgN lookup for each of N (N pixels). that's assuming you don't want arbitrary fidelity, though. if you do a relaxation following, you're talking what an additional PN (P maxcolors) worst case (for each free color pick a node, and expand that node). 9 levels get you full fidelity on 24-bit color (each level is 3 bits), so really it's 16N + P²lgP for relaxation-free pure-octree (keep a sum-over-subtree in each node, increment it as you traverse for insertion, and then pick your P by breadth-first-search with a sorted frontier, PlgP to sort the frontier....actually, no, at each point traverse down on the most populous...but then selecting the most populous requires searching a frontier; a frontier is implicit i think). BUT you avoid the N*P of relaxation, which is absolutely going to overwhelm P²lgP for all but the smallest images (this might not be true for 1024 colors, a case we have been sedulously ignoring).
if you do pure octree out to, say, 7 levels (losing 6 bits of detail--in a no-relaxation approach, this would be maybe 1 bit of red, 2 of green, 3 of blue) you need 256K nodes. since there's no relaxation, you needn't keep a hi/lo, and only need 3 words of color tracking per leaf node (one each for colorsums; we already have count). this is small enough to do a pointer-free implementation, i think -- each node is 256B with 64-bit words or 128B with 32-bit words (32 bit words guarantee overflow-free operation only through 2^24 summed pixels, aka 4096x4096, but that might be enough?). so you're talking 128K==2^17 nodes at 2^7==128 bytes + 128K==2^17 nodes at 2^5==32 bytes == 16MB + 4MB == 20MB for a static octree of 7 levels. a bit extravagant, not too wonderful on an embedded system, not great for on-die cache (there's a compulsory miss for the entirety due to initial zorching).
at 5 levels you lose 12 bits of detail, half your colordepth. that's still way better than preclustering on 256, where you lose 16 bits of detail, and pretty much have to apply relaxation. indeed, at 5 levels in the densest case you're throwing out 93.75% of your samples, as you lose 15/16 of your bits (you're keeping 256 of 4096). on an even split you're losing 4 bits per channel, i.e. you have clumps 16-wide on each channel. not so bad. might not even need relaxation most of the time on that. in that case you have 2K heavy nodes and 2K light nodes for a static 327,680 byte octree. now we're talkin'!
just for fun, let's look at 5 levels on 1024 registers. well now you're just throwing out all but 1024 of 4096, aka 3/4 of your solutions. you still clump out at 16 on your leaves, but you can take more leaves. space cost remains 320KB--this is independent of P (and N) in all cases. the lack of relaxation allows us to scale P (and N!) out arbitrarily.
our shared object is well over 100KB. if you're running Notcurses, i think you can afford to toss 320KB at quantization (it's not necessary to keep the octree beyond encoding time).
let's fuckin' do it you sons of bitches arrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
it would be a fine thing if we could get this 320KB all under the auspices of one HugeTLB entry
I think I was using bytes up there where I wanted bits (for the node sizes), so cut all that by a factor of 8, which is obviously good =]
how hard would it be to add a recombination pass? octree for initial structure, recombine following? in one dimension, recombination works like: we have 4-wide sets on [1..20]. 20 samples at 1, 10 at 4, 10 at 5, 20 at 15, and three registers. well we probably want to treat the 20 at 4,5 as a group, rather than the 30 in the first set. recombination looks across node boundaries. does this end up just being an optimization problem? in that case, fuck it
ehhh to do that effectively you've pretty much gotta keep deep structure within your nodes, I think. so aye fuck that.
relaxation is cheaper here than our initial implementation, though, since you can just deal it out from your frontier. you do then have a search at the end...actually, you always have one, since you need find your ancestor that made it into the table (you are guaranteed to only have one).
so let's lock this down:
1) heavy nodes are 16B, light nodes are 4B
2) we will have 2^11 each of light and heavy nodes in a pointer-free static structure, so 2^15+2^13, aka 40KB(!). might as well do 64 bits for 80KB.
3) N insertions into octree at (fixed) 6 steps each
now we either prune down or meld up. pruning down requires a frontier. you're looking for the most populous P nodes such that all N pixels are included and no node in P also has an ancestor in P, right? because we must account for all N pixels, and want to combine (losing resolution) on as few as possible (though a distance*population metric is better if we're trying to minimize total error, I think). so pruning down means, starting at the root, if the node is populated, etc etc. best for sparse octrees, where you only need touch a subset of the leaves. in a dense octree, though, this is worse than melding up, as you must touch all nodes rather than just leaves. it's also more complicated, so fuck it. meld up.
to meld up, you have to touch all leaves, to find which ones are occupied. then you relax or fold until you have P.
what if we combined the two? we can track the number of populated leaves in a single word while inserting, so o(1) state and negligible time assuming cache presence (which we can force by layout). as soon as you exceed some number of populated leaves, meld up entirely. before that, DFS to find your populated leaf set. or fuck it, throw another 2^12 bits (2^9 bytes, a mere 512B) at an occupancy bitmap...that's actually not a win assuming reasonably packed leaf nodes, as you can check the node pop in o(1) just as you can the bitmap, dumb.
either way you end up with a set of occupied leaf nodes partitioning N, with no guaranteed relation of this set's size to P. we need to be less than or equal to P, and ideally equal to P when we have at least P actual source colors.
so that's just relax and fold in the desired direction, using whatever metric we want. we can do this greedily; doing it optimally for the population metric sounds difficult and doing it optimally for a distance*population metric sounds right fucking unpleasant.
alright so to lock the second phase down:
first-order adjustment to this would, i think, be a metric that is the product of sum population and true (i.e. not logical) distance between nodes. this requires no intranode structure, but seeks to minimize global error (it also carries a hint of recombination as described earlier). there are a lot of multiplications, though. ehhh, we can afford some multiplies.
when you combine two nodes, you....hrmmm. that's actually kinda tricky, since we don't have pointers (and thus aliases), and it's not a binary tree (thus you can't simply take over the parent node). hrmmmmmmmm. we could just zorch out the lower of the two combined nodes, and always lean to the right when our true path is zorched? or with a state bit we could indicate the node we were combined into in the count word, and roll up a lgN chain to the final combined node. yeah, i think that latter's the way to do it.
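a sketch of that rewrite-the-count-word scheme (all names hypothetical): melding stashes the survivor's index plus a state bit in the victim's count word, and lookup chases the chain until it hits a live node.

```c
#include <stdint.h>
#include <assert.h>

#define MERGED_BIT 0x80000000u // state bit: low bits hold the merged-into index

// per-node population counts; 2^11 nodes as proposed above. when a node is
// melded away, its count word is overwritten with MERGED_BIT | survivor.
static uint32_t counts[2048];

// meld victim into survivor: fold population in, leave a forwarding link.
static void meld(unsigned victim, unsigned survivor){
  counts[survivor] += counts[victim];
  counts[victim] = MERGED_BIT | survivor;
}

// follow forwarding links to the final combined node (a lgN chain at worst).
static unsigned resolve(unsigned node){
  while(counts[node] & MERGED_BIT){
    node = counts[node] & ~MERGED_BIT;
  }
  return node;
}
```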
at step 9, we have a set L where |L| <= P, and no node in L has an ancestor that's also in L, which was our goal for quantization. now we must look up each pixel's quantized color, which can be done in 2NlgN (for each pixel, descend to our leaf node, and possibly hop over to our merged node). we then have a quantized set of colors, and have mapped each pixel to one of them. Encode. Done.
qemfd let's do it
alright so the first detail is the actual machinations for this static octree.
operations we will need include:
...is that the only one? do we need ancestor given a node, or siblings? i think we do for melding, yeah. so...
level 0: 1x light node at 0..0 x 8
level 1: 8x light nodes at 1..8 x 8
level 2: 64x light nodes at 9..72 x 8
level 3: 512x light nodes at 73..584 x 8
level 4: 4096x light nodes at 585..4680 x 8
level 5: 32768x heavy nodes at 4681..37448 x 16
so that's 4681 light nodes and 32768 heavy nodes, losing 9 (not 12) bits of depth. at 8 bytes for lights and 32 bytes for heavies, that's 37448 + 1048576 == 1086024B, a little over 1MB. not great, not terrible.
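the level offsets fall out of the geometric series 1 + 8 + ... + 8^(l-1) = (8^l - 1)/7; a quick check of that arithmetic (function names mine):

```c
#include <assert.h>

// an 8-ary static octree: level l holds 8^l nodes...
static unsigned level_nodes(unsigned l){
  return 1u << (3 * l);
}

// ...and the flat-array index of level l's first node is the sum of all
// shallower levels, i.e. (8^l - 1) / 7.
static unsigned level_offset(unsigned l){
  return ((1u << (3 * l)) - 1) / 7;
}
```

this confirms 4681 light nodes (levels 0..4 occupy indices 0..4680) with the 32768 heavies starting at index 4681.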
5 levels only means 585 lights and 4096 heavies, losing 12 bits of depth. that's 135752B, a little over 128KB.
given that at 5 levels we're already tossing 93.75%, i see no advantage in going deeper. let's make up some insertion keys:
level 0: n/a
level 1: 1R, 1G, 1B (777 remain)
level 2: 2R, 1G, 0B (567 remain)
level 3: 1R, 1G, 1B (456 remain)
level 4: 1R, 1G, 1B (345 remain)
what's the point of a non-uniform distribution here? it's only going to affect melding under a pop-distance metric if we weigh RGB differently in said metric (well, the distance part anyway).
so to define our operations:
let's code this up and see what happens.
fuck you can just do (A - base) / nodesize(L) to get a 12-bit index and then chop it up into 4 3-bit path sections, i'm fucking retarded
for leaf nodes, anyway. your base would be address of octree + 4681 * 8.
for address of descendants you always come in knowing your L, base, and path thus far. so you just need, say, base + 2^(3L) * nodesize(L) + path[L] * nextoff(path[0..L-1]), where nextoff shifts current 3 left and adds path[i] through L-1.
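the leaf-side trick above, sketched out (NODESIZE and the names are stand-ins): offset-from-base divided by node size gives the 12-bit leaf index, which then chops into four 3-bit path sections.

```c
#include <assert.h>

#define NODESIZE 16 // stand-in for the real heavy-node size

// recover a leaf's 12-bit index from its address: (A - base) / nodesize.
static unsigned leaf_index(const unsigned char *a, const unsigned char *base){
  return (unsigned)((a - base) / NODESIZE);
}

// chop a 12-bit leaf index into four 3-bit path sections, most
// significant section (shallowest level) first.
static void leaf_path(unsigned idx, unsigned path[4]){
  for(int i = 0 ; i < 4 ; ++i){
    path[i] = (idx >> (3 * (3 - i))) & 0x7;
  }
}
```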
one remaining question is when do we map RGB space [0..255] down to Sixel space [0..100]? i guess you want to carry precision as far forwards as it is useful?
you know...i don't think we even need the light nodes. we're not pruning down, but instead melding up, so there's no point in tracking the population at higher levels. this is basically just a 2^24 -> 2^11 masking.
yeah. =]
ok so true heavy nodes (the only node we have) will be 14B each. we'll have some power of 2 of them. we choppa and map. store hi and lo for each, and 32-bit pop for each (64-bit pop means 22B each, maybe do that). recombine/relax to P, using fast relaxation (no pixel moves). take the lower of all recombinations, since the eye distinguishes dark better than light. use the idea from above where we rewrite the popcount with the node that replaced it. finally, map down, following the popcount links.
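one possible packing that hits the 14B (the field breakdown isn't spelled out above, so this layout is a guess): 3 bytes each for the hi and lo colors, plus two 32-bit words for the popcount and the melded-into link; widening those two words to 64 bits gives the 22B variant.

```c
#include <stdint.h>
#include <assert.h>

// hypothetical 14-byte heavy node (packed, gcc/clang attribute): one
// reading of the sizes quoted above, not the actual notcurses layout.
typedef struct __attribute__((packed)) heavy_node {
  uint8_t hi[3];  // componentwise max seen in this leaf
  uint8_t lo[3];  // componentwise min seen in this leaf
  uint32_t pop;   // pixel population (64-bit here would give the 22B node)
  uint32_t link;  // popcount word doubling as the melded-into link
} heavy_node;
```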
imho this ties together everything best from relaxation, sweetmasks, and octrees.
one last thing: maybe write out the color table and use binary search, rather than searching the octree. it's the same NlgN either way, but the former has much better locality for cache.
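a sketch of that flat-table lookup (names mine): a sorted array of packed colors sits in a handful of cache lines, unlike a pointer-chase down the octree, at the same O(lgP) per probe.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

// comparator for packed 0xAARRGGBB colors, usable by qsort and bsearch.
static int cmp_u32(const void *a, const void *b){
  uint32_t x = *(const uint32_t*)a, y = *(const uint32_t*)b;
  return x < y ? -1 : x > y;
}

// binary-search a sorted color table; returns the register index, or -1
// if the color isn't present.
static int table_lookup(const uint32_t *table, size_t n, uint32_t color){
  const uint32_t *hit = bsearch(&color, table, n, sizeof(*table), cmp_u32);
  return hit ? (int)(hit - table) : -1;
}
```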
alright, just got our first light off the new scheme, looking great thus far, i've got high hopes here
have notcurses-info working once more under the new regime...
so here's what we've got in the octrees-for-victory branch:
so it's looking positive. need to finish out merge_color_table() at a minimum.
but yeah the original motivation for this bug, the divergence, is eliminated.
alright, we're now functional on images with more leaves than color registers, though it's a first-order merge and tends to look like shit. that ends functional necessities; i'm going to clean out the diagnostics and start some fine-tuning and numeric comparison. i expect to merge this tonight.
here's some crude notcurses-demo data:
runtime│ frames│output(B)│ FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
1│ intro│ 4.14s│ 210│ 3.79Mi│ 50.7│ 6│ 1│47│ 90.00║
2│ xray│ 15.54s│ 217│ 14.91Mi│ 14.0│ 1│ 0│19│ 65.65║
3│ box│ 6.88s│ 100│ 9.88Mi│ 14.5│ 3│ 0│50│ 26.56║
4│ keller│ 13.72s│ 866│ 3.45Mi│ 63.1│ 7│ 2│ 7│ 356.05║
5│ view│ 22.05s│ 962│ 54.07Mi│ 43.6│ 3│ 1│37│ 102.58║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
62.34s│ 2355│ 86.11Mi│
runtime│ frames│output(B)│ FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
1│ intro│ 3.96s│ 321│ 6.71Mi│ 81.2│ 9│ 2│39│ 156.45║
2│ xray│ 15.48s│ 233│ 15.48Mi│ 15.1│ 1│ 0│16│ 85.47║
3│ box│ 6.57s│ 100│ 9.36Mi│ 15.2│ 5│ 0│46│ 29.05║
4│ keller│ 13.71s│ 878│ 3.46Mi│ 64.1│ 6│ 2│ 7│ 385.39║
5│ view│ 21.21s│ 963│ 54.08Mi│ 45.4│ 3│ 1│36│ 110.45║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
60.92s│ 2495│ 89.09Mi│
runtime│ frames│output(B)│ FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
1│ intro│ 4.18s│ 327│ 7.87Mi│ 78.3│ 8│ 2│42│ 145.51║
2│ xray│ 15.52s│ 207│ 14.80Mi│ 13.3│ 0│ 0│17│ 69.76║
3│ box│ 7.07s│ 100│ 12.06Mi│ 14.1│ 6│ 0│48│ 25.60║
4│ keller│ 13.63s│ 864│ 3.43Mi│ 63.4│ 7│ 2│ 5│ 407.31║
5│ view│ 39.62s│ 961│ 73.55Mi│ 24.3│ 1│ 0│18│ 114.95║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
80.01s│ 2459│ 111.70Mi│
runtime│ frames│output(B)│ FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
1│ intro│ 4.13s│ 317│ 7.71Mi│ 76.7│ 8│ 2│42│ 143.30║
2│ xray│ 15.53s│ 206│ 15.01Mi│ 13.3│ 0│ 0│18│ 68.25║
3│ box│ 6.63s│ 100│ 10.13Mi│ 15.1│ 6│ 0│45│ 28.70║
4│ keller│ 13.62s│ 864│ 3.43Mi│ 63.5│ 7│ 2│ 5│ 404.63║
5│ view│ 39.33s│ 963│ 73.55Mi│ 24.5│ 1│ 0│18│ 117.42║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
79.24s│ 2450│ 109.82Mi│
contour is kinda unstable right now, and i'm not smart enough to copy-and-paste from xwayland into chrome on x, so we only have xterm numbers. but alright! the only one where we would expect to see big timing changes is view, where indeed we kick it in the balls for a healthy 50% reduction in runtime. we'd expect some total frame count changes in xray, where time is constant and we drop frames, and we indeed have about a 10% gain there.
alright, this is looking good, performance-wise! let's get some detailed info from quantsixel.
(from the same runs as above)
Bitmap emits:elides: 1420:306 (17.73%) 66.70MiB (59.70%) SuM: 0 (0.00%)
Bitmap emits:elides: 1376:299 (17.85%) 65.00MiB (59.18%) SuM: 0 (0.00%)
Bitmap emits:elides: 1444:304 (17.39%) 44.32MiB (49.74%) SuM: 0 (0.00%)
so it looks like we're emitting significantly smaller bitmaps, which surprises me. we'll want to look into that and determine why, i think. i mean, it's a good effect, but it needs to come from a righteous cause.
alright, so we see a slight increase in accuracy on chunli23.png:
analyzing ../data/chunli23.png...
source pixels: 99x46 rendered: 99x46 18216B
control sequence: 3133 bytes (17.2%)
00004554 pixels analyzed Δr 319236 Δg 2799 Δb 2877
Colors: 17 vs 17
4554px Δr 319236 (70.1) Δg 2799 (0.615) Δb 2877 (0.632)
avg diff per pixel: 71.3
done with ../data/chunli23.png in 81.068ms.
analyzing ../data/chunli23.png...
source pixels: 99x46 rendered: 99x46 18216B
control sequence: 3116 bytes (17.11%)
00004554 pixels analyzed Δr 317209 Δg 1271 Δb 1626
Colors: 17 vs 17
4554px Δr 317209 (69.7) Δg 1271 (0.279) Δb 1626 (0.357)
avg diff per pixel: 70.3
done with ../data/chunli23.png in 81.758ms.
worldmap, as expected, looks worse (though it is slightly faster and smaller):
analyzing ../data/worldmap.png...
source pixels: 475x860 rendered: 475x860 1634000B
control sequence: 149102 bytes (9.125%)
00408500 pixels analyzed Δr 773700 Δg 747897 Δb 840216
Colors: 233 vs 54
408500px Δr 773700 (1.89) Δg 747897 (1.83) Δb 840216 (2.06)
avg diff per pixel: 5.78
done with ../data/worldmap.png in 500.359ms.
analyzing ../data/worldmap.png...
source pixels: 475x860 rendered: 475x860 1634000B
control sequence: 129291 bytes (7.913%)
00408500 pixels analyzed Δr 1352655 Δg 1352043 Δb 1154968
Colors: 233 vs 47
408500px Δr 1352655 (3.31) Δg 1352043 (3.31) Δb 1154968 (2.83)
avg diff per pixel: 9.45
done with ../data/worldmap.png in 439.945ms.
time to run ncplayer ../data/notcurses.png -q -t0
[schwarzgerat](0) $ time ./ncplayer ../data/notcurses.png -q -t0
real 0m0.998s
user 0m0.481s
sys 0m0.021s
[schwarzgerat](0) $ time ./ncplayer ../data/notcurses.png -q -t0
real 0m0.997s
user 0m0.497s
sys 0m0.008s
[schwarzgerat](0) $ time ./ncplayer ../data/notcurses.png -q -t0
real 0m1.017s
user 0m0.494s
sys 0m0.012s
[schwarzgerat](0) $
[schwarzgerat](0) $ time ./ncplayer ../data/notcurses.png -q -t0
real 0m0.834s
user 0m0.308s
sys 0m0.016s
[schwarzgerat](0) $ time ./ncplayer ../data/notcurses.png -q -t0
real 0m0.801s
user 0m0.311s
sys 0m0.013s
[schwarzgerat](0) $ time ./ncplayer ../data/notcurses.png -q -t0
real 0m0.805s
user 0m0.311s
sys 0m0.012s
[schwarzgerat](0) $
both current and new seem to generate a busted ncvisual on intake from sixel in notcurses.png using NCSCALE_STRETCH, generating:
analyzing ../data/notcurses.png...
Image too large, scaling to display
source pixels: 320x1280 rendered: 1380x880 4857600B
control sequence: 805394 bytes (16.58%)
00281600 pixels analyzed Δr 28101341 Δg 15579849 Δb 1586032
error getting pixel at 320/0 (1380/880)
error getting pixel at 320/0 (1380/880)
error comparing sixels
ncplayer /media/chungus/images/allrgb/* -t0 -q:
real 0m29.891s
user 0m29.277s
sys 0m0.605s
real 0m37.361s
user 0m36.704s
sys 0m0.648s
so current beats new on these pathological images.
time to run quantanal on everything in ../data:
real 0m9.174s
user 0m8.779s
sys 0m0.392s
real 0m9.242s
user 0m8.815s
sys 0m0.423s
real 0m9.618s
user 0m9.134s
sys 0m0.477s
real 0m9.527s
user 0m9.171s
sys 0m0.353s
ok, i'm thinking i'm gonna merge this, and work to improve the two cases that have regressed (smooth gradients result in more banding due to eliminating relaxation; large numbers of leaves lead to slowness and quality droppage). the causes of both problems are understood, and this is a much better foundation to work from going forward.
the first problem will be easier to solve -- we just need to add fast relaxation.
on the latter, we need better merging, but that's not going to give us any perf back, and we already have a perf deficit. we'll have to science up that bitch.
i still don't understand why we're seeing such a high average diff on chunli23.png. we ought be able to match those 17 colors exactly, right? how many colors are truly in the image? identify -format %k says 17. in that case, if we have 17 colors (check this), they ought be pretty much exact, no?
0xff001030 -> 784
0xff001040 -> 1040
0xff082050 -> 1312
0xff183070 -> 1840
0xff184870 -> 1864
0xff284890 -> 2377
0xff4868b0 -> 2922
0xff7088d8 -> 3467
0xff883800 -> 60
0xff905000 -> 84
0xffa0c0f0 -> 4037
0xffb0e0f0 -> 4069
0xffb86830 -> 877
0xffd09060 -> 1686
0xffe8b898 -> 2495
0xfff0f0f0 -> 4087
with quantanal improved in #2502, we're now seeing divergence on a more complex, but still simple, image: we're for some reason using 10 more colors, and missing by a wide 71.5 per. adding NOINTERPOLATION makes things worse. meanwhile, on another image, we drastically reduce the number of colors: