Open fengb opened 5 years ago
This should be deferred until after GBC support since that will have a radically different layout for pixel values.
Edit: this change has been applied so we can optimize now!
LLVM gives us this output: https://godbolt.org/z/YXLf4D. Translated to JavaScript:
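function foo(r0) {
  var r1 = 255;
  var r2 = r0 & 992;        // 0x03E0: bits 5-9 (middle 5-bit field)
  r1 = r1 & (r0 << 3);      // low 5-bit field moved to bits 3-7
  r0 = r0 & 31744;          // 0x7C00: bits 10-14 (high 5-bit field)
  r1 = r1 | (r2 << 6);      // middle field moved to bits 11-15
  r0 = r1 | (r0 << 9);      // high field moved to bits 19-23
  r0 = r0 | -16777216;      // 0xFF000000 as a signed 32-bit int: top byte all ones
  return r0;
}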
function simplified(raw) {
  var r0 = 31744 & raw;        // 0x7C00: bits 10-14
  var r1 = 255 & (raw << 3);   // bits 0-4 moved to bits 3-7
  var r2 = 992 & raw;          // 0x03E0: bits 5-9
  return -16777216 | (r0 << 9) | r1 | (r2 << 6);
}
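As a sanity check on the mask-then-shift trick, here is a rough sketch comparing `simplified` against a plain unpack-and-repack version over every 15-bit input. `naive` and `countMismatches` are hypothetical names made up for this comment, not code from the repo, and the channel naming assumes red sits in the low 5 bits with blue in bits 10-14:

```javascript
// Hypothetical reference version: unpack each 5-bit field, then repack it
// into the top 5 bits of its own output byte.
// Assumption: r = bits 0-4, g = bits 5-9, b = bits 10-14.
function naive(raw) {
  var r = raw & 0x1f;
  var g = (raw >> 5) & 0x1f;
  var b = (raw >> 10) & 0x1f;
  // -16777216 is 0xFF000000 as a signed 32-bit int (top byte all ones).
  return -16777216 | (b << 19) | (g << 11) | (r << 3);
}

// Copied from the issue: mask each field in place, then shift once.
function simplified(raw) {
  var r0 = 31744 & raw;
  var r1 = 255 & (raw << 3);
  var r2 = 992 & raw;
  return -16777216 | (r0 << 9) | r1 | (r2 << 6);
}

// Count inputs in [0, 0x8000) where the two versions disagree.
function countMismatches() {
  var mismatches = 0;
  for (var raw = 0; raw < 0x8000; raw++) {
    if (naive(raw) !== simplified(raw)) mismatches++;
  }
  return mismatches;
}
```

The saving comes from masking each field where it already sits and shifting it once, rather than shifting down, masking, and shifting back up for each channel.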
Applying the LLVM voodoo led to a 10-20% performance boost: https://github.com/fengb/fundude/commit/f29bca4e9c3a9cc705d9dbe5a65dc77c62a8e8a6#diff-c2b923b8185dd06e822dec6c545dae85R100
The pixel translation is actually a pretty strong bottleneck: 33% of execution time.
Thoughts: