lancekindle closed 6 years ago
First of all, sorry for not answering before. I saw the pull request, but I was really busy that week. By the time I was about to answer I had forgotten what I was going to say, and then I simply forgot to answer at all.
This optimization has a small problem: it relies on the C flag being set correctly after an "add hl, hl". It makes me feel a bit uneasy because there are some crappy emulators out there that have trouble with 16-bit additions. I'm not worried about PC emulators, but about emulators for other platforms that may be the only way to play this game on them...
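For anyone reading later, here is a rough Python model of the behavior the optimization depends on. It is only a sketch: the function name `add_hl_hl` is made up for illustration, and it models just the result and the C flag (carry out of bit 15), not the other flags.

```python
def add_hl_hl(hl):
    """Model 'add hl,hl' on a 16-bit register pair.

    Returns (new_hl, carry), where carry is the C flag: set when the
    addition overflows past bit 15. Hypothetical helper, not emulator code.
    """
    total = hl + hl
    return total & 0xFFFF, total > 0xFFFF


# Top bit set: the doubled value overflows, so C must be set.
print(add_hl_hl(0x8000))  # (0, True)
# Top bit clear: no overflow, so C must be clear.
print(add_hl_hl(0x4000))  # (32768, False)
```

An emulator that gets the second element wrong for 16-bit additions would make the routine take the wrong branch, which is the concern raised above.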
That's totally cool, man. I closed my pull request because I had been wondering if it was worthwhile implementing. I imagine it would take some effort (such as writing tests) to verify it worked--and not just "oh hey, the game doesn't crash". I hadn't considered emulator errors; that in itself is reason enough.
P.S. your code is excellent! I am enjoying reading through--and seeing a lot of cool tricks in--your code. Thanks for making it available :)
Oh, ok. Thanks for taking the time to check the code and do it, though!
For future reference, this has been merged as explained in https://github.com/AntonioND/ucity/pull/4
I'm not sure if it's worth it (readability-wise), but for a minor speed boost you can set H=A and L=0 at the beginning of mul_u8u8u16, and use a single 'add hl,hl' to do the work of both 'rla' and 'add hl,hl'. This shaves off a few cycles in the overall execution of the mul_u8u8u16 routine. It took me a while to wrap my head around this, but it helps to realize that 'add hl,bc' never overwrites any bits from A remaining in register H: after k shifts the partial product needs at most the low k+8 bits of HL, while the unread bits of A sit in the bits above that.
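To make the idea easier to check, here is a minimal Python sketch of the proposed loop. It assumes the routine multiplies A by C with B=0 and returns the product in HL; that register assignment, and the helper `add16`, are my assumptions for illustration, not a copy of the actual routine.

```python
def add16(x, y):
    """Model a 16-bit add: return (result, carry out of bit 15)."""
    total = x + y
    return total & 0xFFFF, total > 0xFFFF


def mul_u8u8u16(a, c):
    """Shift-and-add multiply with A preloaded into H (sketch only)."""
    hl = (a & 0xFF) << 8  # H = A, L = 0
    bc = c & 0xFF         # assumed: B = 0, C = multiplicand
    for _ in range(8):
        # 'add hl,hl' doubles the partial product in the low bits and,
        # at the same time, shifts the next bit of A out into carry.
        hl, carry = add16(hl, hl)
        if carry:
            # 'add hl,bc': the sum stays below the remaining bits of A,
            # so those bits are never disturbed.
            hl, _ = add16(hl, bc)
    return hl


print(hex(mul_u8u8u16(0xFF, 0xFF)))  # 0xfe01
```

Looping over every pair of 8-bit inputs and comparing against `a * c` is a cheap way to confirm the trick holds for all 65536 cases.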