fnuecke / oc2

RISC-V VMs in Minecraft.
https://www.curseforge.com/minecraft/mc-mods/oc2
Other
613 stars 73 forks source link

Optimize devices lua and python lib #215

Closed ToMe25 closed 1 year ago

ToMe25 commented 1 year ago

This Pull Request significantly improves the message reading speed of the devices libs. All benchmarks were done with the default import.lua script, and my python recreation of it, which can be found here.

This does increase the error rate of the Lua version somewhat, however after a lot of debuging this did not seem to be caused by parts of the code changed in this PR, but by Lua and cjson. Details about this below, and later in a dedicated issue.

Changes

Speed improvements

For these benchmarks I used an import script, which was not modified between tests. To allow benchmarking of file importing I changed FileImportExportCardItemDevice.java to always read the same file instead of asking the user. I generated test files of varying sizes for testing using dd if=/dev/urandom of=import.txt bs=100K count=1. Measurements are taken using the time command. Since I ran every test more than one, I wont try the ones expected to take significantly over 2 minutes.

Benchmark Results: File Size Lua before Lua after Python before Python after
5kB 8s 100-200ms 2s 100ms 1m 1s 12s 850-900ms
10kB 14s 250-350ms 2s 600ms 1m 56.5s 18s 500-600ms
20kB 27s 3s 650ms - 29s 400-550ms
50kB 1m 4.5-5s 6s 700ms - 1m 2.5s
100kB 2m 6.7s 11s 800ms - 1m 58s

Issues

Rarely cjson.decode seems to replace random numbers in the data array with zeros. Causing random bytes in the imported file to be replaced by NUL bytes. This is not an issue caused by this PR, this already happens with the current version. I will open a separate issue about this. However somehow this PR seems to make that more common. I have no idea why/how this PR manages to make this more common, but it seems to do so.

That said, I have only tested it with the main version 10 times with a relatively small 20kB file, due to how slow it is. So it might have just been luck that it didn't happen as often then. I was unable to find any patter what-so-ever for the replaced numbers. And importing the same file more than once causes different numbers(bytes of the input file) to be replaced each time.

Also somehow sometimes, significantly more rarely than the above issue, random bytes in the message string get replaces by NUL bytes with this PR. This seems to not happen without this PR, however I have no idea why this happens. They aren't at block* borders either, so my code should not even touch those bytes at all. * By block I mean the 1024 byte blocks that are read from the device.

This rarely fuses two numbers together by replacing the comma, causing the writing to crash. This does not even happen every 20th time when importing a 100kB file tho.

I am certain both of these issues are caused by the Lua part of the importing system, because they do not occur with my python copy of the import script.

Motivation

I originally started working on this PR with the intention of making a python alternative for import.lua, because I expected that to be faster. To be exact I assumed the python impl to be slower at startup, and faster at runtime. I expected this due to Lua being fully interpreted afaik, and micropython compiling the script to bytecode at startup.

I was just really annoyed with how slow file imports were, so I expected that to be worth it. I also shortly considered making native import program, however neither including a pre compiled binary, nor automatically cross-compiling it seemed to fit this project, so I decided not to do that.

After seeing how much changing message reading from byte-by-byte to string based improved the performance of the python script I couldn't help but do the same for the Lua version.

After that the Lua version reached acceptable speeds, while my python version was still unacceptably slow, which is why I didn't end up including it.

The only reason I am a bit conflicted is because my python version is perfectly reliable(as far as I can tell), while I was unable to get the Lua version to that point.

Options

  1. If the better reliability of the python version is considered worth the huge performace drawback, please let me know and I will add it to this PR.
  2. If the slight reliability decrease of the Lua version is considered unacceptable, please let me know, in that case I will remove those changes.
fnuecke commented 1 year ago

LGTM, thanks! I'm worried that the issues with random broken bytes is a bug in sedna -- there's also an as yet unresolved issue with TCC randomly segfaulting in sedna (while it doesn't in Qemu using the same rootfs and kernel). I've not yet been successful in figuring out why that happens. I'll give this a test-spin soon.

fnuecke commented 1 year ago

It looks like the readOne methods are no longer used in both, am I missing something, or could they be removed?

ToMe25 commented 1 year ago

They could be removed, I just thought not having breaking API changes would be better.

I'm not at home for the next week, so if you want them removed soon, please feel free to remove them yourself :) Otherwise just let me know if they should be removed and I will remove them.

fnuecke commented 1 year ago

Since they're "private" I wouldn't really consider them part of the API, I'll remove them then :)

Thanks!