Open selfhoster1312 opened 5 months ago
md5 is slow...
Honestly I very much doubt that md5 hashing is the issue ... just look at the call to yunohost settings get
etc from the bash hooks ...
Added benchmarks from an actual Raspberry pi running the two versions (Python/Rust).
I think it's a bit strange that yunohost hook list regen_conf
take 6s on a rpi, i read the code and there is just one os.listdir
it should not take this time...
How did you made your comparison and measure ? Did you run yunohost hook list regen_conf
directly ? On my side i consider the moulinette framework could take more time than this part of the code... If you compare with a simple rust code that run just this part, it's not a good comparison.
Could you publish here your poc in rust ?
I am like @alexAubin, there are no md5 sum in yunohost hook list regen_conf
so i guess it's not the issue at least for this command...
And finally, i consider this ticket as an enhancement, not really a bug, but i guess changing from md5 to another way to detect file change could easily triggers some bugs (due to migrations and legacy stuff).
I think it's a bit strange that yunohost hook list regen_conf take 6s on a rpi, i read the code and there is just one os.listdir it should not take this time...
It's not just a listdir, it's more like 2, with a lot of dynamic code evaluation at runtime. A simple listdir would probably be a few orders of magnitude faster, but i haven't benchmarked that.
If you compare with a simple rust code that run just this part, it's not a good comparison.
That's what i did and it's not a fair comparison, but it is a good comparison. It's not fair because the code does not exactly behave in the same way. It is good because the code does exactly the same thing and moulinette's slowness is only due to using Python dynamic typing shenanigans, so it's a good illustration of just how slow most python code is.
I don't have the raspberry pi with me right now but i could make a benchmark of just that hook_list function if you're curious.
Could you publish here your poc in rust ?
You may find it right here https://github.com/selfhoster1312/yunohost-rs
there are no md5 sum in yunohost hook list regen_conf so i guess it's not the issue at least for this command...
I don't have that data at hand but i did run a benchmark, and you are correct. Turns out md5 is plenty fast on a raspberry pi. Only for big files could i measure a significant difference (more than a few percents) by using only mtime attribute.
I placed a "bug" label because it's unbearably slow even on a really fast machine.
There's a ton we can do to make it faster... i've been digging into the source code and so far i've found:
for category, conf_files in _get_pending_conf(names).items(): conf_hashes = _get_conf_hashes(category)
I have yet to properly benchmark this on a raspberry pi, but on x86, a naive (not optimized) reimplementation in Rust gets:
I haven't reimplemented the rest of regenconf (yet), but there's not a lot of big features missing (syntax coloring, custom log sharing..) which i don't think is as important as running fast (and could be reimplemented if really needed). Two orders of magnitude of slowness (x100) is quite a big deal! :(