YunoHost / issues

General issue tracker for the YunoHost project
72 stars 8 forks source link

regenconf is super slow #2392

Open selfhoster1312 opened 5 months ago

selfhoster1312 commented 5 months ago

I placed a "bug" label because it's unbearably slow even on a really fast machine.

There's a ton we can do to make it faster... i've been digging into the source code and so far i've found:

I have yet to properly benchmark this on a raspberry pi, but on x86, a naive (not optimized) reimplementation in Rust gets:

Command Python Rust Python (raspi) Rust (raspi)
regen-conf --list-pending 0.265s 0.003s 11.555s 0.026s
hook list conf_regen 0.150s 0.002s 5.975s 0.025s

I haven't reimplemented the rest of regenconf (yet), but there's not a lot of big features missing (syntax coloring, custom log sharing..) which i don't think is as important as running fast (and could be reimplemented if really needed). Two orders of magnitude of slowness (x100) is quite a big deal! :(

alexAubin commented 5 months ago

md5 is slow...

Honestly I very much doubt that md5 hashing is the issue ... just look at the call to yunohost settings get etc from the bash hooks ...

selfhoster1312 commented 5 months ago

Added benchmarks from an actual Raspberry pi running the two versions (Python/Rust).

zamentur commented 1 week ago

I think it's a bit strange that yunohost hook list regen_conf take 6s on a rpi, i read the code and there is just one os.listdir it should not take this time...

How did you made your comparison and measure ? Did you run yunohost hook list regen_conf directly ? On my side i consider the moulinette framework could take more time than this part of the code... If you compare with a simple rust code that run just this part, it's not a good comparison.

Could you publish here your poc in rust ?

I am like @alexAubin, there are no md5 sum in yunohost hook list regen_conf so i guess it's not the issue at least for this command...

And finally, i consider this ticket as an enhancement, not really a bug, but i guess changing from md5 to another way to detect file change could easily triggers some bugs (due to migrations and legacy stuff).

selfhoster1312 commented 6 days ago

I think it's a bit strange that yunohost hook list regen_conf take 6s on a rpi, i read the code and there is just one os.listdir it should not take this time...

It's not just a listdir, it's more like 2, with a lot of dynamic code evaluation at runtime. A simple listdir would probably be a few orders of magnitude faster, but i haven't benchmarked that.

If you compare with a simple rust code that run just this part, it's not a good comparison.

That's what i did and it's not a fair comparison, but it is a good comparison. It's not fair because the code does not exactly behave in the same way. It is good because the code does exactly the same thing and moulinette's slowness is only due to using Python dynamic typing shenanigans, so it's a good illustration of just how slow most python code is.

I don't have the raspberry pi with me right now but i could make a benchmark of just that hook_list function if you're curious.

Could you publish here your poc in rust ?

You may find it right here https://github.com/selfhoster1312/yunohost-rs

there are no md5 sum in yunohost hook list regen_conf so i guess it's not the issue at least for this command...

I don't have that data at hand but i did run a benchmark, and you are correct. Turns out md5 is plenty fast on a raspberry pi. Only for big files could i measure a significant difference (more than a few percents) by using only mtime attribute.