flux-framework / flux-accounting

bank/accounting interface for the Flux resource manager
GNU Lesser General Public License v3.0

flux account commands hang while fairshare is being updated #451

Open ryanday36 opened 4 months ago

ryanday36 commented 4 months ago

I've noticed that, on corona, flux account commands hang while the fairshare tasks in the accounting crontab are running. For example, flux account view-user ... usually returns in less than a second, but takes about 2.5 minutes when run while the accounting crontab is executing. That's not terrible, so this probably isn't the highest priority, but it would be good if those updates could be done in a way that doesn't block queries.

I also don't know how much scaling testing has been done on update-usage and the related scripts, but I do worry that this could become a larger issue as we move Flux to larger systems with potentially many more jobs.

cmoussa1 commented 4 months ago

Thanks for pointing this out. This is definitely something to look into further. SQLite is pretty lightweight and allows only one writer at a time, so when there are a large number of concurrent reads and writes, especially during a fair-share update (which is a lot of writes), other commands can end up waiting on the database lock.

I wonder if there is potential to optimize the command that updates the job usage values for all of the associations in a flux-accounting DB. Right now, the command iterates through every row in the association_table, and whenever a row needs an update, it acquires a lock on the database to make that update. Repeat this process for a large number of users, and that's a lot of locks... I'll see if I can wrap all of these updates into a single transaction that gets written to the database at once, instead of once per row.
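A minimal sketch of the difference between the two approaches, using Python's built-in sqlite3 module. This is not the actual flux-accounting code: the column names (username, bank, job_usage), the new_usage dict of precomputed values, and both function names are assumptions made for illustration. The first function commits (and so cycles the write lock) once per changed row; the second collects the changes and applies them in one transaction.

```python
import sqlite3


def update_usage_per_row(conn, new_usage):
    """Hypothetical per-row pattern: one commit (one write-lock cycle)
    for every association whose usage changed."""
    rows = conn.execute(
        "SELECT username, bank, job_usage FROM association_table"
    ).fetchall()
    for username, bank, old_usage in rows:
        usage = new_usage.get((username, bank), old_usage)
        if usage != old_usage:
            conn.execute(
                "UPDATE association_table SET job_usage=? "
                "WHERE username=? AND bank=?",
                (usage, username, bank),
            )
            conn.commit()  # lock acquired and released for this row alone


def update_usage_batched(conn, new_usage):
    """Hypothetical batched pattern: gather every change first, then write
    them all in a single transaction so the write lock is taken once."""
    rows = conn.execute(
        "SELECT username, bank, job_usage FROM association_table"
    ).fetchall()
    changes = [
        (new_usage[(u, b)], u, b)
        for u, b, old in rows
        if (u, b) in new_usage and new_usage[(u, b)] != old
    ]
    # `with conn:` commits once on success, or rolls back every UPDATE
    # if any of them raises.
    with conn:
        conn.executemany(
            "UPDATE association_table SET job_usage=? "
            "WHERE username=? AND bank=?",
            changes,
        )
    return len(changes)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE association_table (username TEXT, bank TEXT, job_usage REAL)"
    )
    conn.executemany(
        "INSERT INTO association_table VALUES (?, ?, ?)",
        [("alice", "A", 1.0), ("bob", "A", 2.0), ("carol", "B", 3.0)],
    )
    conn.commit()
    # Only alice's value differs, so one row is updated in one commit.
    print(update_usage_batched(conn, {("alice", "A"): 5.0, ("bob", "A"): 2.0}))  # prints 1
```

Batching reduces the number of lock acquisitions from one per changed row to one per run, but readers may still have to wait while that single commit is written, so whether this fully removes the hang seen on corona would need testing.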