dodona-edu / dodona

🧑‍💻 Learn to code for secondary and higher education
https://dodona.be
MIT License

Speedup seeding #4901

Open jorg-vr opened 1 year ago

jorg-vr commented 1 year ago

This is one of the main reasons for slow tests on GitHub Actions.

We can probably speed this up considerably using the trick described at https://railsnotes.xyz/blog/seed-your-database-with-the-faker-gem#fixing-our-slow-seeds-with-upsert_all-and-activerecord-import


bmesuere commented 1 year ago

Note that in addition to seeding, it could also be used in the application itself. For example, when creating an evaluation we do a lot of inserts, which could perhaps be done as a single bulk insert.

jorg-vr commented 1 year ago

Speeding things up with bulk inserts is a lot less simple than in the example given, which is just a bunch of inserts with Faker data.
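For reference, the general shape of such a simple bulk insert could look roughly like the sketch below. It is plain Ruby so it runs on its own; `student_rows`, `in_batches` and the column names are hypothetical and not taken from the Dodona schema. In a Rails app, the call made per batch would be something like `User.insert_all(batch)`, which skips validations and callbacks and does not return model instances.

```ruby
# Hypothetical helper: builds plain attribute hashes instead of model instances.
# The column names are assumptions for illustration only.
def student_rows(count, now: Time.now)
  (1..count).map do |i|
    {
      username: "student#{i}",
      email: "student#{i}@example.com",
      created_at: now,
      updated_at: now
    }
  end
end

# Hypothetical helper: yields the rows in slices so that each slice can be
# turned into one multi-row INSERT instead of one INSERT per record.
def in_batches(rows, batch_size: 1000)
  rows.each_slice(batch_size) { |batch| yield batch }
end

rows = student_rows(2500)
in_batches(rows) do |batch|
  # In a Rails app this is where the single statement per batch would go, e.g.
  #   User.insert_all(batch)
  puts "would insert #{batch.size} rows in one statement"
end
```

Because no callbacks or validations run in this pattern, anything the models normally do on create (derived columns, files written to disk, counters) has to be handled separately, which is where the complexity for our seeds comes from.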

I tried to profile the seeding script using stackprof to find the causes of the slowdown:

26% of the time is taken by gitable functions (e.g. repository cloning). This is mostly file-system related. We could ask ourselves whether we need a 'large activity repo' in the seed.

21% of the time is taken by creating activity statuses. A lot of that time is also spent in validations. This could potentially be rewritten as a single query, but it will be rather complex to get right.

Next come creating most courses (13%) and the visualisation test (11%). A significant part of this is creating series, series memberships, course memberships, etc. But as we loop over these to create submissions, much of the speedup from a collective insert_all is lost when we have to query them all again afterwards. Creating submissions might be a good candidate for a collective insert, but these are also rather complex objects (we would also have to take care of the code and result files written to the file system). Avoiding some of the callbacks here could still provide a speedup: some callbacks I tracked from submission creation add up to at least 6.5% of total runtime.
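The numbers above come from stackprof, which samples the call stack. A much coarser way to get a similar per-phase percentage breakdown, using only the standard library, is sketched below. This is not what stackprof does internally; `PhaseTimer` and the phase names are hypothetical, and the `sleep` calls stand in for real seeding steps.

```ruby
require 'benchmark'

# Hypothetical helper: accumulates wall-clock time per named seeding phase.
class PhaseTimer
  def initialize
    @seconds = Hash.new(0.0)
  end

  # Runs the block, adds its wall-clock duration to the given phase,
  # and returns the block's own return value.
  def measure(phase)
    result = nil
    @seconds[phase] += Benchmark.realtime { result = yield }
    result
  end

  # Share of the total measured time per phase, in percent, largest first.
  def percentages
    total = @seconds.values.sum
    return {} if total.zero?

    @seconds.sort_by { |_, s| -s }
            .to_h { |phase, s| [phase, (100.0 * s / total).round(1)] }
  end
end

timer = PhaseTimer.new
# Stand-ins for real seeding steps; in a seed script these blocks would
# wrap the actual creation code.
timer.measure(:repositories) { sleep 0.02 }
timer.measure(:activity_statuses) { sleep 0.01 }
timer.percentages.each { |phase, pct| puts format('%-20s %5.1f%%', phase, pct) }
```

This only shows where time goes per wrapped block; it cannot attribute time to individual callbacks or validations the way a sampling profiler can.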

I tried replacing student creation with one insert_all and one User.where(permissions: :student) call, and it caused a slowdown instead of a speedup.