AnubisLMS / Anubis

Distributed LMS for automating Computing Science Courses From NYU
https://about.anubis-lms.io
MIT License

OPT Speed Up Seeding #286

Closed: PIG208 closed this issue 2 years ago

PIG208 commented 2 years ago

Every time we run the tests for the API server, we repopulate the database using api/anubis/rpc/seed.py. This takes around 100 s on my machine (Intel i7-10750H), which is slow. Writing this much data into the database is I/O-intensive, and the benchmark below shows that most of the time is spent initializing submissions. This is probably tolerable when we only need a local deployment, but it will become quite annoying as we add more test cases. (Another motivation is to make tests that rely on a fresh database viable.)

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
4560    0.189    0.000   46.381    0.010 submissions.py:308(init_submission)
...
5       0.078    0.016   92.659   18.532 seed.py:202(init_submissions)
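A table like the one above can be produced with the standard-library `cProfile` and `pstats` modules. The sketch below is a minimal, hypothetical stand-in: `init_submission` and `init_submissions` here only sleep to simulate work and are not the real functions from seed.py or submissions.py. It profiles the loop and prints the top entries sorted by cumulative time, which is how a hotspot like `init_submissions` stands out.

```python
import cProfile
import io
import pstats
import time


def init_submission(i):
    # Hypothetical stand-in for the real per-submission work; it just sleeps.
    time.sleep(0.001)


def init_submissions(n):
    # Hypothetical stand-in for the seeding loop that calls init_submission n times.
    for i in range(n):
        init_submission(i)


def profile_seed(n=50):
    """Profile init_submissions(n) and return the top 10 rows as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    init_submissions(n)
    profiler.disable()

    out = io.StringIO()
    # Sort by cumulative time so callers that spend the most time overall come first.
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_seed(20))
```

Sorting by `cumulative` rather than `tottime` is what surfaces the outer loop: its own `tottime` is tiny, but its `cumtime` includes everything it calls.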

Unless we can make seeding optional (or partially optional) and reinitialize only the parts a given test needs, we need a way to speed up this script (namely the init_submissions function). We can
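One common way to speed up a write-heavy seeding loop is to stop committing after every row and instead insert everything in a single transaction. The sketch below illustrates the idea with the standard-library `sqlite3` module as a stand-in for the real database; the `submission` table, its columns, and the helper names are hypothetical and not taken from Anubis. With SQLAlchemy, the rough analogue would be adding all objects with `session.add_all(...)` and calling `session.commit()` once, assuming the seed code currently commits per object.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical table used only for this illustration.
INSERT_SQL = "INSERT INTO submission (owner, commit_sha) VALUES (?, ?)"


def make_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submission "
        "(id INTEGER PRIMARY KEY, owner TEXT, commit_sha TEXT)"
    )
    return conn


def seed_row_by_row(conn, rows):
    # One INSERT and one COMMIT per row: every commit is a separate durable write.
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def seed_batched(conn, rows):
    # All rows in one transaction; the connection context manager commits once on success.
    with conn:
        conn.executemany(INSERT_SQL, rows)


def compare(n=2000):
    """Time both strategies against file-backed databases and return the timings."""
    rows = [(f"user{i}", f"{i:040x}") for i in range(n)]
    timings = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, seed in (("row_by_row", seed_row_by_row), ("batched", seed_batched)):
            conn = make_db(os.path.join(tmp, f"{name}.db"))
            start = time.perf_counter()
            seed(conn, rows)
            timings[name] = time.perf_counter() - start
            conn.close()
    return timings


if __name__ == "__main__":
    print(compare())
```

The gap between the two timings depends heavily on the disk and the database engine, so it is worth measuring on the actual seed rather than assuming a specific speedup.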

wabscale commented 2 years ago

Interesting that it takes so long for you. It only takes about 5-7 seconds on my machine (though I have an overclocked desktop processor).

I think we should not completely abandon writing the seed in SQLAlchemy, as that makes it much easier to update the seed functions when the schema changes. What you describe, running the seed once and then generating a SQL dump, sounds very good to me. We can then commit the SQL dump to the repo so new contributors do not need to generate it themselves. We could even gzip the dumps to make them smaller. The seed endpoints can then just load the SQL dump.
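The "seed once, dump, gzip, replay" workflow can be sketched in a few lines. This version uses `sqlite3.Connection.iterdump()` and `executescript()` purely as a self-contained stand-in; with a MySQL/MariaDB backend the dump step would instead be done with that server's own tool (for example `mysqldump`), which is an assumption about the deployment, not something stated in this thread. The function names, file name `seed.sql.gz`, and the `submission` table are hypothetical.

```python
import gzip
import os
import sqlite3
import tempfile


def dump_to_gzip(conn, path):
    """Write the full schema and data of `conn` as SQL statements into a gzip file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for statement in conn.iterdump():
            f.write(statement + "\n")


def load_from_gzip(path):
    """Create a fresh in-memory database by replaying a gzipped SQL dump."""
    conn = sqlite3.connect(":memory:")
    with gzip.open(path, "rt", encoding="utf-8") as f:
        conn.executescript(f.read())
    return conn


def snapshot_roundtrip_demo():
    # Seed a tiny hypothetical table once (the slow step in the real project).
    seeded = sqlite3.connect(":memory:")
    with seeded:
        seeded.execute("CREATE TABLE submission (id INTEGER PRIMARY KEY, owner TEXT)")
        seeded.executemany(
            "INSERT INTO submission (owner) VALUES (?)",
            [("alice",), ("bob",), ("carol",)],
        )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "seed.sql.gz")
        dump_to_gzip(seeded, path)
        # Each test (or seed endpoint) would only need this fast replay step.
        fresh = load_from_gzip(path)
    seeded.close()
    count = fresh.execute("SELECT COUNT(*) FROM submission").fetchone()[0]
    fresh.close()
    return count


if __name__ == "__main__":
    print(snapshot_roundtrip_demo())
```

Keeping the SQLAlchemy seed functions as the source of truth and treating the dump as a generated artifact means the dump only needs regenerating when the schema or seed logic changes.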

Do you want to take this on, or should I?

PIG208 commented 2 years ago

I can work on this this weekend