catalyst / moodle-tool_objectfs

Object file storage system for Moodle
https://moodle.org/plugins/tool_objectfs

Performance: Uploading a file to Moodle shouldn't hit object storage during the upload process #524

Closed danmarsden closed 1 year ago

danmarsden commented 1 year ago

Basically just summarising the issue from our internal WR#393527 to allow for a public reason for the change I'm proposing...

Turning object storage on has a significant impact on file upload performance for us when dealing with SCORM packages. We don't use AWS; we have an OpenStack instance, which doesn't quickly return a 404 response when asked about a file that doesn't exist in the external storage.

When you upload a new SCORM package, it extracts the files and then calls add_file_from_string for each file in the zip. The object storage class overrides this function so it can ask the external storage "do you have the file yet?" and, if not, adds a record to the tool_objectfs_objects table to make sure it's dealt with. https://github.com/catalyst/moodle-tool_objectfs/blob/MOODLE_310_STABLE/classes/local/store/object_file_system.php#L882

The problem is that the "do you have the file yet?" check can be really slow on our end when the file doesn't exist in the external storage (which is always the case for a new SCORM package). What would usually take 1min to upload (when just touching local disk) blows out to something crazy like 2min 30secs, just because of all the "do you have the file yet?" calls against the external storage. Some SCORM packages have thousands of small files, and in many cases those small files will be under the sizethreshold set by the objectfs plugin, so they will never end up in external storage anyway.

The reality is that this check isn't needed at all, because the objectfs plugin has a cron task that looks for new files that don't exist in the external storage and uploads them.
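To make the proposal concrete, here is a minimal Python model of the two upload paths. It is not the plugin's PHP code: `FakeExternalStore`, `add_file_checked`, `add_file_deferred`, `cron_push` and `simulate` are hypothetical names, and the "strictly larger than sizethreshold" rule in the cron step is an assumption. The sketch only counts round trips to the external storage during the upload.

```python
from dataclasses import dataclass, field


@dataclass
class FakeExternalStore:
    """Hypothetical stand-in for the remote object store; counts round trips."""
    objects: dict = field(default_factory=dict)
    requests: int = 0

    def exists(self, contenthash: str) -> bool:
        self.requests += 1  # each existence check is one (possibly slow) round trip
        return contenthash in self.objects

    def put(self, contenthash: str, data: bytes) -> None:
        self.requests += 1
        self.objects[contenthash] = data


def add_file_checked(local, pending, store, contenthash, data):
    """Upload path as described in this issue: ask external storage for every file."""
    local[contenthash] = data
    if not store.exists(contenthash):
        pending.add(contenthash)


def add_file_deferred(local, pending, store, contenthash, data):
    """Proposed path: touch local disk only and leave the rest to cron."""
    local[contenthash] = data
    pending.add(contenthash)


def cron_push(local, pending, store, sizethreshold):
    """Background task: push pending files above the size threshold (assumed rule)."""
    for contenthash in sorted(pending):
        data = local[contenthash]
        if len(data) > sizethreshold and not store.exists(contenthash):
            store.put(contenthash, data)
    pending.clear()


def simulate(add_file, files, sizethreshold=10):
    """Return (external requests made during upload, objects stored after cron)."""
    local, pending, store = {}, set(), FakeExternalStore()
    for contenthash, data in files.items():
        add_file(local, pending, store, contenthash, data)
    during_upload = store.requests
    cron_push(local, pending, store, sizethreshold)
    return during_upload, sorted(store.objects)
```

In this model both paths end with the same objects in external storage once the cron step has run; the difference is that the checked path makes one external request per file while the user is waiting on the upload, and the deferred path makes none.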

Some local testing with our infrastructure showed the following results when uploading a SCORM package with lots of files:

Without object storage and local disk: 0:09 minutes

With object storage and local disk: 6:24 minutes
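As a rough back-of-the-envelope check on those timings, the sketch below converts the two figures to seconds and spreads the difference across the files in the package. The file count is not given in this issue, so the 3000 used in the usage note is purely a hypothetical value; `parse_mmss` and `per_file_overhead` are illustrative helper names.

```python
def parse_mmss(text: str) -> int:
    """Convert a 'minutes:seconds' string such as '6:24' to whole seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def per_file_overhead(without: str, with_objectfs: str, num_files: int) -> float:
    """Average extra seconds per file attributable to the external storage checks."""
    return (parse_mmss(with_objectfs) - parse_mmss(without)) / num_files


# Timings reported above; the difference is the overhead added by object storage.
extra_seconds = parse_mmss("6:24") - parse_mmss("0:09")
```

Here `extra_seconds` is 375. If the package contained, say, 3000 files (an assumed figure), `per_file_overhead("0:09", "6:24", 3000)` works out to 0.125 seconds per file, which is in the range you'd expect from one slow existence check per file.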

We could add a setting to turn this on/off, but I'm curious to hear from anyone testing with a large SCORM package on AWS as I expect that the change I'm proposing might improve performance there too.

(PR incoming)

brendanheywood commented 1 year ago

Merged this in. Just noting that we had similar issues with h5p packages in the past (we worked around this in the hvp plugin), and I suspect there might also be related perf gains with backup and restore too, so an all-round win.

brendanheywood commented 1 year ago

Just discovering this is causing a regression, but it needs a lot of conditions to be met, see #554