catalyst / moodle-local_datacleaner

Reduce, filter, and anonymize moodle data for non-prod environments
https://moodle.org/plugins/local_datacleaner

Add setting to remove only big files to cleaner_sitedata #61

Open abias opened 6 years ago

abias commented 6 years ago

Hi,

This feature request is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here on GitHub.

We would like to use local_datacleaner to remove only big files (i.e. files larger than a configurable number of MB) in one of our washing box instances.

On /admin/settings.php?section=cleaner_sitedata, we can already select certain file types to be replaced. Moodle also records the size of each stored file in mdl_files. So both the data and the mechanism needed for this already exist, and it could hopefully be implemented quite quickly.
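A minimal sketch of the selection step, not the plugin's actual code: it assumes the `filesize` (bytes), `contenthash` and `filename` columns of mdl_files, and uses an in-memory SQLite table as a stand-in for the real database. The helper names `find_big_files` and `demo_connection` and the sample rows are made up for illustration.

```python
import sqlite3

MB = 1024 * 1024


def find_big_files(conn, min_size_mb):
    """Return distinct (contenthash, filesize) pairs above the size threshold."""
    threshold = min_size_mb * MB
    cur = conn.execute(
        "SELECT DISTINCT contenthash, filesize FROM mdl_files "
        "WHERE filesize > ? AND filename <> '.' "  # '.' rows are directory entries
        "ORDER BY contenthash",
        (threshold,),
    )
    return cur.fetchall()


def demo_connection():
    """Build a tiny stand-in for mdl_files (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mdl_files (contenthash TEXT, filename TEXT, filesize INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mdl_files VALUES (?, ?, ?)",
        [
            ("aaa", "lecture.mp4", 300 * MB),
            ("aaa", "lecture-copy.mp4", 300 * MB),  # same content, second reference
            ("bbb", "notes.pdf", 2 * MB),
            ("ccc", ".", 0),
        ],
    )
    return conn


if __name__ == "__main__":
    for contenthash, size in find_big_files(demo_connection(), min_size_mb=100):
        print(contenthash, size // MB, "MB")
```

Selecting by distinct contenthash matters because several rows in mdl_files can point at the same stored content, so the cleaner would only need to replace each big file once.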

Thanks, Alex

brendanheywood commented 6 years ago

Yup, this seems like a good thing. Note that we completely work around this issue by using the objectfs plugin, so that the test envs point to the production filesystem with read-only credentials, and when they write files they only write deltas to the local file system. Objectfs can already be configured to move only large files off to object storage, to push everything, or somewhere in between.

This effectively makes the cloning and washing of the filesystem an instant no-op - we don't need to provision any disc on the washing machine etc.
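To illustrate the general idea of read-only production data plus local write deltas, here is a toy sketch. It is not how objectfs is implemented: the `OverlayStore` class and its methods are hypothetical, and plain dictionaries stand in for the production and local file stores.

```python
from types import MappingProxyType


class OverlayStore:
    """Read-only base store with a local, writable delta layered on top."""

    def __init__(self, base):
        # Read-only view of "production"; nothing is copied at clone time.
        self._base = MappingProxyType(base)
        self._delta = {}

    def read(self, key):
        # Local writes win; otherwise fall through to the production data.
        if key in self._delta:
            return self._delta[key]
        return self._base[key]

    def write(self, key, data):
        # Writes never touch the base, only the local delta.
        self._delta[key] = data

    def delta_size(self):
        """Bytes actually stored locally on the test environment."""
        return sum(len(value) for value in self._delta.values())


if __name__ == "__main__":
    prod = {"hash-a": b"production content"}
    store = OverlayStore(prod)
    store.write("hash-b", b"new")
    print(store.read("hash-a"), store.read("hash-b"), store.delta_size())
```

The point of the design is that a fresh test environment starts with an empty delta, so "cloning" costs nothing and only newly written files consume local disc.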

abias commented 6 years ago

> Note that we completely workaround this issue by using the objectfs plugin

Thanks for this hint. Looking at https://moodle.org/plugins/tool_objectfs, I understand that moodledata should be located at Amazon S3 or Azure Blob Storage, right? Can tool_objectfs also be used with moodledata located on plain old NFS?

As an alternative to this feature request, we are currently thinking about leveraging ZFS snapshots on NFS for cloning moodledata to the washing box instances. But this is something which has to be done in our local infrastructure and would not work with one small setting in the local_datacleaner plugin.

brendanheywood commented 6 years ago

Yes, but if it's just NFS then it looks like normal disc to Moodle, which doesn't know the difference, so you don't need objectfs in order to use NFS. It's important to make a clear distinction between 'moodledata' and 'filedir', as they can be treated very differently. moodledata is mutable, but filedir is immutable: everything in it is stored under a hash of its contents. That makes filedir perfect for moving to object storage, while the remainder of moodledata needs to be on disc shared between the frontends, like NFS / Gluster / etc.
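As a rough illustration of why filedir is immutable, the sketch below derives a storage path from a file's contents. It assumes Moodle's usual layout of a SHA-1 content hash stored under two levels of two-character subdirectories; the function names `content_hash` and `filedir_path` are made up for this example.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash(data: bytes) -> str:
    """SHA-1 hex digest of the file contents (assumed to match contenthash)."""
    return hashlib.sha1(data).hexdigest()


def filedir_path(contenthash: str) -> PurePosixPath:
    """Relative path under moodledata, e.g. filedir/da/39/da39a3ee..."""
    return PurePosixPath("filedir") / contenthash[0:2] / contenthash[2:4] / contenthash


if __name__ == "__main__":
    first = filedir_path(content_hash(b"same bytes"))
    second = filedir_path(content_hash(b"same bytes"))
    edited = filedir_path(content_hash(b"same bytes, edited"))
    print(first == second)  # identical content maps to one stored object
    print(first == edited)  # changed content gets a new path instead of overwriting
```

Because the path depends only on the contents, an existing object is never modified in place, which is what makes it safe to serve from read-only or object storage.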

I'm not really across the deep dark filesystem stuff at the OS level like ZFS, but I know various people have tried having a read-only snapshot of the prod filesystem with local write deltas, which in theory gives you fast, safe clones for washing etc. - but I've never seen it working flawlessly. If you can get something like this working at the fs level then the datacleaner doesn't need to be aware of it at all.