Closed vladimirg closed 8 years ago
Pushed the local development branch Feature_Add-QuotaSystem. The feature seems to work, but I still need to investigate a bug where the cleaning stage is sometimes skipped when running WG with a parent. It doesn't happen every time; so far it has only occurred when running multiple datasets in parallel, and even then not always.
Summary of changes:
These are the files that are now deleted from datasets in order to save space:
There is no need to delete anything in hapmaps, since those files don't take up much space.
One issue that popped up: when a dataset finishes, the usage stats are not updated. We could use JavaScript to update them. While we're at it, we could also make deleting a dataset not refresh the page, so it doesn't interrupt any in-progress uploads while still updating the usage stats.
Since the quota system (the first task) is live and working in production, this issue was closed and the two remaining tasks were split into separate issues: #54 and #55.
Since we've already hit the storage limit on lovelace, we should implement a per-user quota system so that we can continue to grow.
Currently, a lot of intermediate files are kept - FASTQs, BAMs, pileups, etc. They are large, and they may not be needed. The files that are needed, besides the output images and the dataset configuration (useful for debugging), are the final SNP/CNV results, which can later be used to specify the dataset as a parent or use it to construct a hapmap. However, we do want to keep the original input files (whatever they are) in case of an error. Since the analysis is deterministic, if an error occurs in production, it should be reproducible in other environments, so the original input is all that's needed.
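The keep/delete policy described above could be sketched as a small per-dataset cleanup step. Note that the extension list and directory layout here are illustrative assumptions, not the project's actual structure:

```python
from pathlib import Path

# Intermediate outputs that are safe to delete once analysis succeeds.
# Original inputs, SNP/CNV results, output images, and the dataset
# configuration are kept; these extensions are illustrative assumptions.
INTERMEDIATE_EXTENSIONS = {".fastq", ".bam", ".pileup", ".sam"}

def clean_dataset(dataset_dir: Path) -> int:
    """Delete intermediate files under dataset_dir and return bytes freed."""
    freed = 0
    for path in dataset_dir.rglob("*"):
        if path.is_file() and path.suffix.lower() in INTERMEDIATE_EXTENSIONS:
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Since the analysis is deterministic and the original inputs survive the cleanup, any production error can still be reproduced elsewhere after a dataset has been cleaned.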
Estimating that a clean dataset will weigh no more than a few hundred MB, a 25 GB per-user quota can support at least 50 analyzed datasets. We could also add an option to download a dataset's results if a user needs room for more projects (or simply increase the quota for that particular user).
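As a sanity check on that arithmetic, and a minimal sketch of what the quota check might look like (the 25 GB figure and the 500 MB upper estimate come from the paragraph above; the per-user directory layout is a hypothetical assumption):

```python
from pathlib import Path

QUOTA_BYTES = 25 * 1024**3            # 25 GB per user, per the plan
CLEAN_DATASET_BYTES = 500 * 1024**2   # upper estimate: a few hundred MB
# 25 GB / 500 MB = 51.2, so the quota holds at least 50 cleaned datasets.

def usage_bytes(user_dir: Path) -> int:
    """Total size of all files under a user's directory."""
    return sum(p.stat().st_size for p in user_dir.rglob("*") if p.is_file())

def can_accept_upload(user_dir: Path, incoming_bytes: int) -> bool:
    """Reject an upload that would push the user over quota."""
    return usage_bytes(user_dir) + incoming_bytes <= QUOTA_BYTES
```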
Tasks:
After this task is complete, we should do a one-time sweep of lovelace and remove old dataset input and intermediate files.
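The one-time sweep could start as a dry run that only reports what would be removed, so the deletion list can be reviewed before anything on lovelace is actually touched. The root path and extension set are illustrative assumptions:

```python
from pathlib import Path

# Illustrative set of intermediate-file extensions to sweep for.
SWEEP_EXTENSIONS = {".fastq", ".bam", ".pileup", ".sam"}

def sweep_report(datasets_root: Path):
    """Dry run: list intermediate files and the total space they occupy."""
    candidates = [p for p in datasets_root.rglob("*")
                  if p.is_file() and p.suffix.lower() in SWEEP_EXTENSIONS]
    total_bytes = sum(p.stat().st_size for p in candidates)
    return candidates, total_bytes
```

Once the report has been reviewed, the same candidate list can be fed to the actual deletion pass.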