Open a97marbr opened 4 years ago
The really interesting thing would be to know if we could move to have all files (user uploads and course material) in the database.
makes other requirement such as download all files complicated.
also complicates backups.
i would say under 64kb and never for student handins but for teaching material
I would say it simplifies backup ... since backup only needs to consider LenaSYS source and DB. And it would improve security, right?
As for making downloading files more complicated ... yes it might ... but with the added gain that we have better access control.
But as a first step I like the idea to see how small files would be handled. It might be that performance becomes an issue.
I have read the research article “To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem ” by Russell Sears, Catharine van Ingen and Jim Gray. In this article, they performed tests as ‘create, read, replace and delete’ on BLOB files and files on NTFS storage. They are using a SQL server 2005 beta 2 (32 bit) and the system is running on a Windows Server 2003 R2 Beta (32 bit).
There are many factors as fragmentation I do not have the best understanding over and some terms as ‘bulk load’ I don’t know if LenaSYS is using in the same way this study does.
But to summarize, the performance for BLOB is superior in throughput up to 256KB and it starts to shrink when it reaches 512KB, after this point using a file system to store data is more efficient overall.
So to answer the question: “What would be a practical upper limit on file sizes? ” and “How would this effect performance?”, 256KB is the preferred limit after that it may vary in performance.
Reference: https://www.microsoft.com/en-us/research/wp-content/uploads/2006/04/tr-2006-45.pdf @a97marbr @HGustavs
Interesting. Do you think any advances has been made with regards to BLOBs in the last 15 years?
The report describes MS SQL Server ... how well does it compare to MySQL and Postgres?
I do not think BLOB have done any extreme revolutions over the past 10 years. I think BLOBs have progressed at the same phase as everything else involving storage. From the papers I have read from 2008 there was only three sized of BLOBs tiny, medium and large for MySQL and as of now there is one new size called “BLOB” and it is 65,535 bytes so some visual changes have been made.
To answer the question how BLOBs compare to MySQL and Postgres I found an article from 2008 “A Comparative Benchmark of Large Objects in Relational Databases” by Sorin Stancu-Mara and Peter Baumann. In this article they confirm the first statement in the introduction by saying that BLOB of less than 256KB should be stored in a database.
The paper continues setting up four test environments System A, System B, MySQL, and PostgreSQL I put less focus on System A and System B and tried to focus on the comparison between MySQL and PostgreSQL. The tests are made by doing write and read operations on different BLOB sizes.
From the conclusion the best preforming environment is MySQL and this is with outlines as MySQL can show spikes in performance as noted in figure 13. Write access took around 700 ms then moved down to 0.27 ms. For bigger BLOB sizes PostgreSQl was the superior, but as I don’t recommend going over 256KB I don’t see this as a relevant factor.
Refrence: https://dl.acm.org/doi/pdf/10.1145/1451940.1451980
@a97marbr @HGustavs
Ok ... let's assume we select 256 as a cut off size. How would we migrate to a solution that uses this? Perhaps outline a migration tool that we can use to move current small files into the database.
A migrating tool sound good and all... But between reading articles I used my time trying to implement saving and loading a file(.txt/.jpg) as BLOBs. I successfully saved BLOBs on my local computer. But when I was trying to retrieve a blob and display its content, I was met with some problems that I could not solve. I think this is the hardest part because of how the system is structured and to migrate a solution that can read files from the filesystem and BLOB values in parallel is the challenge here.
As this is a “investigate issue” I think most of the questions where answered. If you think this should be added into LenaSYS as a feature it should be split into sub-issues that target an overall solution for handling BLOB data without disturbing the current workflow of the system.
@a97marbr
Yes ... that sounds good.
Outlining a migration tool starts by adding issues that needs to be addressed ... I would label them feature wish or similar ...
Someone should create a set of issues for implementing this now that it has been deemed a good idea.
Investigate if we can move small/medium files into our database
Could we store files under a certain file size (for example 10MB?) directly in the database as a BLOB and avoid the file system altogether?