Closed fabianmuehlboeck closed 9 years ago
Problem: How can we manage the size of data stored in our DB so that a user won't end up using the service as a cloud storage service?
We should somehow control the size used by each user. First we need to decide which structure should we use to store data 2 options: 1.1. BLOB field in DB 1.2. File (filename will be stored in DB)
Performance Read: For BLOBs smaller than 100KB, reads are faster when the BLOBs are stored directly in the database file. For BLOBs larger than 100KB, reads from a separate file are faster. (https://www.sqlite.org/intern-v-extern-blob.html) TODO: Decide the quota each user is going to have Update: Updates are slow in BLOB. We should keep in mind that updates will be often
Querying the size BLOB: Simple as SELECT LENGTH(code) FROM usercodes WHERE usercodes.userid = ... NOTE: sqlite doesn't support stored procedures. Some of the queries might need to be done in back-end. TODO: Decide if the codes will be shared among the users - and its effect on quota,- will they be allowed to change the code, or just read, or custom settings? File: Size can directly be queried from command line.
SECURITY BLOB: Open to SQL Injection? TO BE CONTINUED
I don't know if cloud storage is a major concern since we wouldn't be providing a convenient way to upload and download large amounts of data. If it becomes a problem, we can just go back and retrofit some way of determining approximately how much data a user is storing.
As for how to store, one thing to consider is tables with columns "user id", "table name", "column id", "row id", "data". The idea is an entry in this table indicates the "data" of the entry at "row id" and "column id" of "user id"'s "table name" table. You could have one of these tables for each kind of data type.
After talking about it briefly with Fabian on Monday, this is a basic design that I came up with. Looking at the questions listed above, I thought that the data model resembles GitHub mainly due to the long-term project sharing and version control ideas, so I used that as the general direction.
I decided to start off by separating users and projects so that each user points to the projects that they are contributors of. Projects contain information about visibility, contributors, etc, that will be needed for display and management. Within each project, there are the pages that represent the pages that will be displayed. In order for each page to be fully functional though, there needs to be access to the backing code and assets.
For the code, I had the impression that it would be best to store all of it together on a project basis and then have the individual pages select the sections that they wish to use. Likewise, I thought the assets should be stored under the project and then selected by the individual pages since it would be easier that way to compare against a quota.
I'm meeting with @CemIskir tomorrow so we can talk about this at length.
So it turns out that we probably don't want to do the asset-storing part at all. SQL-injection is an important point to raise, though, and we should guard ourselves and possibly our users against it. In terms of controlling the database size, here's an idea if we're using SQLite: just use a separate database file per project and limit its size (i.e. check once in a while?). As for the rest of the design, we need an overview of exact units of code that we can split up for people to implement.
@bozhou94, @cemIskir Users create web pages by creating programs that manipulate the DOM of web pages. We need to store these programs and associated projects. The back end needs library functions to manage those projects (think of this similar to how eclipse or visual studio handle things). We need a data model for how projects look like (is there a "file" per page? how about code that should be shared between different pages?).
Presumably users need to update some media content (i.e. at least some images) to design their pages. Are they stored with each project or do users have a global store for them?
Finally, potential long-term features are shared projects for cooperative programming and some form of version control.
Deliverable: some form of data diagram describing the entities and their relations, description of back end API for managing the data.