SharezoneApp / sharezone-app

Sharezone is a collaborative school organization app for iOS, Android, macOS, and web with 500,000+ downloads. Built with Flutter & Firebase.
https://sharezone.net
European Union Public License 1.2

Design doc: Restrict the size of the file storage #889

nilsreichardt opened 1 year ago

nilsreichardt commented 1 year ago

Storage Limit

Free users: 100 MB
Plus users: 30 GB

Costs

If a Plus user uses the full 30 GB, the costs are 0.69 USD / month (without backups). I'm not sure whether we should set the storage limit to 30 GB or 50 GB. 50 GB would cost 1.15 USD / month (without backups).

See: https://cloud.google.com/products/calculator/#id=82e1c57c-2cd9-48aa-bf82-e43d73a77fec
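As a sanity check, the figures above imply a flat rate of roughly 0.023 USD per GB-month (a rate derived from the quoted numbers, not an official price sheet):

```typescript
// Sanity check for the storage cost estimates above.
// Assumed rate: ~0.023 USD per GB per month (implied by the quoted figures).
const usdPerGbMonth = 0.023;

function monthlyCostUsd(gigabytes: number): number {
  // Round to whole cents for display.
  return Math.round(gigabytes * usdPerGbMonth * 100) / 100;
}

console.log(monthlyCostUsd(30)); // 0.69
console.log(monthlyCostUsd(50)); // 1.15
```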

Implementation Design

Storing the current file usage

TL;DR: Using Realtime Database

We need a place to store the current file usage of a user. I would suggest using the Firebase Realtime Database for this because it's critical that this value is accurate. With Firestore, we could have the problem that when a user deletes many files at once (e.g. by deleting a course), we hit Firestore's limit of 1 write per second on the same document. The Realtime Database doesn't have this problem.

The structure of the database could look like this:

{
  "users": {
    "userId": {
      "current_storage_usage": {
        "bytes": 23,
        "lastUpdatedAt": 1695240614873
      }
    }
  }
}
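To make the intended shape concrete, a minimal client-side sketch of the record type and RTDB path (names mirror the JSON above; nothing here is final API):

```typescript
// Record shape and RTDB path for the structure proposed above.
interface CurrentStorageUsage {
  bytes: number;         // total bytes the user currently has stored
  lastUpdatedAt: number; // Unix epoch milliseconds of the last update
}

function usagePath(userId: string): string {
  return `users/${userId}/current_storage_usage`;
}

console.log(usagePath("abc123")); // "users/abc123/current_storage_usage"
```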

Updating the current file usage

TL;DR: Using Cloud Functions

When a user uploads or deletes a file, we need to update the current file usage. We can do this in the Cloud Function that is triggered when a file is uploaded or deleted. We can get the file size from the file object that is passed to the Cloud Function and then update the bytes field in the Realtime Database.
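The bookkeeping inside those triggers can be sketched as a pure function. This is a sketch under the assumption that it runs inside an RTDB transaction from storage onFinalize/onDelete handlers; the names are illustrative:

```typescript
// Core bookkeeping for the upload/delete triggers described above.
// In production this delta would be applied inside an RTDB transaction
// (admin.database().ref(path).transaction(...)) from an onFinalize
// (upload) or onDelete (deletion) Cloud Function.
interface UsageRecord {
  bytes: number;
  lastUpdatedAt: number;
}

function applyUsageDelta(
  current: UsageRecord | null, // null when the user has no record yet
  deltaBytes: number,          // +file.size on upload, -file.size on delete
  now: number
): UsageRecord {
  const previous = current?.bytes ?? 0;
  // Clamp at 0 so replayed or out-of-order deletes can't go negative.
  const bytes = Math.max(0, previous + deltaBytes);
  return { bytes, lastUpdatedAt: now };
}

// Upload of a 1 MB file, then deletion of the same file:
const afterUpload = applyUsageDelta(null, 1_000_000, 1695240614873);
const afterDelete = applyUsageDelta(afterUpload, -1_000_000, 1695240620000);
console.log(afterUpload.bytes); // 1000000
console.log(afterDelete.bytes); // 0
```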

Client uploads a file

When a user uploads a file, the client needs to check if the user has enough storage left. This can be done by reading the bytes field from the Realtime Database. If the user has enough storage left, the client can upload the file. If not, the client should show an error message.

When uploading a file as an attachment, this check happens when the user clicks the "Save" button of the homework/information sheet. Checking the current storage requires a network connection, which isn't a problem in this context because the user needs a network connection to upload the file anyway. When uploading a file via the files feature, the check happens when the user selects a file.
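A minimal sketch of that client-side check, assuming the limits from this doc (100 MB free, 30 GB Plus) and a hypothetical canUpload helper:

```typescript
// Client-side pre-upload check: compare the streamed usage value
// against the user's limit before starting the upload.
// Limit values taken from this design doc.
const FREE_LIMIT_BYTES = 100 * 1024 * 1024;        // 100 MB
const PLUS_LIMIT_BYTES = 30 * 1024 * 1024 * 1024;  // 30 GB

function canUpload(usedBytes: number, fileSizeBytes: number, limitBytes: number): boolean {
  return usedBytes + fileSizeBytes <= limitBytes;
}

console.log(canUpload(99 * 1024 * 1024, 2 * 1024 * 1024, FREE_LIMIT_BYTES)); // false (would exceed 100 MB)
console.log(canUpload(99 * 1024 * 1024, 1 * 1024 * 1024, FREE_LIMIT_BYTES)); // true  (exactly at the limit)
console.log(canUpload(0, 31 * 1024 * 1024 * 1024, PLUS_LIMIT_BYTES));        // false (31 GB > Plus limit)
```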

We have two options for checking if the user has enough storage left:

  1. The client directly accesses the Realtime Database
  2. The client calls a Cloud Function that checks if the user has enough storage left

Using the Cloud Function has the advantage that we can change the logic for checking the storage usage without updating the client. For example, when we change the storage limit for free users, we only need to update the Cloud Function; if we accessed the Realtime Database directly, we would need to update the client. However, I'm not sure it's worth it (we shouldn't change the limits that often), and it could confuse users because the UI would still show the old storage limits.

Directly accessing the Realtime Database has the advantage that it's the fastest and cheapest solution.

I would suggest using option 1 (directly accessing the Realtime Database) because it's the fastest and cheapest solution and we don't win that much by using option 2.

We might also want to show a note in the files feature when the user is close to the storage limit. For that, streaming the bytes field from the Realtime Database to the client would be the easiest solution.
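The "close to the limit" note could be derived from the streamed value; a sketch with an assumed 90% threshold (the threshold is an illustration, not a decided value):

```typescript
// Near-limit warning for the files feature, derived from the streamed
// `bytes` value. The 90% default threshold is an assumption.
function isNearLimit(usedBytes: number, limitBytes: number, threshold = 0.9): boolean {
  return usedBytes >= limitBytes * threshold;
}

console.log(isNearLimit(95_000_000, 100_000_000)); // true  (95% used)
console.log(isNearLimit(50_000_000, 100_000_000)); // false (50% used)
```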

Validating the storage usage

TL;DR: Using Cloud Functions

Above, I described how the client can check whether the user has enough storage left. However, this only serves to display a user-friendly message when there is no storage space left. Additionally, we need to enforce the storage limit on the server side.

At first, I thought about using the Storage Security Rules. However, Storage Security Rules can't access the Realtime Database (only Firestore), and we can't notify the user from them. Using a Cloud Function is a better solution: we can use the onFinalize trigger to check the storage usage when a file is uploaded. If the user has no storage left, we can delete the file and notify the user (e.g. by sending a push notification).
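The decision inside the onFinalize trigger can be sketched as pure logic; the actual file deletion and push notification are only indicated in comments, and the names are assumptions:

```typescript
// Server-side validation sketch: given the usage after an upload,
// decide whether the file must be rejected and the user notified.
// The trigger wiring (onFinalize, bucket.file(...).delete(), FCM push)
// is deliberately left out so the decision logic stays testable.
interface ValidationResult {
  deleteFile: boolean; // delete the just-uploaded file from the bucket
  notifyUser: boolean; // e.g. send a push notification explaining why
}

function validateUpload(usedBytesAfterUpload: number, limitBytes: number): ValidationResult {
  const overLimit = usedBytesAfterUpload > limitBytes;
  return { deleteFile: overLimit, notifyUser: overLimit };
}

console.log(validateUpload(105 * 1024 * 1024, 100 * 1024 * 1024).deleteFile); // true
console.log(validateUpload(90 * 1024 * 1024, 100 * 1024 * 1024).deleteFile);  // false
```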

We should wait a few months before enforcing this server-side validation because old clients don't know about the storage limit and could still upload files.

Design

Current storage usage on the profile page: https://www.figma.com/file/EIlLPe7KdF4bQXLQVAsDeM/SharezoneApp?type=design&node-id=2719-874&mode=design&t=TRl2XQH1388ekZJK-4
Error dialog when a file is too big: https://www.figma.com/file/EIlLPe7KdF4bQXLQVAsDeM/SharezoneApp?type=design&node-id=2377-5182&mode=design&t=TRl2XQH1388ekZJK-4
Current storage usage when uploading a file via the file sharing feature: https://www.figma.com/file/EIlLPe7KdF4bQXLQVAsDeM/SharezoneApp?type=design&node-id=2372-4508&mode=design&t=TRl2XQH1388ekZJK-4

What about the existing files?

When rolling out this feature, we need to run a script that sets the bytes field in the Realtime Database for all users with existing files. The script does the following:

  1. Store the timestamp when the script was started
  2. Get the documents from the Firestore files collection that were created before the script was started
  3. For each document, get the file size from the Storage bucket
  4. Update the bytes field in the Realtime Database
  5. Rerun the script with all files that were created after the script was started

We should run the script before deploying the Cloud Functions that update the bytes field when a file is uploaded / deleted. Otherwise, it could happen that we count some files twice.
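The per-user aggregation in steps 2-4 can be sketched as follows. The Firestore and Storage calls are modeled with plain records here, and field names like sizeBytes are assumptions:

```typescript
// Backfill sketch: sum existing file sizes per creator to produce the
// initial `bytes` values. In production, the inputs would come from the
// Firestore `Files` collection plus the Storage bucket metadata.
interface FileDoc {
  creatorID: string;
  sizeBytes: number;
  createdAt: number; // epoch millis
}

function initialUsagePerUser(files: FileDoc[], scriptStartedAt: number): Map<string, number> {
  const usage = new Map<string, number>();
  for (const file of files) {
    // Step 2: only files created before the script started. Newer files
    // are handled by the rerun (step 5) or by the live triggers.
    if (file.createdAt >= scriptStartedAt) continue;
    usage.set(file.creatorID, (usage.get(file.creatorID) ?? 0) + file.sizeBytes);
  }
  return usage;
}

const startedAt = 1_700_000_000_000;
const files: FileDoc[] = [
  { creatorID: "a", sizeBytes: 100, createdAt: startedAt - 10 },
  { creatorID: "a", sizeBytes: 50, createdAt: startedAt - 5 },
  { creatorID: "b", sizeBytes: 200, createdAt: startedAt + 1 }, // left for the rerun
];
console.log(initialUsagePerUser(files, startedAt).get("a")); // 150
```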

What about current users who have already hit the storage limit?

This is not a problem: they are just above the storage limit and can't upload any files. Once they delete some files, they can upload again. Optionally, we could send them a push notification that they are above the storage limit.

nilsreichardt commented 11 months ago

@Jonas-Sander Could you review this design doc? If everything is okay, I can begin to implement this feature 👍

Jonas-Sander commented 11 months ago

Using the Cloud Function has the advantage that we can change the logic for checking the storage usage without updating the client. For example, when we change the storage limit for free users, we only need to update the Cloud Function; if we accessed the Realtime Database directly, we would need to update the client. However, I'm not sure it's worth it (we shouldn't change the limits that often), and it could confuse users because the UI would still show the old storage limits.

I mean, in the end you chose the Realtime Database anyway, but I don't think that argument makes much sense. We should probably get the max storage space via Remote Config anyway. And we already need to stream the usage value anyway to show the current storage space taken.

Jonas-Sander commented 11 months ago

Hm, would it even be smart to make the max storage space a "global"?
I mean, there might be good reasons why a user wants/needs more. Maybe we'll even offer some more storage space for a few euros. In that case we could write to the Realtime Database as well, no?

We could also start with a global value and make changes in the future if the need arises. Although I could also imagine a situation where a teacher who uploaded more than 30 GB writes to us right away and needs more storage, for example.

Jonas-Sander commented 11 months ago

At first, I thought about using the Storage Security Rules. However, Storage Security Rules can't access the Realtime Database (only Firestore), and we can't notify the user from them. Using a Cloud Function is a better solution: we can use the onFinalize trigger to check the storage usage when a file is uploaded. If the user has no storage left, we can delete the file and notify the user (e.g. by sending a push notification).

I'm fine with your solution, but why couldn't we just sync the bytes used to Firestore so we can access it from the Security Rules? Even a small delay wouldn't be tragic since it's eventually consistent, i.e. a user might upload two files instead of one above the storage limit if the sync isn't quick enough, but it doesn't really matter since in the end they are over the limit and can't upload any more files.

Jonas-Sander commented 11 months ago

Otherwise LGTM.

Jonas-Sander commented 11 months ago

Is there an easy way for users to see all the files they uploaded with size? I think a good chunk of our users might be over the storage limit and they would probably appreciate help seeing which files they uploaded and their size.

nilsreichardt commented 11 months ago

Is there an easy way for users to see all the files they uploaded with size? I think a good chunk of our users might be over the storage limit and they would probably appreciate help seeing which files they uploaded and their size.

I planned to have an overview on the profile page:

Current storage usage on the profile page: https://www.figma.com/file/EIlLPe7KdF4bQXLQVAsDeM/SharezoneApp?type=design&node-id=2719-874&mode=design&t=TRl2XQH1388ekZJK-4

nilsreichardt commented 11 months ago

would it even be smart to make the max storage space a "global"?

What do you mean with global? Storing the max storage space in the user document so that every user can have a personal max storage space?

nilsreichardt commented 11 months ago

but why couldn't we just sync the bytes used to Firestore, so we can access it from the security rules?

Because then the bytes number in Firestore could be completely wrong. Imagine you have 100 MB of storage used as a free user. You delete a course with 10 MB spread across 50 files. Deleting 50 files might take 1 second, giving you 50 updates to the Realtime Database and thus 50 Cloud Function triggers that all try to decrease the bytes field in the same Firestore document within that second. Because of Firestore's 1 write per second limit on a single document, only some of those writes might be accepted while the others are ignored, so the Firestore document could say 95 MB while the actual usage should be 90 MB. In that case, the app (which reads from the Realtime Database) says "Easy, you have only 90 MB used" while the Security Rules think you have 95 MB used.

If we use Security Rules to validate, we should only do so if the Security Rules can access the source-of-truth bytes field.

Jonas-Sander commented 11 months ago

Is there an easy way for users to see all the files they uploaded with size? I think a good chunk of our users might be over the storage limit and they would probably appreciate help seeing which files they uploaded and their size.

I planned to have an overview on the profile page:

Current storage usage on the profile page: https://www.figma.com/file/EIlLPe7KdF4bQXLQVAsDeM/SharezoneApp?type=design&node-id=2719-874&mode=design&t=TRl2XQH1388ekZJK-4

No I mean seeing the individual files, so they can choose what files they might consider deleting.

Jonas-Sander commented 11 months ago

would it even be smart to make the max storage space a "global"?

What do you mean with global? Storing the max storage space in the user document so that every user can have a personal max storage space?

Yes, this is conceptually what I mean. I don't know though if storing it in every user document is the best option. Maybe. I could also imagine a hybrid approach where we set a global max and if we need to change the max for one user, we write it in the user doc, which would then take precedence over the global max.

Jonas-Sander commented 11 months ago

but why couldn't we just sync the bytes used to Firestore, so we can access it from the security rules?

Because then the bytes number in Firestore could be completely wrong. Imagine you have 100 MB of storage used as a free user. You delete a course with 10 MB spread across 50 files. Deleting 50 files might take 1 second, giving you 50 updates to the Realtime Database and thus 50 Cloud Function triggers that all try to decrease the bytes field in the same Firestore document within that second. Because of Firestore's 1 write per second limit on a single document, only some of those writes might be accepted while the others are ignored, so the Firestore document could say 95 MB while the actual usage should be 90 MB. In that case, the app (which reads from the Realtime Database) says "Easy, you have only 90 MB used" while the Security Rules think you have 95 MB used.

If we use Security Rules to validate, we should only do so if the Security Rules can access the source-of-truth bytes field.

I mean, it depends on how we implement it. As the value in the Realtime Database is already absolute, we could just set the Firestore value to the latest value from the Realtime Database. There would still be some edge cases (debouncing several updates, making the latest change the "winner"), but I think this would be doable. Not saying that we absolutely should, just mentioning it as an alternative.

Also, deleting so many files wouldn't trigger the security rules regarding the storage limit anyway. So the edge case would be bulk uploading several files, wouldn't it? There might be the case that the value is not updated quickly enough, but since it is eventually consistent it wouldn't really hurt us. A user might upload a few files above the limit instead of only one, but since they can't upload any more after that, it doesn't really matter from our point of view.

nilsreichardt commented 11 months ago

No I mean seeing the individual files, so they can choose what files they might consider deleting.

Yes, I thought about this as well. At first I thought it would be difficult to implement, but looking at it again it's very easy. We can just stream this query: firestore.collection('Files').where('creatorID', isEqualTo: userId).orderBy('sizesBytes') and display all files that a user created.

We can add a page for that in the file sharing feature tab and in the card on the profile page 👍

nilsreichardt commented 11 months ago

Yes, this is conceptually what I mean. I don't know though if storing it in every user document is the best option. Maybe. I could also imagine a hybrid approach where we set a global max and if we need to change the max for one user, we write it in the user doc, which would then take precedence over the global max.

That could be an option. But to be honest, I would only add the architecture for that if we actually build it (keeping the current implementation simple). Currently, I don't see a big breaking change here (when adding it later), and it should be relatively easy to add without reimplementing the entire feature. The current usage of all files (without the submissions feature) is 468.18 GB. From this, I would assume that only very few people (maybe only 1-2) would actually need more than 30 GB. Even assuming the 468.18 GB is used only by users who have maxed out the 30 GB, that would be only 15.6 users (468.18 / 30 = 15.6). Therefore, I don't think this feature will be needed in the future.

nilsreichardt commented 11 months ago

As the value in the realtime database is already absolute we could just set the firestore value to the latest value from the realtime database.

That would definitely be an option, but I don't know how to implement it in an easy way because we don't know what the last value is. The current structure updates the bytes field for every file update independently, and the Cloud Function doesn't know whether a given update was the first or the last one.

nilsreichardt commented 2 months ago

Note: We could also use Firestore's new sum aggregation.