functionland / fula-archived

Client-server stack for Web3! Turn your Raspberry Pi into a BAS server in minutes and enjoy the freedom of decentralized Web with a superior user experience!
https://fx.land
MIT License

State of DataProtocol and FileProtocol may diverge. #202

Open farhoud opened 2 years ago

farhoud commented 2 years ago

Overview

Uploading a file in an application needs two steps:

Known Issues:

Scenarios that end up in a split state:

File manager like Google Drive

The record that keeps the CID is not standardized, which makes it hard to list the files on a box without knowing the implementation of the app that stored them.

gitaaron commented 2 years ago

Glad you are thinking about this edge case. I guess there are two potential ways there could be a discrepancy -

  1. A file was added to IPFS and no corresponding record was added to OrbitDB.
  2. A file was removed from IPFS by interacting with the IPFS api directly (circumventing FULA).

Is that correct?
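
One way to tell the two cases apart is to compare the set of CIDs referenced by records with the set of CIDs the file layer actually holds. The following is a minimal sketch, assuming both lists have already been fetched; how they are read from OrbitDB and IPFS is left out, and `findDivergence` and `Divergence` are hypothetical names, not part of FULA.

```typescript
// Hypothetical helper: not part of the FULA, OrbitDB, or IPFS APIs.
interface Divergence {
  orphanFiles: string[];     // held by the file layer, but no record points at them (case 1)
  danglingRecords: string[]; // referenced by a record, but missing from the file layer (case 2)
}

function findDivergence(recordCids: string[], storedCids: string[]): Divergence {
  const referenced = new Set(recordCids);
  const stored = new Set(storedCids);
  return {
    orphanFiles: storedCids.filter((cid) => !referenced.has(cid)),
    danglingRecords: recordCids.filter((cid) => !stored.has(cid)),
  };
}
```

An empty result in both fields would mean the two states agree for the CIDs that were compared.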

mehdibalouchi commented 2 years ago

Standardizing the records on the graph protocol makes the API opinionated and it will welcome hacks on the developer side. Nonetheless, I think there is no way to come up with a standard for storing all the file-related data. As @farhoud mentioned, the use case here is to add some related records after the file uploading is complete. We have two different types of file-related records here:

We can add support for collections of files as some sort of metadata. This will help the application to create permanent collections and search among them if they want (or get the list of them for the file manager).

In conclusion, I think we cannot (and should not) prevent the data protocol and file protocol states from diverging. What we can do is provide a way to make permanent collections of files, which should also support nested collections.

ehsan6sha commented 2 years ago

@mehdibalouchi But if the data protocol state diverges from the file protocol state, for example, if someone erases the files but the database is still there, or someone erases the database but the files are still there, what are the risks? How can they be synced again?

mehdibalouchi commented 2 years ago

@ehsan6sha These issues are also present when you are using a traditional DB with a web2 architecture.

if someone erases the files but database is still there

If you are storing some file paths (a pointer to a file on a file system) in a database and then for some reason the records get removed from the DB, there is nothing the DB can do about it. A common solution to this problem for centralized databases is to store the file inside the database records (e.g. as a blob). This is currently possible with the existing APIs, though with low performance because OrbitDB keeps all the data in RAM.

or someone erases the database but files are still there

In this case, we will have some orphan files (as @farhoud mentioned). By keeping the file and the graph protocols completely independent, we can ensure that any request for a file is made through the IPFS network. Therefore we can rely on the IPFS garbage collector and pinning service to handle the orphan files.

How can they be synced again?

The application is responsible for keeping the data protocol and the file protocol states synchronized. This is where permanent collections can help. An application should be able to create a permanent collection (with a permanent id) and store files in it. At any time if the application wants to resync the states, it can search through all the files in the collection and update the data protocol state.
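
As a rough illustration of that resync step, the sketch below rebuilds the data protocol state from the list of files in a collection. It assumes the collection lists each CID once; `FileRecord` and `resync` are hypothetical names, and the record shape is invented for the example rather than taken from FULA or OrbitDB.

```typescript
// Hypothetical record shape: not defined by FULA or OrbitDB.
interface FileRecord {
  cid: string;
  meta: Record<string, string>; // app-specific fields
}

// Return one record per file in the collection:
// - existing records are kept as they are,
// - files without a record get an empty placeholder record,
// - records whose CID is not in the collection are dropped.
function resync(collectionCids: string[], records: FileRecord[]): FileRecord[] {
  const byCid = new Map<string, FileRecord>();
  records.forEach((r) => byCid.set(r.cid, r));
  return collectionCids.map((cid) => {
    const existing = byCid.get(cid);
    return existing !== undefined ? existing : { cid, meta: {} };
  });
}
```

The collection is treated as the source of truth here; an application that trusts its records more than the collection would need the opposite direction.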

mehdibalouchi commented 2 years ago

The collections should have a human-readable identifier (e.g. todo-app-XXX/photos). The combination of [USER_ID, APP_ID, COLLECTION_ID] must be unique across the network.
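
The uniqueness rule could look something like the sketch below. It is a local, in-memory illustration only: enforcing uniqueness across a real network needs coordination that is not shown, and `CollectionRef`, `collectionKey`, and `CollectionRegistry` are hypothetical names.

```typescript
// Hypothetical types: they illustrate the uniqueness rule only.
interface CollectionRef {
  userId: string;
  appId: string;
  collectionId: string; // human-readable, may contain "/", e.g. "todo-app-XXX/photos"
}

// JSON-encode the tuple so that a "/" (or any other separator) inside an ID
// cannot make two different tuples produce the same key.
function collectionKey(ref: CollectionRef): string {
  return JSON.stringify([ref.userId, ref.appId, ref.collectionId]);
}

class CollectionRegistry {
  private keys = new Set<string>();

  // Returns true if the collection was registered, false if the tuple is already taken.
  register(ref: CollectionRef): boolean {
    const key = collectionKey(ref);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}
```

The same COLLECTION_ID can then be reused by another user or another app without a clash, since the key covers all three parts.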

farhoud commented 2 years ago

@ehsan6sha @keyvan-m-sadeghi @mehdibalouchi
We can use UnixFS and encode our folder structure inside IPFS (not OrbitDB), then store the Unix path of each file in OrbitDB. This way divergence is detectable. An app like Drive can also use this data structure to show files. Or we can take inspiration from it and also apply security to it.

farhoud commented 2 years ago

Never mind, that is what MFS uses, and replicating a DAG is harder than it looks.

ehsan6sha commented 2 years ago

@mehdibalouchi The thing is that in the centralized web2 model, it is system admins who have access to the data, so divergence is minimized and is handled by an expert. Here it is the public who has access, and divergence can be a killer for the experience.