dao-xyz / peerbit

P2P database framework with encryption, sharding and search
https://peerbit.org
Apache License 2.0

Using a fully featured encrypted filesystem #23

Open TheRook opened 2 years ago

TheRook commented 2 years ago

I am wondering if this project would be better off using one of the existing encrypted filesystems for a private dapp's filesystem. The benefit of using one that has already been written is that it has undergone peer review as well as performance and security testing.

https://github.com/MatrixAI/js-encryptedfs

The "ephemeral key" generated by peerbit could be the AES-GCM key used by an encrypted filesystem. Arbitrary access, streaming, and better privacy through random fragmentation. https://github.com/dao-xyz/peerbit/blob/master/packages/utils/crypto/src/encryption.ts#L97

It is better to use separate keys for separate needs. One KDF can be used for data at rest and another for transport. The libp2p world adopted the Noise protocol for its encryption needs because it supports broadcasting and multicasting using a shared key. https://www.wolfssl.com/tls-1-3-versus-noise-protocol/
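
As a hedged sketch of the "separate keys for separate needs" point: one master secret can be expanded with HKDF into independent at-rest and transport keys, so compromising one never exposes the other. The label strings below are made up for this example, not an existing spec:

```typescript
// Illustrative sketch - the info labels are assumptions for this example only.
const subtle = globalThis.crypto.subtle;
const enc = new TextEncoder();

async function deriveKeys(masterSecret: Uint8Array, salt: Uint8Array) {
  const ikm = await subtle.importKey("raw", masterSecret, "HKDF", false, ["deriveKey"]);

  const derive = (info: string) =>
    subtle.deriveKey(
      { name: "HKDF", hash: "SHA-256", salt, info: enc.encode(info) },
      ikm,
      { name: "AES-GCM", length: 256 },
      false,
      ["encrypt", "decrypt"]
    );

  return {
    atRestKey: await derive("example/at-rest/v1"),      // data-at-rest (filesystem) key
    transportKey: await derive("example/transport/v1"), // transport/broadcast key
  };
}
```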

Some libp2p clients are already using Noise for encrypted broadcast/multicasting, but there isn't a really good encrypted filesystem for IPFS. One of the best e2e protocols is of course the Signal protocol, which is what Berty is using with orbitdb, and there is a JavaScript port:

https://github.com/signalapp/libsignal-protocol-javascript https://berty.tech/docs/protocol

marcus-pousette commented 2 years ago

Thank you for the post. I will have to spend some time digging into the references. These are great ideas.

I agree with you on separate keys for separate needs. On the other hand, one has to weigh the importance of keeping a codebase somewhat simple; it is a tricky balance. It is actually possible with the current implementation to have different encryption for the commits and for the peer communication.

I have spent a fair amount of time going through the Signal encryption scheme. I was heavily inspired by the earlier version of Signal when doing the implementation, but there are some areas I am still exploring. The main challenge has been to avoid introducing privacy features without thinking about what constraints they would impose on future wanted features (like discoverability and sharding). If all features of libsignal were implemented, the security would have to be optional so that you could still build apps that span the whole privacy, discoverability, performance space.

Yes, out of the box I have not been confident relying on the encryption features libp2p provides natively, but perhaps this could be something to consider in the future for performance improvements.

I will come back later with a more detailed answer regarding the references above.

TheRook commented 2 years ago

peerbit is such a huge idea that it has a splash zone, where other projects also need a good solution for the same common problems. "Do one thing and do it well" applies here as well. To help with the early growth of peerbit, it is best to focus on doing one thing well that is new and to encourage adoption of a tech that devs normally wouldn't see. In the case of peerbit that is private search - so make sure that peerbit has the best private search.

Signal has been vetted and endorsed by ex-NSA contractor Edward Snowden. Signal has perfect forward secrecy (PFS) and messages can arrive out of order, which is why Berty has had good success pairing the Signal protocol with orbitdb. Signal adds a lot in terms of privacy, and having an encryption layer means that blocks cannot be easily tracked by a PRISM-style mass surveillance system that can sniff all packets.

In an ideal world, if every app used the best in class, the Signal protocol - being an open standard - could be adopted by any number of projects, which could then support Berty's chat simply by supporting the same encrypted transport layer with libp2p RPCs. Now Berty becomes more useful, and so do other dapps. That is the Open Source ethos and how lasting platforms are built.

The same goes for a filesystem: there should be one project that does this well. Multiple dapps could share a gdrive-style folder which has encrypted access control. Using peerbit with zk membership means you don't need to pay gas fees to enforce the R/W access controls on the filesystem. FUSE works in JavaScript, so you could use one of the more popular encrypted filesystems like securefs on top of IPFS. A dapp developer is going to want a debug tool to mount a folder on their desktop, something FUSE does on a wide range of platforms. Another bonus of having encryption on the FS layer is that you can have multiple peerbit databases, each with their own permissions, that share one application context. So you can have user-enabled chat with pub/sub, or user-supplied form posts, and then an admin-controlled wiki powered by a document db.
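
Not part of Peerbit, but to make the debug-mount idea concrete: a minimal read-only sketch, assuming the callback-style ops of the community module fuse-native and serving one hard-coded file from memory; a real tool would instead return decrypted blocks from the database.

```typescript
// Sketch only - assumes fuse-native's callback API; real data would come from
// a decrypting store rather than this in-memory buffer.
const Fuse = require("fuse-native");

const FILE = Buffer.from("hello from the dapp\n");

const ops = {
  readdir(path: string, cb: (code: number, names?: string[]) => void) {
    if (path === "/") return cb(0, ["hello.txt"]);
    cb(Fuse.ENOENT);
  },
  getattr(path: string, cb: (code: number, stat?: object) => void) {
    const base = { mtime: new Date(), atime: new Date(), ctime: new Date(), nlink: 1, uid: 0, gid: 0 };
    if (path === "/") return cb(0, { ...base, mode: 16877, size: 4096 });                 // directory 0755
    if (path === "/hello.txt") return cb(0, { ...base, mode: 33188, size: FILE.length }); // file 0644
    cb(Fuse.ENOENT);
  },
  open(path: string, flags: number, cb: (code: number, fd?: number) => void) {
    cb(0, 42);
  },
  read(path: string, fd: number, buf: Buffer, len: number, pos: number, cb: (bytesRead: number) => void) {
    const slice = FILE.slice(pos, pos + len);
    slice.copy(buf);
    cb(slice.length);
  },
};

new Fuse("./mnt", ops, { debug: true }).mount((err?: Error) => {
  if (err) throw err;
  console.log("mounted ./mnt");
});
```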

marcus-pousette commented 2 years ago

peerbit is such a huge idea that it has a splash zone, where other projects also need a good solution for the same common problems. "Do one thing and do it well" applies here as well. To help with the early growth of peerbit, it is best to focus on doing one thing well that is new and to encourage adoption of a tech that devs normally wouldn't see. In the case of peerbit that is private search - so make sure that peerbit has the best private search.

Yes. Well, the way I see it is as a { shared, discoverable, private } state: basically approaching the discoverability/privacy problem, where those are usually two opposing forces.

In an ideal world, if every app used the best in class, the Signal protocol - being an open standard - could be adopted by any number of projects, which could then support Berty's chat simply by supporting the same encrypted transport layer with libp2p RPCs. Now Berty becomes more useful, and so do other dapps. That is the Open Source ethos and how lasting platforms are built.

Absolutely. I totally agree with you. The challenge is usually that not all apps require the same amount of security. As an example, not all apps are suited to running with blockchain security: a transaction sending $1 should not use (and pay for) the same security guarantees as a $1,000,000 transaction. The problem with many solutions in the security and privacy space in general is that the tools are not adaptive to the problem at hand, which is an inefficiency I want to tackle. A database usually either has to be encrypted altogether, or not at all. With Peerbit this granularity is at the commit level, i.e. there could be some parts of the ipfs-log that you can read and some that you can't, while another peer might be able to read and understand the whole commit log.
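
This is not Peerbit's actual implementation, but a sketch of how that commit-level granularity can work in general: each commit payload gets its own random symmetric key, which is then wrapped once per authorized reader, so a peer can replicate entries it cannot decrypt (tweetnacl is used here purely for illustration):

```typescript
// Illustrative sketch of per-commit envelope encryption (not Peerbit's API).
import nacl from "tweetnacl";

interface KeyPair {
  publicKey: Uint8Array;
  secretKey: Uint8Array;
}

function encryptCommit(payload: Uint8Array, readerPublicKeys: Uint8Array[], sender: KeyPair) {
  // Fresh symmetric key per commit: decrypting one commit reveals nothing about others.
  const commitKey = nacl.randomBytes(nacl.secretbox.keyLength);
  const nonce = nacl.randomBytes(nacl.secretbox.nonceLength);
  const sealedPayload = nacl.secretbox(payload, nonce, commitKey);

  // Wrap the commit key for each reader allowed to see this particular entry.
  const envelopes = readerPublicKeys.map((readerPublicKey) => {
    const keyNonce = nacl.randomBytes(nacl.box.nonceLength);
    return {
      readerPublicKey,
      keyNonce,
      wrappedKey: nacl.box(commitKey, keyNonce, readerPublicKey, sender.secretKey),
    };
  });

  // Replicators can store and forward this object without being able to read it.
  return { nonce, sealedPayload, envelopes, senderPublicKey: sender.publicKey };
}

// A reader unwraps its envelope, then opens the payload; others get null.
function decryptCommit(commit: ReturnType<typeof encryptCommit>, reader: KeyPair): Uint8Array | null {
  for (const env of commit.envelopes) {
    const commitKey = nacl.box.open(env.wrappedKey, env.keyNonce, commit.senderPublicKey, reader.secretKey);
    if (commitKey) return nacl.secretbox.open(commit.sealedPayload, commit.nonce, commitKey);
  }
  return null; // this peer is not an authorized reader of this commit
}
```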

The same goes for a filesystem: there should be one project that does this well. Multiple dapps could share a gdrive-style folder which has encrypted access control. Using peerbit with zk membership means you don't need to pay gas fees to enforce the R/W access controls on the filesystem.

Yes, this is something I have planned to work on quite soon.

FUSE works in JavaScript, so you could use one of the more popular encrypted filesystems like securefs on top of IPFS. A dapp developer is going to want a debug tool to mount a folder on their desktop, something FUSE does on a wide range of platforms. Another bonus of having encryption on the FS layer is that you can have multiple peerbit databases, each with their own permissions, that share one application context. So you can have user-enabled chat with pub/sub, or user-supplied form posts, and then an admin-controlled wiki powered by a document db.

I am not familiar with FUSE, but I see what you are after. As of now you would have encryption on the FS layer, but that is controlled by whoever is making the commits to the database. As a replicator I would not have any say in how things are encrypted at rest. In fact, I can be a replicator of databases without ever being able to know the contents, just that someone is making commits and someone else is reading things (verified by ZK membership in the future). FUSE might perhaps help with the separation of concerns here (again, I am not familiar with it, so I have to read more about it before I can say anything meaningful).

TheRook commented 2 years ago

The security guarantees for $0 here are pretty amazing. Doing any kind of access control on chain is expensive, and there are other related projects that could really use the free R/W access control for datasets that zk membership brings. Keeping it at $0 means other chains that cost money can adopt it more easily and are less likely to outright clone it.

Let me know when you are ready to start on a production-ready C++ version of this; I know some other people who might want to join. C/C++ is the future.

Absolutely. I totally agree with you. The challenge is usually that not all apps require the same amount of security. As an example, not all apps are suited to running with blockchain security: a transaction sending $1 should not use (and pay for) the same security guarantees as a $1,000,000 transaction. The problem with many solutions in the security and privacy space in general is that the tools are not adaptive to the problem at hand, which is an inefficiency I want to tackle. A database usually either has to be encrypted altogether, or not at all. With Peerbit this granularity is at the commit level, i.e. there could be some parts of the ipfs-log that you can read and some that you can't, while another peer might be able to read and understand the whole commit log.

TheRook commented 2 years ago

One last thing is a note on compliance, which, if done right here, could be a major market differentiator.

If someone needs a fintech or medical-data app, or needs to store identifying medical data or other PII collected by your phone, then the app will have to be reviewed by a consultancy. In the medical space, it is common to have yearly mandatory reviews from a security team to satisfy insurance. I worked as a consultant for many years doing reviews for the medical and financial world, and I am in the top 10 for crypto and security on SO: https://stackoverflow.com/users/183528/rook

To make the review process as easy as possible, you want to rely on security systems that have already been reviewed, or, even better, something that the assessment team uses themselves (like SSL/TLS or Gmail). Signal is very popular in the hacker/infosec community, and the same is true for encrypted filesystems like EncFS, which is an established project that has been assessed and approved to work in sensitive industries.

https://en.wikipedia.org/wiki/EncFS

marcus-pousette commented 2 years ago

The security guarantees for $0 here are pretty amazing. Doing any kind of access control on chain is expensive [...]

Yes! I am really excited about it.

Let me know when you are ready to start on a production-ready C++ version of this; I know some other people who might want to join. C/C++ is the future.

Thank you! I will keep in touch.

One last thing is a note on compliance, which if done right here could be a major market differentiator.

Yes, it is good to consider things like this if you want big adoption of the things you build, and perhaps also to create incentives for migration from legacy systems by showing that you can still act in compliance.

It's cool that you have this background; I can tell from the way you are able to talk about all kinds of topics in detail that you have been in this space for quite a long time. I greatly appreciate all the ideas you have provided to the project!

TheRook commented 2 years ago

Yeah... I have been in this space for a while. Although DHTs are nothing new, I am excited to see the kind of features orbit/peerbit are putting out - Berty especially. I think we are closer than ever to having a good devops toolchain for these kinds of DHT dapps.

Another project similar to peerbit that is under active development is redwood: https://github.com/redwood/redwood

Freenet and PerfectDark both have DHT client platforms with some form of a shared database - and it pays to look at all three of these to see their strengths and how they (so far) have failed to get any real adoption despite all being under active development. Freenet had anonymous chat and forums over DHT almost 10 years ago, but it was very slow, and building client apps for Freenet is still baffling. There are probably more Berty users than Freenet users, and Berty is maybe 9 months old. One of the big problems that all three of these face is that a dev can't just pick up and run with the libraries and frameworks they already know and love; having to learn something new is a non-starter in a hackathon where every moment matters. CouchDB fits the async-peer paradigm, and they have done the hard work of building a good client library and backend, testing and deployment tooling, educating developers, and building a vibrant community around one of the most resilient databases ever written.

... All of these p2p db projects lack adoption, adherence to standards, and any thought of interoperability, compliance, operations, and maintenance. We can talk about fancy things to do with crypto, but at the end of the day it is all in the doing - the devops limitations prevent adoption. All of these p2p dapp platforms are building siloed monoliths for an audience of developers that have no shortage of bright and alluring platforms to build on. Bridging this gap is its own engineering problem.

A very common cause of death for app startups in the web2 space is the cost of running the database. If you are already a CouchDB user and you are struggling, you could adopt another backend to keep the app afloat without needing any kind of hosting costs.

marcus-pousette commented 2 years ago

I think we are closer than ever to having a good devops toolchain for these kinds of DHT dapps.

I think so too!

Another project similar to peerbit that is under active development is redwood: https://github.com/redwood/redwood

Spent some time digging through this. It is a very interesting project, especially how different the developer experience is with the approach they are going with. You have a minimal amount of code surrounding the state; basically, patches are submitted that contain code that can be interpreted to modify the state. It is going to be interesting to see how it unfolds.

... All of these p2p db projects lack adoption, adherence to standards, and any thought of interoperability, compliance, operations, and maintenance. We can talk about fancy things to do with crypto, but at the end of the day it is all in the doing - the devops limitations prevent adoption. All of these p2p dapp platforms are building siloed monoliths for an audience of developers that have no shortage of bright and alluring platforms to build on. Bridging this gap is its own engineering problem.

Yes. I think there is usually too long a jump to adopt something new, especially if you are going from Web2 -> Web3. People don't have an infinite amount of time to explore new things, and want to get started right away and reuse as much as they can from previous experience. In addition to this, the "new" thing has to provide some actual value that is significantly better than what you have now, like reduced costs, improved privacy, or better search.

A very common cause of death for app startups in the web2 space is the cost of running the database. If you are already a CouchDB user and you are struggling, you could adopt another backend to keep the app afloat without needing any kind of hosting costs.

Yes. That will be great

marcus-pousette commented 2 years ago

Another thing, regarding

All of these p2p dapp platforms are building siloed monoliths [...]

The same problem is now very prevalent in figuring out what a "DID" should be. In my opinion this is an optimization problem where the longevity of the standard depends on the entropy you are applying it to. You can make a bad one in a matter of minutes that only solves the identity problem for your particular silo.

My idea is that it is not super important to create the perfect standard out of the box (this is a fundamentally hard problem, since you have to consider a huge set of domains and timeframes), but to create a solution that allows a new solution in the future to easily replace it. The actual DID, in some sense, would be the merkle root of all identity migrations, which would yield O(log N) "computational complexity" on verification, where N is the number of standards/solutions you are compatible with/have migrated from.
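
A sketch of that idea under an assumed construction (not an existing DID spec): hash every identity migration into a binary Merkle tree whose root acts as the stable DID, so proving that any past identity/standard belongs to it takes only a log-sized path of sibling hashes.

```typescript
// Illustrative sketch - an assumed Merkle construction, not an existing DID spec.
import { createHash } from "node:crypto";

const sha256 = (data: Buffer) => createHash("sha256").update(data).digest();

// Root over all migration records (e.g. serialized keys/DID documents, oldest first).
function merkleRoot(leaves: Buffer[]): Buffer {
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(sha256(Buffer.concat([level[i], level[i + 1] ?? level[i]])));
    }
    level = next;
  }
  return level[0];
}

// Sibling hashes from leaf to root: O(log N) of them for N migrations.
function merkleProof(leaves: Buffer[], index: number): Buffer[] {
  let level = leaves.map(sha256);
  const proof: Buffer[] = [];
  while (level.length > 1) {
    proof.push(level[index ^ 1] ?? level[index]);
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(sha256(Buffer.concat([level[i], level[i + 1] ?? level[i]])));
    }
    level = next;
    index = Math.floor(index / 2);
  }
  return proof;
}

// Verification only re-hashes along the path, i.e. O(log N) work.
function verify(root: Buffer, leaf: Buffer, index: number, proof: Buffer[]): boolean {
  let node = sha256(leaf);
  for (const sibling of proof) {
    node = index % 2 === 0
      ? sha256(Buffer.concat([node, sibling]))
      : sha256(Buffer.concat([sibling, node]));
    index = Math.floor(index / 2);
  }
  return node.equals(root);
}

// Example: the "DID" is the root of three migrations; prove the second one.
const migrations = ["identity-v1", "identity-v2", "identity-v3"].map((s) => Buffer.from(s));
const did = merkleRoot(migrations);
console.log(verify(did, migrations[1], 1, merkleProof(migrations, 1))); // true
```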

marcus-pousette commented 2 years ago

It is also interesting to think that really good silos eventually become standards in some sense (or that is at least what silo builders want to believe).

TheRook commented 2 years ago

The same problem is now very prevalent in figuring out what a "DID" should be. In my opinion this is an optimization problem where the longevity of the standard depends on the entropy you are applying it to. You can make a bad one in a matter of minutes that only solves the identity problem for your particular silo.

There is a difference between a silo and a domain. Microsoft LDAP for domain controllers did this beautifully; LDAP is used heavily in domain access control, where a corporation becomes a kind of kingdom. Since the web 0.1 era, domain names via DNS have been the domain identifier - but now in web3 we have .eth domains, other chains really want to have their own TLDs, and there are projects just to help arbitrary chains get their own TLD. So who do you trust? Do you trust the government and ICANN? Or do you even have an option of what certificate authority or naming authority you use? Simply providing that option opens doors. "Build it and they will come."

My idea is that it is not super important to create the perfect standard out of the box (this is a fundamentally hard problem, since you have to consider a huge set of domains and timeframes), but to create a solution that allows a new solution in the future to easily replace it. The actual DID, in some sense, would be the merkle root of all identity migrations, which would yield O(log N) "computational complexity" on verification, where N is the number of standards/solutions you are compatible with/have migrated from.

No, you fit perfectly as a research project out on the edge - sure, there are projects adding privacy to Kad and IPFS, but I don't know of any other project that can do federated private search, and that really changes the game. But in order to get adoption you need standards, compliance, and agreement. If privacy and security is the selling point, then it needs to solve compliance and be an open standard so that no one can be locked out and new tools can be made in the ecosystem.

On the topic of merkle roots, I can't believe that redwood doesn't use them! I think the author doesn't have a background in cryptography, otherwise it would be a no-brainer. You could have the user choose a merkle root from a trusted naming authority; within that namespace the authority issues branches which represent domains controlled by an org. Maybe we can reach out to these maintainers, as well as the freenet and perfectdark communities, and pool together what works into a formal standard.

marcus-pousette commented 2 years ago

But in order to get adoption you need standards, compliance, and agreement

Yes ... !

On the topic of merkle roots, I can't believe that redwood doesn't use them!

Not sure, but I think there is more to it. These guys have explored the space for some time; I am pretty sure they have some reason (if they are not doing it).

Going to have a talk with them at the next Braid meeting, show a little Peerbit stuff, and hopefully have some interesting discussion.

Maybe we can reach out to these maintainers, as well as the freenet and perfectdark communities, and pool together what works into a formal standard.

Yes, that would be good. I think I have to absorb a lot of ideas/concepts in the coming weeks and months, since there has been so much work in this space that is important to learn from: why certain parts succeeded and others failed. And also to try to find synergies with projects that already exist. Implementing libsignal is something I really want to do asap, but I know there will be headaches doing so, perhaps not for technical reasons, but because the problem domain that libsignal is designed for is different from this one.

TheRook commented 2 years ago

libsignal is going to be huge for PRISM protections and also for HIPAA compliance. Berty said they are using some adapted Signal protocol for orbitdb that still relies on ratcheting - so maybe you can take a peek at their implementation.

Yes, that would be good. I think I have to absorb a lot of ideas/concepts in the coming weeks and months, since there has been so much work in this space that is important to learn from: why certain parts succeeded and others failed. And also to try to find synergies with projects that already exist. Implementing libsignal is something I really want to do asap, but I know there will be headaches doing so, perhaps not for technical reasons, but because the problem domain that libsignal is designed for is different from this one.

Cloudflare's use of Merkle trees in Nimbus is interesting: https://blog.cloudflare.com/introducing-certificate-transparency-and-nimbus/

marcus-pousette commented 1 year ago

libsignal is going to be huge for PRISM protections and also for HIPAA compliance. Berty said they are using some adapted Signal protocol for orbitdb that still relies on ratcheting - so maybe you can take a peek at their implementation.

Yes, it is going to be interesting to see how (and if) they pulled it off 🕵️‍♂️

Cloudflare's use of Merkle trees in Nimbus is interesting: https://blog.cloudflare.com/introducing-certificate-transparency-and-nimbus/

hmm, going to save this for later reading material