Discussion: orbit-db concerns

@DanielVF raised some concerns about orbit-db and I'd like to be able to discuss them further. I can't remember all points raised. Would you mind posting them here @DanielVF so we can have a more in-depth discussion?

Hopefully this can help: I had started this doc as an attempt to capture the questions raised during the eng weekly discussion today. Was planning on using it as an agenda for our upcoming meeting to discuss orbitdb/identity (currently scheduled for next Tue 12/4).

I am out of the office today, so here's a quick bulleted list of technical concerns. [Edit: updated list to finish out discussion]

Pubsub:

Opinion - IPFS PubSub is currently mostly centralized, with basically no routing. You can't just connect to any IPFS server and send and receive messages with the whole network. Note that there are a few bootstrap nodes that IPFS PubSub connects to by default, and if you and someone else are both connected to one, then messages will flow. I don't know how this will scale though.
Opinion - IPFS PubSub is currently extremely immature. It's not even really experimental, it's just a combination of the dumbest things that could make something sort of work that could be called pubsub, and slapped onto the side of IPFS server.
Opinion - IPFS PubSub currently has no spoofing protection; anyone can fake messages from anyone.

Orbit DB:

Confirmed - OrbitDB's core building block, the ordered log, doesn't provide any ordering guarantees if a malicious person is writing.
Confirmed - OrbitDB currently has a bug where any DB writer can mess up all future ordering from any future writer of the database, even between other honest writers.
Confirmed - OrbitDB does not sync unless an OrbitDB program is already online, hosting that log you are interested in. This requires that we keep a separate orbit DB online for every single conversation or profile that we wish to persist.
Confirmed - Any writer can DOS the heck out of anyone who reads that OrbitDB database. OrbitDB currently has no DOS protection at all, and will happily download gigs of data. World-writable DBs are right out. A private PoC has been sent to OrbitDB.
Nope - OrbitDB's head exchange (a part of each sync) is completely unsigned. I could force anyone with an active DB to download giant movies instead, or CP, or just DOS them. The head exchange is authed, so this attack is limited to someone who is a writer on the DB, see above.
Confirmed - OrbitDB requires that each message be pinned, or it is eventually lost, and OrbitDB does not pin them
Confirmed - OrbitDB syncs across IPFS peers by arranging a single-DB-to-single-DB rendezvous on a separate channel that the two DBs then use to sync. I'm concerned this may not scale well, since the number of rendezvous is the number of people online with the DB, squared. So if there are 10,000 people online with the same DB open, then 100,000,000 IPFS pubsub rooms are in use for just that database. This only applies when each Orbit user has their own local IPFS peer. However, this is how the docs show OrbitDB usually being used, and how we are currently using it.
Confirmed - Unless you are running a local IPFS, Orbit redownloads every step of the past history when opened, one IPFS object at a time.
Confirmed - For each OrbitDB open, four files are open for its LevelDB. This doesn't scale well into the thousands of DB's on a server.
Theoretical - You probably are able to track OrbitDB to know in realtime when a specific browser is online with a DB open, if they are using their own local IPFS server (which we are currently doing). It's probably easy enough to go from this to their identity, so you should be able to track if a certain user is online (with an open OrbitDB).
Opinion - OrbitDB's core assumptions seem to have been designed for transient chat, rather than long term storage.
Opinion - Both OrbitDB and IPFS are not only not production ready, but both don't even have production capable designs in their current implementations yet. It's not just a matter of a few bugs. IPFS Pubsub needs protocol changes, an entire server rewrite, and a completely new way of doing things, OrbitDB needs protocol level changes at a minimum, and probably server/client changes if you want to host and sync hundreds of DBs.
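
To put rough numbers on the rendezvous concern in the list above, here is a minimal sketch of the arithmetic. It assumes one direct channel per pair of peers that have the same database open; the function names are made up for illustration and nothing here is OrbitDB code. The N² figure counts ordered pairs, while counting each pair once gives roughly half of that.

```javascript
// Hypothetical model: every pair of online peers sharing a database gets its own room.
function orderedPairs (n) {
  return n * n // the N^2 estimate used in the list above
}

function unorderedPairs (n) {
  return (n * (n - 1)) / 2 // one shared room per distinct pair of peers
}

for (const n of [10, 100, 10000]) {
  console.log(n, orderedPairs(n), unorderedPairs(n))
}
```

Either way the growth is quadratic, so the concern holds regardless of which count is closer to what OrbitDB actually does.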

On the profile side:

Profile updates are going to be one of the least often done storage activities. Maybe a couple times a year? And we can do profile updates as a block chain log event for a few cents each. Profiles seem a place we really don't need to build an entirely different second architecture for transaction cost savings.
Even if OrbitDB worked perfectly, we'd still need to overlay a way to know what a given profile looked like at a given block time. This would add new ways to scam, and new ways for transactions to fail.
Even if IPFS PubSub worked perfectly, I hate the idea that there has to be an active process, serving up a specific profile, running somewhere in the world in order to be able to fetch that profile's data.

Thanks @DanielVF . I'll respond to some of the points:

IPFS PubSub is currently mostly centralized, basically no routing. You can't just connect to any IPFS server and receive the pubsub messages for the network.

So this is my understanding (though I'm still learning myself): Yes the current version of IPFS PubSub uses floodsub, and is in the process of being upgraded to gossipsub (which will provide much more sophisticated routing). It's not intrinsically centralized; it just requires that peers directly connect to one another in order to exchange data. In practice, yes, this tends to require centralized architectures in order to be meaningful, where you would have a small number (or even just one) "server" nodes distributing content to peers. This would be a centralized data transmission model, but the data structure itself is still decentralized (i.e. you can have data locally and check its validity without any centralized server). The centralized pieces are not strictly needed, and are essentially just a convenience to provide redundancy and availability. Not ideal and I agree, having a routing system would be nice.

https://github.com/libp2p/go-libp2p-pubsub

IPFS PubSub is currently extremely immature. It's not even really experimental, it's just a combination of the dumbest things that could make something sort of work that could be called pubsub, and slapped onto the side of IPFS server.

I guess that's one way to put it. It is certainly experimental.

IPFS PubSub currently has no spoofing protection; anyone can fake messages from anyone.

Sure - but PubSub is just used to transmit signed messages in orbit-db. It actually doesn't matter who the data is coming from, whether they have good intentions or they are malicious. If the data has a valid signature, it is accepted. If not, it is rejected. The beauty of CRDTs is that they allow data to be received in any order, with duplicates, and they will still converge towards a single "truth". So the worst that a malicious peer can do is choose to withhold data. It can do no damage by changing the order of the data or sending invalid data.
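
As a rough illustration of that argument, the sketch below checks a signature on every incoming entry and merges the valid ones into a grow-only set, one of the simplest CRDTs. It is not orbit-db's code: the single Ed25519 key pair, the entry shape, and the `signEntry`, `isValid`, and `merge` helpers are all assumptions made for the example.

```javascript
const crypto = require('crypto')

// Hypothetical writer key pair; orbit-db's real key handling is not shown here.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')

function signEntry (payload) {
  return { payload, sig: crypto.sign(null, Buffer.from(payload), privateKey) }
}

function isValid (entry) {
  return crypto.verify(null, Buffer.from(entry.payload), publicKey, entry.sig)
}

// Grow-only set: merging is a union, so arrival order and duplicates do not matter.
function merge (state, incoming) {
  const next = new Set(state)
  for (const entry of incoming) {
    if (isValid(entry)) next.add(entry.payload) // entries with bad signatures are dropped
  }
  return next
}

const a = signEntry('a')
const b = signEntry('b')
const forged = { payload: 'evil', sig: a.sig } // signature does not match the payload

// Two peers receive the same entries in different orders, with duplicates and a forgery.
const peer1 = merge(merge(new Set(), [a, b]), [forged])
const peer2 = merge(merge(new Set(), [b, forged, b]), [a, a])

const sorted = (set) => [...set].sort()
console.log(sorted(peer1), sorted(peer2))
```

Both peers end up with the same two payloads. Note that this only covers entries with invalid signatures; it says nothing about what a writer holding valid keys can do.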

OrbitDB does not sync unless an OrbitDB program is already online, hosting that log you are interested in. This requires that we keep a separate orbit DB online for every single conversation or profile that we wish to persist.

If you're suggesting that every single profile we wish to persist must be open at all times on a server, that is incorrect. Yes, the database must be open for the head exchange to occur. But the solution I've built (as well as 3box) uses pubsub to send a message to the server, requesting it to open a specific database in order to exchange heads. Once the heads have been exchanged, the server may close the database (aside: my code currently does not close the database - I should probably fix that). Persisting a profile just means that it has to be kept somewhere in storage. This could be sharded across servers as needed.
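
A minimal sketch of that open-on-demand idea follows. It is not the actual server code or 3box's: `openDb` stands in for whatever opens an orbit-db database, `exchangeHeads` for the sync step, and the class and its reference counting are assumptions made for illustration. It also shows the missing "close the database afterwards" step.

```javascript
// `openDb` is a hypothetical async function returning a handle with an async close().
class OnDemandDatabases {
  constructor (openDb) {
    this.openDb = openDb
    this.active = new Map() // address -> { dbPromise, users }
  }

  // Called when a peer asks (for example over pubsub) for a database to be opened.
  async withDatabase (address, exchangeHeads) {
    let slot = this.active.get(address)
    if (!slot) {
      slot = { dbPromise: this.openDb(address), users: 0 }
      this.active.set(address, slot)
    }
    slot.users += 1
    try {
      const db = await slot.dbPromise
      return await exchangeHeads(db)
    } finally {
      slot.users -= 1
      if (slot.users === 0) {
        // Last user is done: forget the database and release its resources.
        this.active.delete(address)
        const db = await slot.dbPromise.catch(() => null)
        if (db) await db.close()
      }
    }
  }
}

// Usage with a stub in place of a real database.
const log = []
const registry = new OnDemandDatabases(async (address) => ({
  address,
  close: async () => { log.push(`closed ${address}`) }
}))

const done = registry.withDatabase('profile-1', async (db) => {
  log.push(`exchanged heads for ${db.address}`)
  return registry.active.size // number of databases open during the exchange
})
done.then(() => console.log(log))
```

With this shape, the number of open databases tracks the number of in-flight requests rather than the number of stored profiles.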

OrbitDB's head exchange (a part of each sync) is completely unsigned. I could force anyone with an active DB to download giant movies instead, or CP, or just DOS them.

Hmm interesting. If you've found a real security hole, shouldn't be too hard to fix. The orbit-db team has been receptive to criticism and fixes. I've got a few open PRs and issues with them right now. To be clear though, because heads are only exchanged with directly connected peers, you would have to explicitly connect with a malicious peer in order to be vulnerable to this. This shouldn't be an issue with Origin's implementation where peers communicate with a trusted server rather than directly with one another. Also, is Origin messaging vulnerable to this? Would be curious to see a demonstration of an exploitation of this.

OrbitDB requires that each message be pinned, or it is eventually lost, and OrbitDB does not pin them that I can tell.

That's not really true. While orbit-db does not technically pin anything, everything is currently stored by default (when you use js-ipfs), so effectively everything is pinned. Orbit-db is also planning to implement "actual pinning": https://github.com/orbitdb/orbit-db/issues/342#issuecomment-379982614

OrbitDB's core assumptions seem to have been designed for transient chat, rather than long term storage.

Not sure about that.

OrbitDB syncs by arranging a single-DB-to-single-DB rendezvous on a separate channel that the two DBs then use to sync. I'm concerned this may not scale well, since the number of rendezvous is the number of people online with the DB, squared. So if there are 10,000 people online with the same DB open, then 100,000,000 IPFS pubsub rooms are in use for just that database.

Yes, orbit-db uses https://github.com/ipfs-shipyard/ipfs-pubsub-1on1 to create a direct communication channel using pubsub. Your comment about the number of databases applies if there are large numbers of peers connected to each other simultaneously, and simultaneously opening the same database. This gets back to the pubsub discussion, and the fact that neither pubsub nor orbit-db currently provide routing. However, any reasonable routing system would not have all peers directly connected to every other peer. There would likely be some subset of peers designated to "serve" the content to other peers. Thus there would not be 10,000 peers connected to 10,000 other peers. Regardless, this is pretty far away from Origin's use case at the time being, as Origin would likely just depend on a small cluster to serve the content for the foreseeable future.

Both OrbitDB and IPFS are not only not production ready, but both don't even have production capable designs in their current implementations yet. It's not just a matter of a few bugs. IPFS Pubsub needs protocol changes, an entire server rewrite, and a completely new way of doing things, OrbitDB needs protocol level changes at a minimum.

Seems like a fair assessment. We are still in the very early days for this tech, so I understand the hesitation to use it. That said, we're already using this for messaging, and I can think of another immature, scaling-challenged technology that Origin is using. 😉

And part 2, profile:

Profile updates are going to be one of the least often done storage activities. Maybe a couple times a year? And we can do profile updates as a block chain log event for a few cents each. Profiles seem a place we really don't need to build an entirely different second architecture for transaction cost savings.

The thing is that even if the cost is just a few cents, the user experience cost may be much more significant. As @joshfraser pointed out, there's a huge difference between something that costs a few cents and something that is free. Allowing users to create a profile without having any Eth could be a big step towards mass adoption.

Also, just from an engineering perspective: blockchain is not actually needed for a lot of features of a public profile. For unverified profile attributes, it's not needed at all. At a minimum, someone should just be able to create a profile and say, "hey this is me" without writing data to the blockchain. Even if we forget the user experience concerns, this is just wasteful use of blockchain. Like using a chainsaw to slice a pizza.

I know it seems like a lot to build a whole separate solution. Why not just use blockchain for everything? But the fact is that a decentralized identity system (that is free and decoupled from any blockchain) can eventually have lots of interesting applications outside of Origin. It sucks that this doesn't already exist in a mature form for Origin to pick up and use. But on the bright side, Origin has a chance to help pioneer this type of solution.

Also, I'm not necessarily suggesting that Origin build its own off-chain identity system. 3box already exists, and they have an active and great team working on it. @joshfraser has some concerns with their centralized address server, which is the main reason not to use 3box right now. But they are planning to decentralize the address server, so I would definitely recommend keeping 3box in mind as something to use eventually if not right away.

Even if OrbitDB worked perfectly, we'd still need to overlay a way to know what a given profile looked like at a given block time. This would add new ways to scam, and new ways for transactions to fail.

Shouldn't be hard to snapshot a profile when making any on-chain transaction, and including the snapshot in the IPFS blob associated with the transaction. Another approach would be the state channel approach, where each user signs a snapshot of the other user's profile at every step, along with a pointer to the previous signature. This creates a signed, agreed-upon audit trail of the mutually signed profile states through various transaction states, that could be examined and verified by an external auditor in case of a dispute. This audit trail could be built on top of our existing messaging solution.
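
Here is a rough sketch of the second approach, the signed audit trail. Everything in it is an assumption for illustration: the Ed25519 keys, the profile shape, and the `signSnapshot` and `verifyTrail` helpers. A real version would also need a canonical serialization of the profile before hashing, which plain `JSON.stringify` does not guarantee.

```javascript
const crypto = require('crypto')

const sha256 = (text) => crypto.createHash('sha256').update(text).digest('hex')

// Hypothetical key pairs for the two parties of a transaction.
const buyer = crypto.generateKeyPairSync('ed25519')
const seller = crypto.generateKeyPairSync('ed25519')

// Each link commits to the counterparty's profile hash and to the previous link's signature.
function signSnapshot (signer, profile, previousSig) {
  const body = JSON.stringify({ profileHash: sha256(JSON.stringify(profile)), previousSig })
  const sig = crypto.sign(null, Buffer.from(body), signer.privateKey).toString('hex')
  return { body, sig }
}

// An auditor checks every signature and that each link points at the one before it.
function verifyTrail (trail, publicKeys) {
  let previousSig = null
  return trail.every((link, i) => {
    const signed = crypto.verify(null, Buffer.from(link.body), publicKeys[i], Buffer.from(link.sig, 'hex'))
    const chained = JSON.parse(link.body).previousSig === previousSig
    previousSig = link.sig
    return signed && chained
  })
}

const step1 = signSnapshot(buyer, { name: 'Seller Sam' }, null)
const step2 = signSnapshot(seller, { name: 'Buyer Bea' }, step1.sig)
const trail = [step1, step2]
const keys = [buyer.publicKey, seller.publicKey]

console.log(verifyTrail(trail, keys))
```

A link that is re-signed against a different predecessor, or signed by the wrong party, fails verification, which is what makes the trail usable in a dispute.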

Even if IPFS PubSub worked perfectly, I hate the idea that there has to be an active process, serving up a specific profile, running somewhere in the world in order to be able to fetch that profile's data.

Again, there doesn't have to be an active process for every profile at all times. The profile just has to live in storage somewhere that is accessible. It only has to be loaded into memory upon request from a peer.

Oops, missed an important one I wanted to address:

OrbitDB's core building block, the ordered log, doesn't provide any ordering guarantees if a malicious person is writing.

Correct. This is by design. I think you're comparing orbit-db to blockchain here, which is like comparing apples and oranges. If you want guarantees about ordering, you have to have consensus - i.e. blockchain. This is because there's no way to verify time purely using cryptography (if only there was!).

ipfs-log (foundation of orbit-db) is not meant to be a log in the blockchain/ledger sense - only in a data structure sense. It's used as a CRDT implementation, to be able to gradually merge data together in order to eventually converge on a "truth". My understanding is that the log structure provides an elegant CRDT implementation.

So a log is only as trustworthy as the owners of the write-access keys. Yes, they can write history in whatever order they like. If you want any kind of ordering guarantee, then you need some consensus-based solution (no reason you can't integrate blockchain with orbit-db to get the best of both worlds).

IPFS PubSub currently has no spoofing protection; anyone can fake messages from anyone.

Sure - but PubSub is just used to transmit signed messages in orbit-db. It actually doesn't matter who the data is coming from, whether they have good intentions or they are malicious. If the data has a valid signature, it is accepted. If not, it is rejected.

This lack of spoofing protection is more about attacks at the IPFS PubSub layer. I agree that it's not relevant for attacks at the layer of the contents of OrbitDB messages.

The beauty of CRDTs is that they allow data to be received in any order, with duplicates, and they will still converge towards a single "truth". So the worst that a malicious peer can do is choose to withhold data. It can do no damage by changing the order of the data or sending invalid data.

Well, maybe a malicious IPFS peer could sign other servers up for subscriptions they didn't sign up for, or maybe unsubscribe them without them knowing, or send a DDOS through other servers that looks like it is coming from yet another server...

Your idea of making a master database program that automatically opens DB's is an improvement to Orbit DB's default.

That's not really true. While orbit-db does not technically pin anything, everything is currently stored by default (when you use js-ipfs), so effectively everything is pinned. Orbit-db is also planning to implement "actual pinning": orbitdb/orbit-db#342 (comment)

JS-IPFS storing everything by default is a security hole and an operational nightmare. I can easily get the server to run out of disk by making it request things, and when you want to clean up my mess you don't have an easy way to sort out the important Orbit DB messages from my spam, since they are all not pinned.

Seems like a fair assessment. We are still in the very early days for this tech, so I understand the hesitation to use it. That said, we're already using this for messaging, and I can think of another immature, scaling-challenged technology that Origin is using. 😉

I use such strong words to describe Orbit DB's immaturity just because it's easy to think that its immaturity is equivalent to Ethereum immaturity. Ethereum is a couple of orders of magnitude more mature and has been operating in an extremely hostile environment with many millions up for grabs if someone breaks it. Yes it has scaling limits, but at the current scales, it's a reliable system that can operate successfully in the middle of actively malicious behavior. Orbit DB sometimes has issues talking from one browser on your computer to another. (okay, that was a low blow :D) Also Ethereum operates just fine with malicious writers.

Correct. This is by design. So a log is only as trustworthy as the owners of the write-access keys. Yes, they can write history in whatever order they like.

And this really scares me. When combined with CRDT's, it effectively gives malicious users the ability to rewrite the past. This is not what you would expect from reading Orbit DB's documentation about eventually converging on a truth. At no time does any part of the past get stable.
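
To make the "rewrite the past" worry concrete, here is a small sketch. It assumes readers order entries by a logical clock that each writer sets itself (a simplified Lamport-style clock); the entry shape and the `merge` helper are made up for the example and are not ipfs-log's actual code.

```javascript
// Readers sort by the writer-supplied clock, breaking ties by writer id.
const byClock = (x, y) =>
  x.clock.time - y.clock.time || x.clock.id.localeCompare(y.clock.id)

// Union of known and incoming entries, deduplicated by hash, then sorted.
function merge (log, incoming) {
  const seen = new Set(log.map((entry) => entry.hash))
  return [...log, ...incoming.filter((entry) => !seen.has(entry.hash))].sort(byClock)
}

const honest = [
  { hash: 'h1', payload: 'offer made', clock: { id: 'alice', time: 1 } },
  { hash: 'h2', payload: 'offer accepted', clock: { id: 'bob', time: 2 } }
]

// A writer with valid keys can choose any clock value, including one in the "past".
const backdated = { hash: 'h3', payload: 'offer withdrawn', clock: { id: 'mallory', time: 1 } }

const before = honest.map((entry) => entry.payload)
const after = merge(honest, [backdated]).map((entry) => entry.payload)
console.log(before, after)
```

The backdated entry lands between the two honest ones, so a reader who had already seen "offer accepted" directly after "offer made" now sees a different history.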

OrbitDB's head exchange (a part of each sync) is completely unsigned. I could force anyone with an active DB to download giant movies instead, or CP, or just DOS them.

Hmm interesting. If you've found a real security hole, shouldn't be too hard to fix. The orbit-db team has been receptive to criticism and fixes. I've got a few open PRs and issues with them right now.

Yeah, I filed a PR [correction: Issue].

To be clear though, because heads are only exchanged with directly connected peers, you would have to explicitly connect with a malicious peer in order to be vulnerable to this. This shouldn't be an issue with Origin's implementation where peers communicate with a trusted server rather than directly with one another.

Am I misunderstanding Orbit DB's communications? An Orbit DB peer connects to an IPFS PubSub server. It announces itself to a room based on the name of the DB. Then Orbit DB establishes a "direct connection" to all already online peers of that database by creating a room per peerA/peerB combination. OrbitDB uses these rooms to chat. Thus, every online DB using the same IPFS PubSub server is "directly connected" in Orbit terms. Or am I missing something?

Also, is Origin messaging vulnerable to this? Would be curious to see a demonstration of an exploitation of this.

Any OrbitDB, anywhere, should be vulnerable. If I have some extra time next week, I can PoC it.

Well, maybe a malicious IPFS peer could sign other servers up for subscriptions they didn't sign up for, or maybe unsubscribe them without them knowing, or send a DDOS through other servers that looks like it is coming from yet another server...

I'm not convinced any of those attacks are possible just based on what you've said here. Can you elaborate more, or demonstrate this?

Your idea of making a master database program that automatically opens DB's is an improvement to Orbit DB's default.

I actually got the idea from 3box and generalized it a bit.

JS-IPFS storing everything by default is a security hole and an operational nightmare. I can easily get the server to run out of disk by making it request things, and when you want to clean up my mess you don't have an easy way to sort out the important Orbit DB messages from my spam, since they are all not pinned.

Yeah again they will be introducing real pinning at some point. But can you clarify how this is different from Origin's IPFS server which pins anything upon request? Is that not vulnerable to the same attacks? Or for that matter, any traditional database server which writes data upon request?

And this really scares me. When combined with CRDT's, it effectively gives malicious users the ability to rewrite the past. This is not what you would expect from reading Orbit DB's documentation about eventually converging on a truth. At no time does any part of the past get stable.

Maybe the orbit-db docs can be improved. But there's nothing scary about it, as long as you treat an orbit-db database as someone's personal wall they can post things to, rather than a source of truth or consensus. (Maybe my usage of the word "truth" earlier was confusing; the eventual value of a database is a convergent state, not any sort of verified truth.)

Yeah, I filed a PR.

You mean an issue? (I saw the issues you posted.) Don't see any PRs from you.

Am I misunderstanding Orbit DB's communications? An Orbit DB peer connects to an IPFS PubSub server. It announces itself to a room based on the name of the DB. Then Orbit DB establishes a "direct connection" to all already online peers of that database by creating a room per peerA/peerB combination. OrbitDB uses these rooms to chat. Thus, every online DB using the same IPFS PubSub server is "directly connected" in Orbit terms. Or am I missing something?

To the best of my understanding, no that is not how it works. The term "room" here is misleading. If I connect to server-A in room-1, and you do the same, we will both be connected to server-a but will not be connected to one another.

Any OrbitDB, anywhere, should be vulnerable. If I have some extra time next week, I can PoC it.

That would be really helpful.

Also, is Origin messaging vulnerable to this? Would be curious to see a demonstration of an exploitation of this.

Good news: So I tested this out. Orbit DB does auth head exchanges! Yay! This means that only the writer of a database can mess things up for readers.

Ugly news: This limits your attack targets to the Origin messaging server and anyone you message. On profiles it would limit it to anyone looking at your profile and the profile server(s). Oh, and the current messaging server depends on a world-writable database that is read by everybody, so we are toast currently.

Bad news: You get about a 1 to 20,000 bandwidth multiplication ratio when DOSing by adding next pointers to IPFS objects filled with random data. You upload the random data objects once, and then can use them for as many separate attacks as you like. Every address you add to your evil Orbit DB entry's next array makes the target download 1,000,000 bytes. And if they aren't running a local IPFS server, they download this every time they open the database. And if they are running a local IPFS server, you get to fill it with bogus data.

So because of our world writable OrbitDB database in our messaging, the forces of evil could add a single entry to that registry and every single open of that database, by every single user, for all time afterward, would start downloading approximately 20 gigabytes.
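
The numbers above can be reproduced with a few lines of arithmetic. The object size and the 20 gigabyte total come from the discussion; the 50 bytes per next pointer is an assumption (roughly the length of one IPFS hash string) chosen because it is consistent with the quoted 1 to 20,000 ratio.

```javascript
// Figures from the discussion above, except pointerBytes, which is an assumed size.
const bytesPerObject = 1000000 // each referenced IPFS object of random data
const targetDownload = 20e9 // roughly 20 gigabytes per database open
const pointerBytes = 50 // ASSUMPTION: approximate size of one hash in the next array

const pointersNeeded = targetDownload / bytesPerObject
const entryBytes = pointersNeeded * pointerBytes
const amplification = bytesPerObject / pointerBytes

console.log({ pointersNeeded, entryBytes, amplification })
```

Under these assumptions, a single entry of about one megabyte with 20,000 next pointers is enough to trigger the full download on every open.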

If you just care about DOSing bandwidth, and not disk space, and are targeting a browser using a non-local IPFS server, you don't even have to upload a pile of data: you can just reuse the same IPFS hashes in multiple entries. OrbitDB will download them, reject them for not being signed, forget it ever knew about them, and then download them again the next time it encounters them in an entry.

But can you clarify how this is different from Origin's IPFS server which pins anything upon request?

We shouldn't be pinning everything on request. What we should be doing is following the event listener and pinning valid things related to valid entries. This is tremendously more DOS resistant because people have to pay ETH money to create these listings/offers/etc that we pin. We should also control the sizes and number of things that we pin per listing/offer.
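
A pinning policy along those lines could look roughly like the sketch below. The limits, the event shape, and the `selectPins` helper are all hypothetical; this is not Origin's event listener code, just an illustration of "only pin for valid on-chain entries, with caps on size and count".

```javascript
// Hypothetical limits; real values would depend on what a listing legitimately needs.
const MAX_PINS_PER_LISTING = 10
const MAX_BYTES_PER_OBJECT = 2 * 1024 * 1024

// `event` is an assumed shape: an on-chain listing/offer event plus the IPFS objects it references.
function selectPins (event) {
  if (!event.validOnChain) return [] // nothing is pinned without a paid-for, valid entry
  return event.objects
    .filter((obj) => obj.size <= MAX_BYTES_PER_OBJECT) // skip oversized objects
    .slice(0, MAX_PINS_PER_LISTING) // cap how many objects one listing can pin
    .map((obj) => obj.hash)
}

const listing = {
  validOnChain: true,
  objects: [
    { hash: 'QmListingJson', size: 4096 },
    { hash: 'QmHugeBlob', size: 500 * 1024 * 1024 },
    { hash: 'QmPhoto', size: 300000 }
  ]
}

console.log(selectPins(listing))
console.log(selectPins({ ...listing, validOnChain: false }))
```

The point of the design is that every pin is tied to something that cost ETH to create, and each such entry can only consume a bounded amount of disk.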

Maybe the orbit-db docs can be improved. But there's nothing scary about it, as long as you treat an orbit-db database as someone's personal wall they can post things to.

I agree that a wall that a single person can post on is what Orbit DB is, in the current reality. That's not really what it's presented as though. From reading the docs you would think it was a log, not an arbitrary arrangement of data, and OrbitDB talks about, supports, and sometimes encourages multiple people writing to a shared log, and it doesn't mention that if you are a writer it's trivial to DOS a reader.

The term "room" here is misleading. If I connect to server-A in room-1, and you do the same, we will both be connected to server-a but will not be connected to one another.

I'll have to test it. Thanks.

In testing, it looks like Orbit DB does create roughly N² subscriptions for N IPFS peers on the same database.

You can test it by running multiple copies of the sample code below; just change the ports and run from a different directory each time you launch:

const IPFS = require('ipfs')
const OrbitDB = require('orbit-db')

// Change the ports below for each additional instance you launch
const ipfsOptions = {
  EXPERIMENTAL: {
    pubsub: true // pubsub must be enabled for the databases to find each other
  },
  config: {
    Addresses: {
      Swarm: [
        '/ip4/0.0.0.0/tcp/4062',
        '/ip4/127.0.0.1/tcp/4063/ws'
      ],
      API: '/ip4/127.0.0.1/tcp/5062',
      Gateway: '/ip4/127.0.0.1/tcp/9160'
    }
  },
  repo: '.' // keep the IPFS repo in the current directory
}

// Create IPFS instance
const ipfs = new IPFS(ipfsOptions)

ipfs.on('error', (e) => console.error(e))
ipfs.on('ready', async () => {
  const orbitdb = new OrbitDB(ipfs)

  // Create / Open a database
  const db = await orbitdb.log('DBNAME')
  await db.load()

  // Listen for updates from peers
  db.events.on('replicated', (address) => {
    console.log('replicated', address)
  })

  // Add an entry
  const hash = await db.add('world')
  console.log(hash)
})

// Show this peer's active subscriptions every two seconds
setInterval(async () => {
  console.log('-----')
  try {
    console.log(await ipfs.pubsub.ls())
  } catch (e) {
    // The node may not be ready yet on the first ticks
    console.error(e.message)
  }
}, 2000)

Closing this now. Merry Christmas Eve (almost)!

OriginProtocol / origin

Discussion: orbit-db concerns #1021