About the safeness of immudb

zhiqiangxu commented 4 years ago

It seems anyone can use the rawsafeset command to set any value for any key, so if the server is compromised, an attacker can change values easily.

This seems to contradict the claim that:

You can only add records, but never change or delete records.

Is there anything I missed?

vchaindz commented 4 years ago

Thanks for raising that concern! Even if you "update" a key with a new value, the former value is not overwritten; a new record is created instead. Of course, you can further secure the immudb communication (mTLS, auth), and every client can validate the immudb content.

Regarding rawsafeset: that method is only used to add a non-structured value. We never overwrite or delete the key or the value. The method simply exists to differentiate between structured and non-structured values in the immuclient command (it's not possible to skip the structured data generation without the rawsafeset method).

If someone compromised the immudb server and started to change existing data, the Merkle tree consistency would break, and every client could detect that. If an authorized malicious client added a new value to an existing key, the existing history of that key would not be changed or removed. Therefore our claim, "you can only add records, but never change or delete records", is true.
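To make the append-only claim concrete, here is a minimal in-memory sketch (hypothetical types, not immudb's code) in which an "update" of an existing key appends a new record: reads return the latest value, while the full history of the key stays available.

```go
package main

import "fmt"

// item mirrors an (INDEX, KEY, VALUE) triple. Hypothetical illustration only.
type item struct {
	Index uint64
	Key   string
	Value string
}

// appendOnlyStore never overwrites or deletes: every Set appends a new item.
type appendOnlyStore struct {
	log []item
}

// Set appends a new item and returns its index; earlier items are untouched.
func (s *appendOnlyStore) Set(key, value string) uint64 {
	idx := uint64(len(s.log))
	s.log = append(s.log, item{Index: idx, Key: key, Value: value})
	return idx
}

// Get returns the most recently appended value for key.
func (s *appendOnlyStore) Get(key string) (string, bool) {
	for i := len(s.log) - 1; i >= 0; i-- {
		if s.log[i].Key == key {
			return s.log[i].Value, true
		}
	}
	return "", false
}

// History returns every value ever set for key, in insertion order.
func (s *appendOnlyStore) History(key string) []string {
	var out []string
	for _, it := range s.log {
		if it.Key == key {
			out = append(out, it.Value)
		}
	}
	return out
}

func main() {
	var s appendOnlyStore
	s.Set("balance", "100")
	s.Set("balance", "250") // an "update" is just another append
	latest, _ := s.Get("balance")
	fmt.Println(latest, s.History("balance")) // 250 [100 250]
}
```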

zhiqiangxu commented 4 years ago

Thanks for your reply!

I checked the implementation of Store.Set and Store.SafeSet and didn't see such logic: immudb just uses an ordinary badger txn, calling txn.SetEntry and txn.CommitAt, which simply inserts or updates the key-value pair.

Which immudb command adds a structured value?

vchaindz commented 4 years ago

Absolutely, and these questions are very important! The structured data is built on the client side; the data is marshalled and stored with a traditional badger set (see the client code).

We decided to keep the immudb server endpoints as simple as possible: they are plain sets, not structured ones. The client SDK builds the structured data. At the moment the only relevant property is the Unix timestamp, but we will add more in the near future, such as cryptographic signatures.

```go
// Client-side wrapping of a raw value into a structured one (immudb client SDK).
skv := &schema.StructuredKeyValue{
	Key: key,
	Value: &schema.Content{
		Timestamp: uint64(c.ts.GetTime().Unix()),
		Payload:   value,
	},
}
```

Then we marshal the value: proto.Marshal(skv.Value)
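A dependency-free sketch of the same client-side wrapping. The real SDK serializes schema.Content with proto.Marshal; this hypothetical content type instead uses an assumed fixed layout (an 8-byte big-endian timestamp followed by the payload) purely for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// content is a stand-in for schema.Content (Timestamp + Payload).
// The byte layout below is hypothetical; the real SDK uses proto.Marshal.
type content struct {
	Timestamp uint64
	Payload   []byte
}

// encode wraps the payload: 8-byte big-endian timestamp, then payload bytes.
func (c content) encode() []byte {
	out := make([]byte, 8+len(c.Payload))
	binary.BigEndian.PutUint64(out[:8], c.Timestamp)
	copy(out[8:], c.Payload)
	return out
}

// decode reverses encode, rejecting values too short to hold the header.
func decode(b []byte) (content, error) {
	if len(b) < 8 {
		return content{}, errors.New("value too short for timestamp header")
	}
	payload := make([]byte, len(b)-8)
	copy(payload, b[8:])
	return content{Timestamp: binary.BigEndian.Uint64(b[:8]), Payload: payload}, nil
}

func main() {
	c := content{Timestamp: uint64(time.Now().Unix()), Payload: []byte("hello")}
	raw := c.encode() // the opaque bytes a plain server-side set would store
	back, _ := decode(raw)
	fmt.Println(string(back.Payload)) // hello
}
```

The server never needs to understand this structure, which matches the design choice of keeping the server endpoints plain.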

The new value is sent to immudb, which calculates the hash and updates the Merkle tree. The client exposes a set of simplified methods.

Safe methods also include the tamperproof consistency checks: the SDK stores the Merkle tree root on the client side and uses it for the built-in inclusion and consistency checks. That way developers can use immudb simply like a key-value store (similar to Redis).

zhiqiangxu commented 4 years ago

Thanks, now I see how the client side builds the structured data!

But if someone compromises the immudb server, they could build the structured data the same way and call SafeSet to change the existing data (which would also update the Merkle tree).

I don't see how this claim is guaranteed:

Even if you would "update" a key with a new value, the former value is not getting updated and a new record will be created.

vchaindz commented 4 years ago

You would need to compromise the immudb server and all clients, as the clients store the Merkle tree roots as well. Therefore, changing the server will alert every single client. We're currently working on a diagram to explain that better; I'll share it with you within the next few days.

Thanks!

zhiqiangxu commented 4 years ago

That's great, thanks!

vchaindz commented 4 years ago

As promised, here is the diagram: How it works

Thanks!

zhiqiangxu commented 4 years ago

Cool, nice diagram!

I guess the diagram shows that immudb relies on the following:

- if two clients insert the same key, the consistency proof will fail
- if two clients insert different keys, the consistency proof will pass
- if the same client updates the same key, the consistency proof will pass

but the reason isn't clear to me.

vchaindz commented 4 years ago

Our design doesn't allow overwriting any value's history at all, because every value update is a new entry in the Merkle tree. We don't rely solely on trusting the server component (data storage); the client (i.e. your application, when writing data) performs the consistency check over the whole Merkle tree.

Every client checks the whole Merkle tree against its local root when writing to immudb. Therefore, if any tampering happened on the server after the client wrote its data, the consistency would break, and all safeGet, safeSet, and verification methods would raise an alert.

That also means the client needs to regularly store the latest Merkle tree root to capture all write activity by other clients (and the root changes that activity triggers).
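A simplified sketch of why a cached root exposes tampering. It assumes RFC 6962-style hashing (0x00 leaf prefix, 0x01 node prefix), which may differ from immudb's exact scheme, and all helper names are hypothetical. Real consistency proofs need only O(log n) hashes; for clarity this check just recomputes the root over the first cachedSize leaves the server now reports.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and nodeHash use RFC 6962-style domain separation (an assumption).
func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

func nodeHash(l, r []byte) []byte {
	h := sha256.Sum256(append(append([]byte{0x01}, l...), r...))
	return h[:]
}

// root computes the Merkle tree hash, splitting at the largest power of two
// smaller than the number of leaves.
func root(leaves [][]byte) []byte {
	n := len(leaves)
	if n == 0 {
		e := sha256.Sum256(nil)
		return e[:]
	}
	if n == 1 {
		return leaves[0]
	}
	k := 1
	for k*2 < n {
		k *= 2
	}
	return nodeHash(root(leaves[:k]), root(leaves[k:]))
}

// checkConsistency recomputes the root of the prefix the client already
// saw and compares it with the root the client cached at that size.
func checkConsistency(cachedSize int, cachedRoot []byte, serverLeaves [][]byte) bool {
	if cachedSize > len(serverLeaves) {
		return false // the history shrank
	}
	return bytes.Equal(root(serverLeaves[:cachedSize]), cachedRoot)
}

func main() {
	var leaves [][]byte
	for _, v := range []string{"a=1", "b=2", "c=3"} {
		leaves = append(leaves, leafHash([]byte(v)))
	}
	cachedSize, cachedRoot := len(leaves), root(leaves) // stored by the client

	// Honest server: a new write is appended; old leaves are untouched.
	leaves = append(leaves, leafHash([]byte("a=9")))
	fmt.Println(checkConsistency(cachedSize, cachedRoot, leaves)) // true

	// Tampered server: an existing leaf is rewritten in place.
	leaves[0] = leafHash([]byte("a=666"))
	fmt.Println(checkConsistency(cachedSize, cachedRoot, leaves)) // false
}
```

Such a check only protects the prefix the client had already seen, which is why storing a fresh root regularly matters.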

Some details: in immudb, every set method produces:

- a new entry in BadgerDB. Each key is suffixed with the commit timestamp to provide multiversion concurrency control, so BadgerDB won't overwrite any data.
- a new leaf in the Merkle tree. The leaf is calculated as the digest (https://github.com/codenotary/immudb/blob/master/pkg/api/digest.go#L27) of the key-value pair inserted in BadgerDB.
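A sketch of a leaf digest over (index, key, value). The byte layout (0x00 prefix, big-endian index, big-endian key length, key, value) is an assumption for illustration; pkg/api/digest.go holds the layout immudb actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// leafDigest hashes a hypothetical encoding of (index, key, value).
// The key length prevents ambiguity between e.g. ("ab","c") and ("a","bc"),
// and the index binds the leaf to its position in the tree.
func leafDigest(index uint64, key, value []byte) [sha256.Size]byte {
	buf := make([]byte, 1+8+8+len(key)+len(value))
	buf[0] = 0x00 // leaf prefix
	binary.BigEndian.PutUint64(buf[1:9], index)
	binary.BigEndian.PutUint64(buf[9:17], uint64(len(key)))
	copy(buf[17:], key)
	copy(buf[17+len(key):], value)
	return sha256.Sum256(buf)
}

func main() {
	a := leafDigest(0, []byte("ab"), []byte("c"))
	b := leafDigest(0, []byte("a"), []byte("bc"))
	fmt.Println(a == b) // false: the key length disambiguates
	c := leafDigest(1, []byte("ab"), []byte("c"))
	fmt.Println(a == c) // false: the index is part of the leaf
}
```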

The insertion protocol: after saving a new element, the server returns to the client a proof that is used to verify the correct inclusion of the new element. Basically, the server sends the client all the nodes it needs to recompute the root locally: the client hashes its local value to get the starting (leaf) node hash and then combines it with the nodes provided by the server.
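The proof exchange can be sketched as follows, assuming RFC 6962-style hashing: auditPath plays the server, returning sibling hashes from the leaf level up, and verifyInclusion plays the client, following the inclusion verification algorithm of RFC 9162, section 2.1.3.2. All names are hypothetical, not immudb's API.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

func nodeHash(l, r []byte) []byte {
	h := sha256.Sum256(append(append([]byte{0x01}, l...), r...))
	return h[:]
}

// split returns the largest power of two strictly smaller than n (n >= 2).
func split(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// root computes the tree hash of a non-empty list of leaf hashes.
func root(leaves [][]byte) []byte {
	if len(leaves) == 1 {
		return leaves[0]
	}
	k := split(len(leaves))
	return nodeHash(root(leaves[:k]), root(leaves[k:]))
}

// auditPath is the server side: sibling hashes for leaf m, leaf level first.
func auditPath(m int, leaves [][]byte) [][]byte {
	if len(leaves) <= 1 {
		return nil
	}
	k := split(len(leaves))
	if m < k {
		return append(auditPath(m, leaves[:k]), root(leaves[k:]))
	}
	return append(auditPath(m-k, leaves[k:]), root(leaves[:k]))
}

// verifyInclusion is the client side: start from the locally computed leaf
// hash and fold in the server-provided nodes until the root is reached.
func verifyInclusion(index, size int, leaf []byte, path [][]byte, want []byte) bool {
	if index >= size {
		return false
	}
	fn, sn, r := index, size-1, leaf
	for _, p := range path {
		if sn == 0 {
			return false
		}
		if fn%2 == 1 || fn == sn {
			r = nodeHash(p, r) // sibling is on the left
			for fn%2 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p) // sibling is on the right
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, want)
}

func main() {
	var leaves [][]byte
	for _, v := range []string{"a", "b", "c", "d", "e"} {
		leaves = append(leaves, leafHash([]byte(v)))
	}
	rt := root(leaves)
	proof := auditPath(3, leaves)
	fmt.Println(verifyInclusion(3, len(leaves), leafHash([]byte("d")), proof, rt)) // true
	fmt.Println(verifyInclusion(3, len(leaves), leafHash([]byte("x")), proof, rt)) // false
}
```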

Then the client/auditor verifies the historical consistency in a similar way, but only back to the previous root it has stored (which is why it's important that every client regularly refreshes its Merkle tree root). Currently the client has to do both the client and the auditor job, but we're close to implementing auditors as well (they consume all client roots and are therefore always up to date). For now, many consistency checks need to be done by every client, but you can expect many usability improvements soon.

Code snippets: if another client inserts a key that already exists, new leaves are created.

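```go
func (t *treeStore) worker() {
	// Priority queue that reorders incoming entries by index.
	pq := make(treeStorePQ, 0, t.cSize)
	for item := range t.c {
		heap.Push(&pq, item)
		t.Lock()
		// Only drain while the smallest queued index is the next expected
		// one (t.w+1), so leaves enter the tree in strict index order.
		for min := pq.Min(); min == t.w+1; min = pq.Min() {
			// ...
			item := heap.Pop(&pq).(*treeStoreEntry)
			// Append the entry's hash as a new leaf; existing leaves stay as they are.
			merkletree.AppendHash(t, item.h)
			// ...
		}
		t.Unlock()
	}
	// ...
}
```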

This is the append method: the left side of the tree is considered frozen, and only the right-side nodes are recalculated, from the new leaf up to the Merkle tree root.

When a client retrieves a key, we return the most recently inserted value, thanks to the lexicographical ordering of the LSM tree, together with a proof, as I mentioned previously. See pkg/api/digest.go:27 and tree.go:64.

zhiqiangxu commented 4 years ago

So when a server is compromised, the attacker just has to issue an insert operation for a key that already exists (using the client protocol). That entry becomes the last inserted one, so from the client's point of view it essentially mutates the key's value, even though underneath only a key with a newer timestamp is inserted. Is that right?

leogr commented 4 years ago

The API does not allow mutating the value of any item. An item is internally composed of (INDEX,KEY,VALUE). INDEX is a monotonically increasing integer that changes on each writing operation. INDEX is also the leaf index in the Merkle tree and the badger timestamp is set to INDEX+1. Basically it's append-only.

That means that if a client (or an agent) has cached the Merkle tree hash (i.e. the root) representing the whole state at some point in time (for example, after an insertion) and the server then gets compromised, the server will no longer be able to produce a valid proof for items appended afterward.

Thus, clients and agents will detect the tampering from that point in time and will not consider valid any item inserted after the tampering happened. So even if somebody issues a new insert operation, the last inserted value will not be trusted by clients/agents.

zhiqiangxu commented 4 years ago

Suppose a client has cached the latest Merkle tree hash at leaf INDEX, ROOT(INDEX). Then the server gets compromised, and the attacker inserts the item (INDEX+1, old_key, new_value). This updates the value of old_key to new_value, and the latest root becomes ROOT(INDEX+1), which is simply the next state of ROOT(INDEX) after inserting (INDEX+1, old_key, new_value). So the server will have no problem generating both the inclusion proof for leaf INDEX and the consistency proof for ROOT(INDEX).

leogr commented 4 years ago

What exactly do you mean by "the server got compromised"? If you mean that somebody unauthorized gets access to the server just to issue an insert (without modifying the pre-existing data by directly accessing the underlying storage), then you are correct, but:

- this kind of attack will neither mutate the data nor change the history (that's the claim)
- clients are still able to inspect and retrieve the previous values for that key (which can be verified too, of course) with the provided History and ByIndex APIs, respectively, and decide what to do (this mostly depends on the application logic; for example, if one always wants to use the first value, later insertions for the same key are irrelevant)
- once PKI support is added to immudb (it's planned, AFAIK), clients should sign their own data and attach the signature alongside the value, so clients and agents will also be able to verify that the data was inserted by trusted users
- generally speaking, although pre-existing data is never mutated, implementers have to be aware that:
  - multiple values (retrievable in insertion order) can be present for the same key, and they must decide what to do with them depending on the specific use case
  - immutability by itself is not meant to solve the problem of authenticating data provenance (which can be solved with PKI or other means)

zhiqiangxu commented 4 years ago

Thanks, this clarifies everything!

codenotary/immudb, issue #176: About the safeness of immudb