WebOfTrust / keripy

Key Event Receipt Infrastructure - the spec and implementation of the KERI protocol
https://keripy.readthedocs.io/en/latest/
Apache License 2.0

fix: update smids and rmids on rotate #857

Closed. kentbull closed this 1 month ago

kentbull commented 2 months ago

This updates the smids and rmids properties of the HabitatRecord in the database when a group rotation happens.

Fixes https://github.com/WebOfTrust/keripy/issues/856

kentbull commented 2 months ago

It seems that the name check in BaseHab.save prevents any updates. The name is already present in the self.db.names database, so self.db.names.get(keys=(ns, self.name)) always returns a value and the following ValueError is raised:

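class BaseHab:
    ...
    def save(self, habord):
        # Persist the HabitatRecord keyed by this AID prefix
        self.db.habs.pin(keys=self.pre,
                         val=habord)
        ns = "" if self.ns is None else self.ns
        # The (ns, name) entry was already written when the AID was created,
        # so this check fails on every later save, e.g. after a group rotation
        if self.db.names.get(keys=(ns, self.name)) is not None:
            raise ValueError("AID already exists with that name")

        # Register the human-readable name for this AID prefix
        self.db.names.pin(keys=(ns, self.name),
                          val=self.pre)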

The solution seems to be a BaseHab.update method that updates an existing HabitatRecord only if the record already exists. I will submit that.
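A minimal sketch of what such an update method could look like, assuming a dict-backed stand-in for the keripy database. HabSketch, Suber, and the reduced HabitatRecord below are hypothetical simplifications, not keripy's actual classes. Unlike save, the sketch skips the name check entirely and refuses to create a record that does not already exist.

```python
from dataclasses import dataclass, field


@dataclass
class HabitatRecord:
    # Hypothetical stand-in: only the fields this sketch needs
    hid: str
    smids: list = field(default_factory=list)
    rmids: list = field(default_factory=list)


class Suber:
    """Hypothetical in-memory stand-in for a keyed sub-database."""

    def __init__(self):
        self._items = {}

    def get(self, keys):
        return self._items.get(keys)

    def pin(self, keys, val):
        # pin overwrites unconditionally, like the pin calls in save()
        self._items[keys] = val
        return True


class HabSketch:
    """Hypothetical habitat holding only what update() needs."""

    def __init__(self, pre, habs):
        self.pre = pre
        self.habs = habs

    def update(self, habord):
        """Overwrite an existing HabitatRecord; never create a new one."""
        if self.habs.get(keys=(self.pre,)) is None:
            raise ValueError(f"No HabitatRecord to update for pre={self.pre}")
        self.habs.pin(keys=(self.pre,), val=habord)
```

Because the name was registered when the AID was created, update has no reason to touch the names index, which is what keeps it from tripping over the check in save.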

SmithSamuelM commented 2 months ago

I am confused by what is happening here:

habbing.GroupHab.rotate clearly updates the rmids and smids when it is called:


        if serder is None:
            return super(GroupHab, self).rotate(**kwargs)

        if (habord := self.db.habs.get(keys=(self.pre,))) is None:
            raise kering.ValidationError(f"Missing HabitatRecord for pre={self.pre}")

        # sign handles group hab with .mhab case
        sigers = self.sign(ser=serder.raw, verfers=serder.verfers, rotated=True)

        # update own key event verifier state
        msg = eventing.messagize(serder, sigers=sigers)

        try:
            self.kvy.processEvent(serder=serder, sigers=sigers)
        except MissingSignatureError:
            pass
        except Exception as ex:
            # f-prefix needed so the prefix value is interpolated into the message
            raise kering.ValidationError("Improper Habitat rotation for "
                                         f"pre={self.pre}.") from ex

        self.smids = smids
        self.rmids = rmids
        habord.smids = smids
        habord.rmids = rmids
        self.db.habs.pin(keys=(self.pre,), val=habord)

        return msg

This is called by JoinDoer.rotate, which is called by joinDo when the rotation completes successfully. So unless there is a bug in the logic of JoinDoer.joinDo or in what leads up to it, no fix should be required. I am concerned that this change brute-forces a rotate without completing the verification dependencies that must be satisfied for it to succeed.

Since I did not write this code, I am not familiar enough with it to have a definitive opinion, but I am concerned. joinDo waits for all the group multisig events to occur, which seems to be the correct approach. Otherwise one would be brute-forcing a rotation that was not actually approved by the group, and would therefore need a way to roll back.

This is antithetical to the architecture of KERI in general. KERI is safe because it doesn't accept or change state until and unless the dependencies are met. It doesn't provisionally change state and then roll back after the fact. I understand that the latter is a common architecture for database-based applications, but such applications are, in general, insecure because of it. The former is the proper way to implement secure state changes. This means the correct approach is a provisional state in which one waits for dependencies to resolve, possibly retrying interactions to refresh them when they time out or are unavailable. But the rmids/smids record is the state record, not the provisional state record.
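The wait-then-commit pattern described above can be sketched as a small escrow. Everything here is hypothetical: RotationEscrow is not a keripy class, and the threshold is a plain count of distinct signing members, a simplification of KERI's signing thresholds. The point it illustrates is that the committed smids/rmids are written only once the dependency is met, and a timeout triggers a retry rather than a forced commit or a rollback.

```python
import time


class RotationEscrow:
    """Hypothetical escrow holding a pending group rotation until enough members sign."""

    def __init__(self, committed, threshold, timeout=10.0, clock=time.monotonic):
        self.committed = committed  # primary state: {"smids": [...], "rmids": [...]}
        self.threshold = threshold  # simplified: number of distinct member signatures
        self.timeout = timeout
        self.clock = clock
        self.pending = None

    def propose(self, smids, rmids):
        # Record the proposed membership; primary state is not touched
        self.pending = {"smids": list(smids), "rmids": list(rmids),
                        "sigs": set(), "started": self.clock()}

    def receive_signature(self, member):
        # Only signatures from proposed signing members count toward the threshold
        if self.pending is not None and member in self.pending["smids"]:
            self.pending["sigs"].add(member)

    def process(self):
        """Return "idle", "waiting", "retry", or "committed"."""
        if self.pending is None:
            return "idle"
        if len(self.pending["sigs"]) >= self.threshold:
            # Dependencies met: only now update the primary state record
            self.committed["smids"] = self.pending["smids"]
            self.committed["rmids"] = self.pending["rmids"]
            self.pending = None
            return "committed"
        if self.clock() - self.pending["started"] > self.timeout:
            # Timed out: restart the wait (a real app would re-request signatures)
            self.pending["started"] = self.clock()
            return "retry"
        return "waiting"
```

Until process returns "committed", the committed record still reflects the last accepted rotation, so there is never anything to roll back.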

If the application logic requires provisional state then that is a different database that does not exist yet.

SmithSamuelM commented 2 months ago

A provisional source of truth is not the same as the primary source of truth. Escrows are an example of a provisional source of truth. But escrows are ephemeral, because they are meant to buffer the asynchronous nature of faulty, unreliable network transports. A non-ephemeral but provisional source of truth usually resides in its own sandbox in the application. This allows a transaction to persist while it attempts to enact a change in the primary source of truth, but that change requires the cooperation of other actors in the distributed application space. Thus all the dependencies must be met before the primary source of truth is updated to reflect the desired provisional source of truth.
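A non-ephemeral provisional sandbox of the kind described above might look like the following sketch, which uses Python's sqlite3 module purely for illustration. The table names, the functions, and the event_accepted flag are all hypothetical; keripy uses LMDB and has no such provisional store. Desired smids/rmids persist in their own table and are copied into the primary table only after the rotation event has been accepted.

```python
import json
import sqlite3


def open_stores(path=":memory:"):
    """Open a database with separate primary and provisional tables (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS primary_state "
                 "(pre TEXT PRIMARY KEY, smids TEXT, rmids TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS provisional_state "
                 "(pre TEXT PRIMARY KEY, smids TEXT, rmids TEXT)")
    return conn


def record_desired(conn, pre, smids, rmids):
    """Persist the desired membership; the primary record is left untouched."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT OR REPLACE INTO provisional_state VALUES (?, ?, ?)",
                     (pre, json.dumps(smids), json.dumps(rmids)))


def promote(conn, pre, event_accepted):
    """Move desired state into primary state only once the rotation event is accepted."""
    if not event_accepted:
        return False
    row = conn.execute("SELECT smids, rmids FROM provisional_state WHERE pre = ?",
                       (pre,)).fetchone()
    if row is None:
        return False
    with conn:  # both statements commit together or not at all
        conn.execute("INSERT OR REPLACE INTO primary_state VALUES (?, ?, ?)", (pre, *row))
        conn.execute("DELETE FROM provisional_state WHERE pre = ?", (pre,))
    return True


def primary(conn, pre):
    """Return (smids, rmids) from the primary store, or None if absent."""
    row = conn.execute("SELECT smids, rmids FROM primary_state WHERE pre = ?",
                       (pre,)).fetchone()
    return None if row is None else (json.loads(row[0]), json.loads(row[1]))
```

Passing a file path instead of ":memory:" makes the desired state survive restarts, which is what distinguishes this sandbox from an ephemeral escrow.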

Unfortunately, many application builders do not like this extra complication, but then they must implement rollbacks. The problem is that in distributed applications it's virtually impossible to roll back state that has propagated downstream. Event sourcing is one approach to a universal rollback mechanism, where you delete the database and recreate it by replaying all the events, albeit with the bad events elided. While the KERI design has elements that feel like event sourcing, these do not exist at the application layer, and are not meant to protect against applications that brute-force changes to key state, or even to application state such as the multisig group AID state (rmids and smids) in the habord HabitatRecord.

Indeed, in KERI a memoryless replay of key events is called a deletion or partial deletion attack (dead attack). If I have no memory of first-seen events, then any compromise of stale key state can be used to create an alternate history that puts the identifiers under the control of the compromiser. So a true rollback is memoryless, in that you can roll back to genesis (the inception event) and then replay an alternate, revised set of events. Hence we never want memoryless rollback, which means we don't want rollback at all, because by definition rollback must be memoryless or it is not rolling back. Memoried rollback is not a thing. Idempotent replay is a thing, and that is what KERI supports. Idempotent replay allows distributed verification without trusted third parties. It allows for high availability and provides a way to detect partial deletion attacks on some of your infrastructure. The important property is that at least one node must retain memory so you can detect that the other nodes were compromised. If all nodes lose memory, then you can have a true rollback and refresh to a different state with different events. This is a bad idea for security, as you have just built your source of truth on a foundation of sand.
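The difference between idempotent replay with memory and memoryless replay can be shown with a toy first-seen log. This is a hypothetical simplification, not keripy's validator: events are (sequence number, digest) pairs with placeholder digest strings, out-of-order events are merely flagged instead of escrowed, and KERI's recovery rules for superseding events are ignored.

```python
class FirstSeenLog:
    """Hypothetical validator that remembers the first event seen at each sequence number."""

    def __init__(self):
        self.first_seen = {}  # sn -> digest of the first-seen event

    def process(self, sn, digest):
        """Return "accepted", "duplicate", "duplicitous", or "out-of-order"."""
        seen = self.first_seen.get(sn)
        if seen is None:
            if sn != len(self.first_seen):
                return "out-of-order"  # a real validator would escrow this event
            self.first_seen[sn] = digest
            return "accepted"
        if seen == digest:
            return "duplicate"  # idempotent replay: state is unchanged
        return "duplicitous"  # alternate history detected thanks to memory


honest = [(0, "E0"), (1, "E1"), (2, "E2")]
alternate = [(0, "E0"), (1, "X1"), (2, "X2")]  # forged from compromised stale keys

node = FirstSeenLog()
print([node.process(sn, d) for sn, d in honest])  # first pass
print([node.process(sn, d) for sn, d in honest])  # replay is idempotent
print(node.process(1, "X1"))  # memory exposes the alternate history

fresh = FirstSeenLog()  # memoryless node: no first-seen record
print([fresh.process(sn, d) for sn, d in alternate])  # forged history accepted
```

The node with memory treats the replay as a no-op and flags the forged event, while the memoryless node accepts the alternate history without complaint, which is the dead-attack scenario described above.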

Instead of rollback, KERI allows limited recovery, in which nothing is forgotten but events can be disputed or superseded, and the differences are reconcilable by any validator. But recovery is relatively limited, because reconciliation would otherwise become unmanageable. So KERI goes to great lengths to get things right the first time instead of brute-forcing and then rolling back to fix mistakes.

So when things don't "work", it's usually because a required dependency was not met. The solution is not to ignore that dependency but to figure out why it was not met and how to make it more reliably satisfied.

kentbull commented 1 month ago

Can this be merged now, given our offline chat in which we discussed that the smids and rmids are desired state, not final state?