WebOfTrust / keripy

Key Event Receipt Infrastructure - the spec and implementation of the KERI protocol
https://keripy.readthedocs.io/en/latest/
Apache License 2.0
56 stars 53 forks source link

Migration bug on qnfs database key lookup - ERR: invalid literal for int() with base 16: b'EA59RST48MLqNoh_jNxHysEXFL4PeQlj2mII0TVEdz64' #863

Open kentbull opened 3 days ago

kentbull commented 3 days ago

Version

1.2.0rc2

Environment

Mac OS, Python 3.12.3, local deployment

Expected behavior

Starting up kli witness demo after performing kli migrate run for each witness should bring up the demo witness set.

Actual behavior

The following error occurs:

$ kli witness demo
Witness wan : BBilc4-L3tFUnfM_wJr4S4OJanAv_VmF_dJNN6vkf2Ha
ERR: invalid literal for int() with base 16: b'EA59RST48MLqNoh_jNxHysEXFL4PeQlj2mII0TVEdz64'

The bug is that the splitOnKey in LMDBer.getTopIoSetItemIter is using splitOnKey which expects the second part of the tuple to be a base16 serial number yet the key that is put in there is a dgKey, or a (pre, said) tuple.

class LMDBer(filing.Filer):
    ...
    def getTopIoSetItemIter(self, db, top=b'', *, sep=b'.'):
        ...
        for iokey, val in self.getTopItemIter(db=db, top=top):
            key, ion = splitOnKey(iokey, sep=sep)
            yield (key, val)

def splitOnKey(key, *, sep=b'.'):
    ...
    if isinstance(key, memoryview):
        key = bytes(key)
    top, on = splitKey(key, sep=sep)
    on = int(on, 16)
    return (top, on)

class Kevery:
    ...
        def processQueryNotFound(self):
        ...

        key = ekey = b''  # both start same. when not same means escrows found
        pre = b''
        sn = 0
        while True:  # break when done
            for (pre, said), edig in self.db.qnfs.getItemIter(keys=key):

Steps to reproduce

  1. Check out keripy repo and change to the keripy directory
  2. git switch v1.1.18
  3. python3 -m pip install -e ./
  4. kli witness demo
  5. `cd scripts/demo && source demo-scripts.sh
  6. Modify the ./scripts/demo/basic/delegate.sh to exit prior to the call on line 12 to kli delegate confirm --name delegator --alias delegator -Y & which will cause a query not found message to remain in that escrow, the hab.db.qnfs database.
  7. Run the script in `./scripts/demo/basic/delegate.sh
  8. git switch v1.2.0rc2
  9. python3 -m pip install -e ./
  10. Upgrade all of the witnesses:
    • One liner to migrate all six demo witnesses:
    • list=(wan wes wil wit wub wyz); printf '%s\n' "${list[@]}" | xargs -I {} kli migrate run --name {}
  11. kli witness demo then you will get a value error like:
ValueError: invalid literal for int() with base 16: b'EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt'

This will stop your witnesses from starting.

kentbull commented 3 days ago

The modified delegate.sh script (exit before delegation confirmation) was key to reproducing this bug. Yet it may not be that important of a bug. It only occurs if you happen to have a message in the hab.db.qnfs database prior to upgrading from 1.1.18 to 1.2.0rc2. This occurs because in 1.1.18 the blank serial number (all zeroes) is not appended to the key used to look up an entry from the .qnfs database - the key looks something like this (note the periods) which fails the type conversion in splitOnKey:

EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt.EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt

Whereas in v1.2.0rc2 there is a set of zeros appended on the end that allows the string to pass the type conversion in splitOnKey:

EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt.EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt.00000000

While this does prevent a witness from starting and thereby could be a major problem for someone there is also the possibility of updating the migration script to add the number suffix on any entry in the .qnfs database.

kentbull commented 2 days ago

From our spec call today we discussed that it makes the most sense to not include escrows during an upgrade.

(Phil) Specifically to clear out escrow databases at the beginning of a migration

(Kevin) It needs to be updated, the current escrow "clearing" is insufficent

            hby.db.gpwe.trim()
            hby.db.gdee.trim()
            hby.db.dpwe.trim()
            hby.db.gpse.trim()
            hby.db.epse.trim()
            hby.db.dune.trim()