garethjns / PyBC

Bitcoin blockchain parser for Python 2 and 3. Includes handy examples.
BSD 3-Clause "New" or "Revised" License

CVarInt format is not de-serialised correctly #5

Open garethjns opened 5 years ago

garethjns commented 5 years ago

The Core wallet serialises larger (?) integers in the CVarInt format, which pybit.py3.common.Common does not read correctly. Whenever a CVarInt is parsed, it comes back as a huge number that breaks everything downstream of it.

https://bitcoin.stackexchange.com/questions/51620/cvarint-serialization-format

    def read_var(self,
                 pr: bool=False) -> bytes:
        """
        Read next variable length input. These are described in the specification:
        https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer
        Returns the raw little-endian bytes of the value; conversion to int
        is left to the caller.
        """
        # For debugging
        start = self.cursor

        # Get the next byte
        # CVarInt needs to be detected here?
        by = self.read_next(1)
        o = ord(by)

        if pr:
            print(by)

        if o < 253:
            # Single-byte value: return the byte as is
            # (by is still bytes here; o holds its int value)
            out = by

        elif o == 253:  # 0xfd
            # Read next 2 bytes
            # Reverse endianness
            # Convert to int in base 16
            out = self.read_next(2)

        elif o == 254:  # 0xfe
            # Read next 4 bytes, convert as above
            out = self.read_next(4)

        elif o == 255:  # 0xff
            # Read next 8 bytes, convert as above
            out = self.read_next(8)

        if pr:
            print(int(codecs.encode(out[::-1], "hex"), 16))

        return out
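
For comparison, here is a minimal sketch of decoding the format from the wiki link above (CompactSize) straight to an int, returning the number of bytes the cursor should advance as well. The name `read_compact_size` and the bytes-plus-offset interface are hypothetical and not part of PyBC. It does not handle CVarInt, which is a different encoding.

```python
import struct
from typing import Tuple


def read_compact_size(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one CompactSize integer; return (value, bytes consumed).

    Hypothetical helper for illustration, not PyBC's API.
    """
    first = data[offset]
    if first < 0xFD:
        # Values below 253 are stored in the prefix byte itself
        return first, 1

    # Otherwise the prefix selects the width of a little-endian payload
    fmt, width = {0xFD: ("<H", 2), 0xFE: ("<I", 4), 0xFF: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, 1 + width
```

Returning the consumed length alongside the value keeps the cursor arithmetic in one place, which is what the original docstring seemed to intend.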

papipig commented 2 years ago

Two things that may help:

  1. There are two different integer compression methods: CCompactSize and CVarInt. I think what you want is the first one. CVarInt is not used in the block .dat files.

See Pieter Wuille's answer here: https://bitcoin.stackexchange.com/questions/51620/cvarint-serialization-format, which says: "Still wrong. He's asking about the CVarInt used in UTXOs internally in Bitcoin Core. Not the variable-length push in scripts, or the CCompactLen used in the P2P protocol."

  2. I think the problem is that the .dat files may be corrupted.

See Pieter Wuille's answer here:

https://bitcoin.stackexchange.com/questions/86159/differences-to-ccompactsize-and-cvarint

"Blocks have well defined data only. But the files can contain garbage in addition to the blocks. There is an index database that stores which position each block is stored at."

I think the right solution would be to parse the LevelDB files in the index/ directory to get the exact location of each block in the .dat files.

You can get the offset within the file from this database: https://bitcoin.stackexchange.com/questions/28168/what-are-the-keys-used-in-the-blockchain-leveldb-ie-what-are-the-keyvalue-pair

(Be careful at the end of the thread, though; it is misleading.)
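
To make the idea concrete, here is a rough sketch of decoding a CVarInt and using it to pull the file number and offset out of one block index record. It assumes the layout I understand from Bitcoin Core's source: each record value starts with CVarInt fields (client version, height, status, tx count, then file number, data offset and undo offset depending on status flags), followed by the 80-byte block header. The function names, the flag constants and the field order are my assumptions and should be checked against Bitcoin Core before relying on them. Reading the records out of LevelDB is not shown; it needs a LevelDB binding.

```python
from typing import Dict, Tuple

# Status flags, assumed to match Bitcoin Core's BlockStatus values
BLOCK_HAVE_DATA = 8
BLOCK_HAVE_UNDO = 16


def read_cvarint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one CVarInt (base-128, most significant group first).

    Returns (value, position of the next unread byte).
    """
    n = 0
    while True:
        ch = data[pos]
        pos += 1
        n = (n << 7) | (ch & 0x7F)
        if ch & 0x80:
            # Continuation bit set: the encoding adds 1 per extra byte
            n += 1
        else:
            return n, pos


def parse_block_index_value(value: bytes) -> Dict[str, int]:
    """Parse the CVarInt fields at the start of a block index record.

    Hypothetical helper; the 80-byte header after these fields is ignored.
    """
    fields: Dict[str, int] = {}
    pos = 0
    for name in ("version", "height", "status", "n_tx"):
        fields[name], pos = read_cvarint(value, pos)

    status = fields["status"]
    if status & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO):
        fields["file"], pos = read_cvarint(value, pos)
    if status & BLOCK_HAVE_DATA:
        fields["data_pos"], pos = read_cvarint(value, pos)
    if status & BLOCK_HAVE_UNDO:
        fields["undo_pos"], pos = read_cvarint(value, pos)
    return fields
```

With `file` and `data_pos` in hand, the parser could seek directly to each block instead of walking the .dat file and tripping over whatever sits between blocks.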

garethjns commented 2 years ago

Hi @papipig, thank you very much for the info and links!