Open OttoAllmendinger opened 5 years ago
I'd propose a slight modification of the above/further detail the spec:
We should not store the txid and the blockHash
as string
s. Should use BLOB
in SQL and a byte slice in Go instead. The hashes are (next to public keys) the least compressible data used in Bitcoin. SQLite supports BLOB
as primary keys. PostgreSQL can handle byte value primary keys as well. This will probably reduce the database size quite a bit.
We should maybe pass well defined structs to functions such as addTransaction()
. See draft definitions in https://github.com/0xB10C/bademeister-go/issues/5#issuecomment-546121120.
I'd opt to directly use fee
rather than inputAmount
, but no strong preference besides fee
probably being a bit smaller than inputAmount
in terms of storage size required (since SQLite stores things as variable length binary data).
Using uint32
for satoshi amounts in Bitcoin does not work. Needs to be uint64
.
Timestamps should be in UTC to avoid confusion down the line.
[32]byte
(we'll probably introduce a type alias)inputAmount
with fee
, use uint64
I deliberately described the data fields instead of a struct here so we get an idea of what is actually stored. We can have a convenience AddTransaction()
method that accepts the wire
type. However we should be careful that we don't mix it up with the Transaction
type that will result from reading from our storage, which will be very minimal.
Implemented in #14
(txid, firstSeen)
in table transactions
Todo:
The firstSeen
timestamp is currently written into the database as a DATE
. DATE
s are handled by SQLite internally as TEXT
(as ISO8601 strings YYYY-MM-DD HH:MM:SS.SSS
). This has a way higher precision than we need and is generally quite storage space inefficient. Better would be to store the firstSeen
timestamp in Unix time as INTEGER
which takes a maximum of 8 bytes. Unix timestamps are supported by the SQLite time and date functions. Additionally INTEGERS
should be even better compressable than TEXT
.
See https://www.sqlite.org/datatype3.html for more info on datatypes.
INTEGER
and save as Unix timestamp.I did a quick test with TEXT
(v1) and INTEGER
(v2) timestamps. A database with 800k tx (~2 days) without removed txids is:
143M Oct 28 23:17 v1.db
125M Oct 28 23:17 v2.db (12.5% smaller)
With txids removed:
34M Oct 28 23:25 v1.db
17M Oct 28 23:24 v2.db (50% smaller)
Code used:
#!/usr/bin/env python
import sqlite3
import datetime
import time
from hashlib import sha256
connv1 = sqlite3.connect('v1.db')
connv2 = sqlite3.connect('v2.db')
cv1 = connv1.cursor()
cv2 = connv2.cursor()
cv1.execute('CREATE TABLE tx (txid blob unique, firstSeen date)')
cv2.execute('CREATE TABLE tx (txid blob unique, firstSeen int)')
for i in range(800000):
txid = sha256(str(i)).hexdigest()
firstSeen = datetime.datetime.now()
cv1.execute("INSERT INTO tx ('txid', 'firstSeen') VALUES (?, ?)", (txid, firstSeen))
cv2.execute("INSERT INTO tx ('txid', 'firstSeen') VALUES (?, ?)", (txid, int(time.time())))
connv1.commit()
connv2.commit()
connv1.close()
connv2.close()
I though a bit about the SQL table structure. This is all still in a WIP state and can be changed. I just wanted to write it down somewhere.
Currently the table where transactions are store is called mempool_tx
. The table name should be changed to transaction
. The table hold transactions included in blocks and in the mempool. Transactions in the mempool can be identified by not having a entry in the transaction_block
table, which is discussed below. The table is missing an id
as INTEGER PRIMARY KEY
. By default the column INTEGER PRIMARY KEY
is used as an auto-increment ROWID
by SQLite. The id
is used as foreign key in tables like transaction_block
or transaction_ancestor
. This saves a lot of disk space compared to writing the txid and block hash to the table.
Additionally some of the table colums are INT
and can be changed to INTEGER
for consistency. For SQLite INTEGER
and INT
are the same datatype.
-- Table: transaction
CREATE TABLE "transaction"
(
id INTEGER PRIMARY KEY UNIQUE NOT NULL,
txid BLOB (32),
first_seen INTEGER,
fee INTEGER,
weight INTEGER
);
The table block
doesn't exist yet. The current form is quite minimalist and can be expanded. Similar to the transaction
table it has an id
.
-- Table: block
CREATE TABLE block
(
id INTEGER PRIMARY KEY UNIQUE NOT NULL,
hash BLOB (32) UNIQUE,
first_seen INTEGER,
height INTEGER
);
The table transaction_block
does not exist yet. It's used to save the relationship between a transaction
and a block
it's included in. In our model a transaction purposefully can be included in multiple blocks to account for reorgs. We still need a way to identify the most-work-chain. Each entry consists of a transaction_id
and block_id
as foreign keys to the respective transaction or block. Additionally a block_index
is included which represents the index/position of the transaction in the block.
-- Table: transaction_block
CREATE TABLE transaction_block
(
transaction_id INTEGER REFERENCES "transaction" (id) NOT NULL,
block_id INTEGER REFERENCES "block" (id) NOT NULL,
block_index INTEGER NOT NULL
);
The table transaction_ancestor
helps tracking CPFP relationships and transaction packages. It represents a reference from a transaction A (current) to a transaction B (ancestor), where A spends an output from B. Only packages of (at the time of insertion) unconfirmed transactions should be inserted here. (https://github.com/0xB10C/bademeister-go/issues/23)
-- Table: transaction_ancestor
CREATE TABLE transaction_ancestor
(
current INTEGER REFERENCES "transaction" (id) NOT NULL,
ancestor INTEGER REFERENCES "transaction" (id) NOT NULL
);
How to store RBF'ed transactions is still open. A single RBF transaction can replace multiple other transactions. Should have another look once we are sure where we get the data from and what data we get. (https://github.com/0xB10C/bademeister-go/issues/24)
My suggestion for the high-level interface:
addTransaction(txid: [32]byte, firstSeen: Time, fee, outputAmount: uint64, vSize: uint32)
fee
is a bit expensive to get via RPC we collect it hereaddConfirmation(txid: [32]byte, blockHash: [32]byte, time: Time)