CounterpartyXCP / counterparty-core

Counterparty Protocol Reference Implementation
http://counterparty.io
MIT License
283 stars 206 forks source link

Excessive data duplication. #1341

Open blocklack opened 6 months ago

blocklack commented 6 months ago

There are no relationships between tables to avoid duplicate data.

Checking the tables that exist in the database I found one called addresses, actually I did not find any SQL instruction that creates this table, I only found references where the table is queried, it contains a few records, maybe this table has been created in some previous version of counterparty-lib.

My proposal is to complete the appropriate relationships to improve query times and avoid data duplication.

There are some tables that are properly related

  1. Use the existing tables, Address , Asset, Blocks and create relationships with the following columns to the appropiate table. column -> TableToBeRelated.
    • Balances
      • address -> Address
      • asset -> Asset
    • Credits
      • address -> Address
      • asset -> Asset
      • Debits
        • address -> Address
      • asset -> Asset
      • BetExpirations
        • source -> Address
      • Bets
        • source -> Address
        • feed_address -> Address
        • block_index -> Blocks
      • Broadcast
        • source -> address
        • block_index -> Blocks
      • BTCPays
        • source -> Address
      • Burns
        • source -> Address
        • block_index -> Blocks
      • Cancels
        • source -> Address
        • block_index -> Blocks
      • Credits
        • source -> Address
        • asset -> Asset
      • Destructions
        • source -> Address
        • asset -> Asset
        • block_index -> Blocks
      • DispensersRefills
        • source -> Address
        • destination -> Address
        • asset -> Asset
        • block_index -> Blocks
      • Dispensers
        • source -> Address
        • asset -> Asset
        • origin -> Address
        • oracle -> Address
        • block_index -> Blocks
      • Dispenses
        • source -> Address
        • destination -> Address
        • asset -> Asset
        • block_index -> Blocks
      • Dividens
        • source -> Address
        • asset -> Asset
        • block_index -> Blocks
      • Issuances
        • source -> Address
        • issuer -> Address
        • asset -> Asset
        • block_index -> Blocks
      • Order Expirations
        • source -> Address
      • Orders
        • source -> Address
        • give_asset -> Asset
        • get_asset -> Asset
        • block_index -> Blocks

Along with the appropriate table relationships you should aim to extract the SQL logic out of the all modules into one, as it is very difficult to debug where the code reads/writes to the database.

jotapea commented 6 months ago

@blocklack FYI: https://github.com/CounterpartyXCP/cips/discussions/109#discussioncomment-8087724

But I do agree that maybe some like address and tx_hash could use primary key references. But is also not a big deal for such a small DB overall as @adamkrellenstein explains.

IMO the biggest problem with data duplication is the issuances description. The protocol defaults to copying the data even if is not part of the bitcoin transaction. Thus, sweeps can potentially become VERY heavy on the DB with a small bitcoin transacion.

blocklack commented 6 months ago

I understand that the priority at the moment is fixing other things, but I totally disagree that this is not a critical part of the architecture, we understand that storage is cheap but the bigger the file the longer it takes the CPU to read the data and write 3 times in a just tx. keeping the tables well related allows for more optimized and complex queries.

I give an example that allows us to see how frequent is the duplication of data (This is an i/O process and in theory should not be duplicated.)

let's review this send transaction. tx_hash: 7e6053c8a0759dab993a9bba9d3997f3a5f36d9b30e99e56013123519977d6e6

Table Send =

    {
        "tx_index": 1471622,
        "tx_hash": "7e6053c8a0759dab993a9bba9d3997f3a5f36d9b30e99e56013123519977d6e6",
        "block_index": 605716,
        "source": "1AhAExgxS6aVRdKdyEuC5M4v6dxdzdgTaq",
        "destination": "1Cu9ksPUTvZZa9duDtSTq2BejWSDZp7zHM",
        "asset": "XCP",
        "quantity": 22017900000000,
        "status": "valid",
        "msg_index": 0,
        "memo": null
    },

Table Debits

    {
        "block_index": 605716,
        "address": "1AhAExgxS6aVRdKdyEuC5M4v6dxdzdgTaq",
        "asset": "XCP",
        "quantity": 22017900000000,
        "action": "send",
        "event": "7e6053c8a0759dab993a9bba9d3997f3a5f36d9b30e99e56013123519977d6e6"
    }

Table Credits

    {
        "block_index": 605716,
        "address": "1Cu9ksPUTvZZa9duDtSTq2BejWSDZp7zHM",
        "asset": "XCP",
        "quantity": 1897640704,
        "calling_function": "send",
        "event": "7e6053c8a0759dab993a9bba9d3997f3a5f36d9b30e99e56013123519977d6e6"
    }

Data that is written 3 times in the database, source, destination, asset, address, block_index

If counterparty scaled and the masses adopted it we would be talking about 500k Sends per day (in ETH there are 1.5M transactions per day).

70 bytes duplicated 500.000 Sends per day = 35.000.000.000 (35MB approx) 35 365 = 12.775 MB.

in SENDs alone 12GB of growth saved if we create good relationships.

Now if we do the calculations in dispensers, bets, orders. we would be talking about more GB per year. but i understand that storage is cheap. what is not cheap is that you are putting in the same node, Bitcoind (++700gb), addressindexrs, and counterparty-lib and consuming ram memory and cpu to do I/O operations on those files.