IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0

Nomenclature for columns #1138

Open rdlrt opened 2 years ago

rdlrt commented 2 years ago

Hi Team,

We were thinking of adding a CIP for normalising the variable names (and types) used at query layers, using existing providers as a baseline. Since a lot of the tooling extracts information via the mini-protocols or dbsync, it would be good to know whether there is a preferred naming convention to use as the source (for object representations like block_no, hash), or whether there is flexibility/scope to make things more uniform where possible. A couple of examples below:

  1. We have columns like hash for the block and transaction themselves, but then we have hash_raw/view for objects which often use a bech32 representation (pool/stake_addresses), while for address representation we have address_raw/address for hex/bech32 values.
  2. References like value (in tx_out, collateral_tx_out) vs amount in account-related tables (withdrawal, reward, reserves, datum) vs quantity in asset representations (ma_tx_mint/ma_tx_out).

There are still patterns here, so these are probably in place based on some logic. Depending on preferences (updating dbsync itself vs restricting ourselves to the query layers only), we might be able to prepare a better initial draft and raise issues/suggestions against the appropriate destination repositories.
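As a rough sketch of the "query layers only" option, a query layer could alias the differently named columns to one name without touching the dbsync schema. The table/column pairs below are taken from example 2 above; the helper name `normalised_select` and the choice of `value` as the common alias are assumptions for illustration only, not an agreed convention.

```python
# Hypothetical query-layer aliasing: the schema stays as-is, only the
# name exposed to consumers is normalised.
# Table -> column pairs are the ones listed in example 2 above.
QUANTITY_COLUMNS = {
    "tx_out": "value",
    "collateral_tx_out": "value",
    "withdrawal": "amount",
    "reward": "amount",
    "ma_tx_mint": "quantity",
    "ma_tx_out": "quantity",
}


def normalised_select(table: str, alias: str = "value") -> str:
    """Build a SELECT that exposes the table's quantity-like column as `alias`."""
    column = QUANTITY_COLUMNS[table]  # KeyError for tables not listed above
    if column == alias:
        return f"SELECT {column} FROM {table}"
    return f"SELECT {column} AS {alias} FROM {table}"


if __name__ == "__main__":
    for table in QUANTITY_COLUMNS:
        print(normalised_select(table))
```

This keeps existing dbsync consumers working, at the cost of every query layer having to carry its own mapping.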

erikd commented 2 years ago

Totally agree that all the columns named amount and quantity should be renamed value.

Not so sure about the first point (or maybe I am missing the point).

rdlrt commented 2 years ago

Those were just examples; there are a lot more that we would perhaps want to drill down into and document. My question was really whether there is a preferred source at the moment where the column names come from (for instance, whether the names of dbsync objects were initially derived from the ledger/consensus code-as-spec). If not, we should be able to brainstorm together as an ecosystem, across orgs/projects, to reach a higher level of consistency for object representation.

About the first point, I was referring to the use of hash vs hash_raw (which are both a similar representation) with view for the bech32 representation, compared to the use of address vs address_raw for the bech32 vs hex representation. One option could be to name all such objects in the format $obj_[hash]|[view]. I'm not proposing this as a solution right away; our aim is just to start preparing a draft for review, with the naming for all objects that would be commonly used in queries first, and then see if we can adapt/align better across the ecosystem.
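To make the $obj_[hash]|[view] idea concrete, here is a minimal sketch of how the current names could map onto it. The object labels (`block`, `pool`, `address`), the helper `proposed_name`, and the mapping of address_raw to a `_hash` suffix are all assumptions for illustration, not a settled proposal.

```python
# Hypothetical illustration of the $obj_[hash]|[view] naming format.
ALLOWED_REPRESENTATIONS = ("hash", "view")


def proposed_name(obj: str, representation: str) -> str:
    """Return a column name of the form <obj>_hash or <obj>_view."""
    if representation not in ALLOWED_REPRESENTATIONS:
        raise ValueError(f"unknown representation: {representation}")
    return f"{obj}_{representation}"


# (object, current column name) -> name under the sketched format.
# The current names are the ones mentioned in this thread.
CURRENT_TO_PROPOSED = {
    ("block", "hash"): proposed_name("block", "hash"),
    ("pool", "hash_raw"): proposed_name("pool", "hash"),
    ("pool", "view"): proposed_name("pool", "view"),
    # Assumption: the raw (hex) address takes the _hash slot.
    ("address", "address_raw"): proposed_name("address", "hash"),
    ("address", "address"): proposed_name("address", "view"),
}

if __name__ == "__main__":
    for (obj, current), proposed in CURRENT_TO_PROPOSED.items():
        print(f"{obj}.{current} -> {proposed}")
```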

ghost commented 2 years ago

Adding these as well: I'd like to suggest renaming tx_in_id and tx_out_id in the tx_in and collateral_tx_in tables to in_tx_id and out_tx_id.

kderme commented 1 year ago

Many of these changes sound reasonable. However, they would be breaking changes for existing applications and would require integration.

rdlrt commented 1 year ago

IMO, a breaking change has been long pending; would it be nice to reduce some of the backlog on these (it would also allow for more freedom with tx_out table updates)?