Blockchain-Technology-Lab / consensus-decentralization

Tool that analyzes blockchain decentralization on the consensus layer by looking at the block production distributions of various blockchain systems.
https://blockchain-technology-lab.github.io/consensus-decentralization
MIT License

Difficulty Adding New Pool Clusters #143

Closed Cerkoryn closed 6 months ago

Cerkoryn commented 6 months ago

I've started trying to validate and reproduce some of the results from this tool against other community tools such as Cexplorer and Balance. In particular, I'm looking at the large gap between the Nakamoto Coefficient of ~58 reported by the Blockchain Technology Lab's Dashboard and the community-reported MAV (Minimum Attack Vector) of ~30.

My initial thought is that the gap is explained by the pool cluster data for Coinbase/Avengers not being included in this tool, even though it is the largest single cluster on the other community tools. To validate this hypothesis, I tried to add the cluster to this tool by pulling the complete list of pools from the Balance API and cross-referencing it against cardano_raw_data.json, using the pool_ticker to get the rewards addresses.

However, due to some of the pool tickers being N/A or missing from the data set entirely, I was only able to account for 24 of the 45 pools in that cluster. This already led to a significant reduction in the MAV/Nakamoto Coefficient from ~58 to 48.
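
For anyone following along, here is a rough sketch of why merging pools into a cluster lowers the number. It assumes the coefficient is the smallest number of entities whose combined block share exceeds 50%; the threshold, the function names, and the block counts below are my own illustrative assumptions, not taken from this tool's code or from real Cardano data.

```python
from collections import defaultdict


def nakamoto_coefficient(blocks_per_entity, threshold=0.5):
    """Smallest number of entities whose combined block share exceeds `threshold`."""
    total = sum(blocks_per_entity.values())
    if total == 0:
        return 0
    running = 0
    # Walk the entities from largest to smallest block count.
    ranked = sorted(blocks_per_entity.values(), reverse=True)
    for count, blocks in enumerate(ranked, start=1):
        running += blocks
        if running > threshold * total:
            return count
    return len(blocks_per_entity)


def group_by_cluster(blocks_per_pool, pool_to_cluster):
    """Sum block counts per cluster; pools without a cluster stay on their own."""
    grouped = defaultdict(int)
    for pool, blocks in blocks_per_pool.items():
        grouped[pool_to_cluster.get(pool, pool)] += blocks
    return dict(grouped)


# Made-up numbers, for illustration only.
blocks_per_pool = {"pool_a": 22, "pool_b": 21, "pool_c": 20, "pool_d": 19, "pool_e": 18}
pool_to_cluster = {"pool_c": "cluster_x", "pool_d": "cluster_x", "pool_e": "cluster_x"}

ungrouped = nakamoto_coefficient(blocks_per_pool)
grouped = nakamoto_coefficient(group_by_cluster(blocks_per_pool, pool_to_cluster))
print(ungrouped, grouped)
```

With these toy numbers the ungrouped pools need three entities to pass 50%, while the clustered view needs only one, which is the same effect (at a much smaller scale) as the drop I saw after adding part of the cluster.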

I'm trying to account for the remaining 21 pools in this cluster and then move on to other clusters, but I don't seem to have a way to link the blocks earned by an N/A pool using something other than the ticker, such as the pool ID or hash. I did my best to trace where the rewards_addresses in the data set come from so that I could link them differently, but I don't seem to be able to access the BigQuery data as described in the docs. Additionally, none of the public APIs like Blockfrost or Maestro seem to have this data: their identifiers all start with pool..., stake..., or addr..., but the rewards_addresses in this data set do not. Confusingly, the SQL query also seems to be renaming them from pool_hash to reward_addresses.

Is there any other way we can link the correct rewards addresses to a pool without a proper ticker? Is it possible to share or make public the BigQuery data so we can figure out where the rewards addresses come from? Please advise, thank you.

Bez625 commented 6 months ago

I replied on Twitter:

I think "rewards_addresses" is misleading - these are actually the hex addresses of the pool. E.g. if you look at your example for easy1 and search that reward_address you get the pool directly, like here on cardanoscan https://cardanoscan.io/pool/20df8645abddf09403ba2656cda7da2cd163973a5e439c6e43dcbea9

I checked one of the N/A pools and it looks to be retired now: https://cardanoscan.io/pool/f747208eb5e3b703b271ff373b4dc4ca643c026071e480689be320d9
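
If it helps with matching against APIs that return pool1... IDs, here is a minimal sketch of converting a hex pool hash into that form. It assumes the pool ID is the standard Bech32 encoding of the 28-byte hash with the "pool" prefix (as in CIP-5); the function names are my own and this is not code from the tool.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def _polymod(values):
    """Bech32 checksum polynomial (BIP-173)."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def _hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def _checksum(hrp, data):
    polymod = _polymod(_hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def _to_5bit(raw):
    """Regroup 8-bit bytes into 5-bit values, zero-padding the last group."""
    acc, bits, out = 0, 0, []
    for byte in raw:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out


def bech32_encode(hrp, data):
    return hrp + "1" + "".join(CHARSET[d] for d in data + _checksum(hrp, data))


def pool_hex_to_bech32(pool_hash_hex):
    """Assumes a 28-byte (56 hex character) pool hash."""
    raw = bytes.fromhex(pool_hash_hex)
    if len(raw) != 28:
        raise ValueError("expected a 28-byte pool hash")
    return bech32_encode("pool", _to_5bit(raw))


# The easy1 hash from the link above.
pool_id = pool_hex_to_bech32("20df8645abddf09403ba2656cda7da2cd163973a5e439c6e43dcbea9")
print(pool_id)
```

The result should be a 56-character string starting with pool1, which you can compare against what the explorers and APIs show for the same pool.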

LadyChristina commented 6 months ago

Hello and thanks for your interest in our work! One thing to note is that the identifiers file which relies on tickers only includes information for unique tickers, as duplicate tickers cannot be effectively parsed this way. However, there is an additional file for Cardano, where we store cluster information, where the key of each entry is the pool's hash, which uniquely identifies a pool. For the case you are describing, it seems like the best file to tweak is the clusters file, where you can create an entry for each pool, with the following format:

    "<pool hash>": {
        "cluster": "<cluster name>",
        "pool": "<pool name>",
        "source": "<source>"
    }

You can also read more about the different mapping approaches we use and the structure of the corresponding files in the documentation pages.
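
As a rough illustration of that format, entries for a whole cluster could be generated with something like the sketch below. The helper name, the placeholder hashes, and the pool names are hypothetical; only the three keys ("cluster", "pool", "source") come from the format above, and merging the output into the actual clusters file is left out.

```python
import json


def build_cluster_entries(pool_names_by_hash, cluster_name, source):
    """Return one clusters-file entry per pool hash (hypothetical helper)."""
    return {
        pool_hash: {
            "cluster": cluster_name,
            "pool": pool_name,
            "source": source,
        }
        for pool_hash, pool_name in pool_names_by_hash.items()
    }


# Placeholder hashes and names, for illustration only.
pools = {
    "aa" * 28: "Example Pool 1",
    "bb" * 28: "Example Pool 2",
}
entries = build_cluster_entries(pools, "<cluster name>", "<source>")
print(json.dumps(entries, indent=4))
```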

Cerkoryn commented 6 months ago

@LadyChristina thank you, I was able to sort this issue out by using Cardanoscan to map Balance's pool hashes to the correct pool IDs in the raw data.

I also just submitted the script and alternate data in PR #145. The alternate grouping data puts Cardano's numbers more in line with other community tools that track the Nakamoto Coefficient/Minimum Attack Vector.