IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0

Hash mismatch pool_offline_fetch_error #1099

Closed Elitestakepool closed 2 years ago

Elitestakepool commented 2 years ago

I'm seeing hash mismatches across pools in 12.0.2 but not in 12.0.0. I checked the hashes of the files for correctness, then rolled one of the servers back to 12.0.0, and the issue went away.

Example pool_id = 7106

DBSync version = 12.0.2
Hash mismatch from when fetching metadata from https://elitestakepool.com/pools/elite2/poolmeta.json. Expected 8842d21a936fc0cae8ba04a4f1cd0ce223fad514778d0f77034b61fdfbac5484 but got bf2e8c1961cd0de5b8028110693bc882fa200bedd22b5135f59a1db1be04dc7b

pool_id not visible in pool_hash table

DBSync version = 12.0.0 (newly built)
Metadata hashes resolved and pool_id visible in pool_hash table

I ran this query on both servers to compare the counts:

SELECT count(1)
FROM pool_offline_fetch_error pr
WHERE fetch_time = (select max(fetch_time) from pool_offline_fetch_error as pe where pe.pool_id = pr.pool_id)
AND fetch_error LIKE 'Hash mismatch%';

12.0.0 count: 757
12.0.2 count: 1219

Because I've validated that the hashes match, I wanted to check whether this is a bug before resubmitting a metadata refresh to the blockchain to see if DBSync 12.0.2 then resolves it. The greater number of hash mismatches reported on 12.0.2 is still strange.
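The error text itself carries both hashes, so the rows matched by the query above can be split into expected and received values. A minimal sketch in Python, assuming every `fetch_error` string follows the wording of the single message quoted in this issue; `parse_hash_mismatch` is a hypothetical helper, not part of db-sync:

```python
import re
from typing import Optional, Tuple

# Pattern assumed from the one error message quoted in this issue;
# other db-sync versions may word the message differently.
_MISMATCH = re.compile(
    r"Expected\s+(?P<expected>[0-9a-fA-F]{64})\s+but got\s+(?P<actual>[0-9a-fA-F]{64})"
)


def parse_hash_mismatch(fetch_error: str) -> Optional[Tuple[str, str]]:
    """Return (expected, actual) hex hashes, or None if the text is not a hash mismatch."""
    if not fetch_error.startswith("Hash mismatch"):
        return None
    match = _MISMATCH.search(fetch_error)
    if match is None:
        return None
    return match.group("expected").lower(), match.group("actual").lower()
```

Grouping rows by the parsed received hash may help show whether several pools are returning the same unexpected content.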

erikd commented 2 years ago

> 12.0.0 count: 757
> 12.0.2 count: 1219

That has nothing to do with the different versions. Different instances of db-sync will end up with different entries in that table.

As for the hash mismatch, that is likely to be a transitory error with that specific pool and completely outside the control of db-sync. Also, switching back to version 12.0.0 did not "fix" this; it probably just means that the pool owner fixed the bad metadata. In fact, switching back to 12.0.0 means that you are now missing two separate fixes related to pool metadata that were added between 12.0.1 and 12.0.2.

This is not a bug or issue of any sort with db-sync, which is operating as it was intended to.

Elitestakepool commented 2 years ago

Acknowledged on the difference in counts, and thank you for explaining the reason behind it.

@erikd, the pool in question is my pool, and I haven't touched the metadata in a year (05/21/2021). My pool has been displaying correctly in all wallets (Daedalus, Yoroi) and in 3rd party tools (poolstats, adapools, pooltool). It also displays correctly in db-sync 12.0.0, but not in db-sync 12.0.2.

erikd commented 2 years ago

> My pool

The pool_id may not be unique across db-sync instances.

However, the metadata at https://elitestakepool.com/pools/elite2/poolmeta.json suggests your pool has a ticker name of ELITE. My db-sync instance has two entries with that ticker name:

cexplorer=# select id, pool_id, ticker_name from pool_offline_data where ticker_name = 'ELITE' ; 
  id   | pool_id | ticker_name 
-------+---------+-------------
  1157 |    7106 | ELITE
 16424 |      47 | ELITE
(2 rows)

My instance is running a version from the master branch which is basically version 12.0.2 plus a few experimental patches.

Elitestakepool commented 2 years ago

That's correct, we have two pools under ELITE with those pool_ids.

It's interesting that your 12.0.2-based instance has correctly resolved both pools, as has my 12.0.0 instance, but the 12.0.2 instance I'm running is reporting a hash mismatch ¯\_(ツ)_/¯

Would you advise rebuilding 12.0.2 from scratch and seeing if the same error occurs? Or maybe going for the master branch?

erikd commented 2 years ago

Please run the same query I used on both your instances:

select id, pool_id, ticker_name from pool_offline_data where ticker_name = 'ELITE' ;

Elitestakepool commented 2 years ago

Here you go @erikd

run on 12.0.2

cexplorer=# select id, pool_id, ticker_name from pool_offline_data where ticker_name = 'ELITE' ;
 id  | pool_id | ticker_name
-----+---------+-------------
 780 |      47 | ELITE
(1 row)

run on 12.0.0

cexplorer=# select id, pool_id, ticker_name from pool_offline_data where ticker_name = 'ELITE' ;
  id   | pool_id | ticker_name
-------+---------+-------------
  1026 |    7106 | ELITE
 12942 |      47 | ELITE
(2 rows)

erikd commented 2 years ago

I assume these instances are being run on different machines. What is the output of the following curl commands on those two machines:

curl https://elitestakepool.com/pools/elite2/poolmeta.json

and then:

curl --silent https://elitestakepool.com/pools/elite2/poolmeta.json | md5sum

Elitestakepool commented 2 years ago

Yes, these are completely separate instances of DBSync.

Server Specs:

12.0.2

curl https://elitestakepool.com/pools/elite2/poolmeta.json
{"name":"Elite Stake Pool 2","ticker":"ELITE","description":"Proud to be part of the Cardano community entrusted with securing the Cardano network","homepage":"https://elitestakepool.com","nonce":"1609089483","extended":"https://elitestakepool.com/pools/elite2/poolextended.json"}
curl --silent https://elitestakepool.com/pools/elite2/poolmeta.json | md5sum
727c2fc38e152c74d8beea2151d43ac3  -

12.0.0

curl https://elitestakepool.com/pools/elite2/poolmeta.json
{"name":"Elite Stake Pool 2","ticker":"ELITE","description":"Proud to be part of the Cardano community entrusted with securing the Cardano network","homepage":"https://elitestakepool.com","nonce":"1609089483","extended":"https://elitestakepool.com/pools/elite2/poolextended.json"}
curl --silent https://elitestakepool.com/pools/elite2/poolmeta.json | md5sum
727c2fc38e152c74d8beea2151d43ac3  -
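
The matching MD5 sums show that both machines receive the same bytes, but an MD5 sum cannot be compared with the hashes in the error message. A minimal sketch in Python, assuming the registered metadata hash is the Blake2b-256 digest of the raw file bytes (the digest that `cardano-cli stake-pool metadata-hash` reports); `metadata_hash` and `matches_expected` are hypothetical helper names:

```python
import hashlib


def metadata_hash(raw: bytes) -> str:
    """Blake2b-256 hex digest of the metadata bytes exactly as served.

    Assumption: this is the digest the pool registration commits to.
    """
    return hashlib.blake2b(raw, digest_size=32).hexdigest()


def matches_expected(raw: bytes, expected_hex: str) -> bool:
    """Compare the computed digest with the 'Expected ...' value from the error text."""
    return metadata_hash(raw) == expected_hex.strip().lower()
```

The bytes must be hashed exactly as served, for example after saving them with `curl --silent --output poolmeta.json <url>` and reading the file in binary mode; a single added trailing newline changes the digest.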

Elitestakepool commented 2 years ago

@erikd, with your 12.0.2, did you upgrade from 12.0.0 or perform a resync from scratch? I performed a resync from scratch on 12.0.2.

Elitestakepool commented 2 years ago

I can try flattening the server and rebuilding with 12.0.1 to see if that works. If so, I can then flatten again and try 12.0.2 (again).

Happy to do some testing. We have a working DBSync, so the other can be a test bed. Open to other ideas!

erikd commented 2 years ago

I do not have a real 12.0.2 instance. What I have is something that is 12.0.2 plus some extra patches. It was synced from scratch. Your instance running 12.0.2 should retry that metadata fetch. Not sure how that happened.

rdlrt commented 2 years ago

@Elitestakepool - A lot of us are running 12.0.2 at koios and both your pools show up fine (I do not see the version itself being the issue)

Elitestakepool commented 2 years ago

> @Elitestakepool - A lot of us are running 12.0.2 at koios and both your pools show up fine (I do not see the version itself being the issue)

Did you upgrade/migrate to 12.0.2, or was this a fresh sync of the DB? I know I can compile 12.0.2 and import a DB snapshot from the repo and that will work. I just wanted to find the root cause from a fresh sync to see if there was an underlying issue.

Elitestakepool commented 2 years ago

> should retry that metadata fetch.

The retry keeps reporting a hash mismatch every 24 hours... weird.

Elitestakepool commented 2 years ago

I'll do a DB import from the repo and this issue will go away, thanks for investigating guys 👍

rdlrt commented 2 years ago

> Did you upgrade/ migrate to 12.0.2 or was this a fresh sync of the DB?

Of the many instances, I know the details of 3: 2 were set up from a snapshot, while 1 did not use a snapshot.

erikd commented 2 years ago

The retry should happen every 24 hours. It should not be failing. The code to grab the metadata has not changed from 12.0.0 to now.

Elitestakepool commented 2 years ago

If no one else is experiencing this issue then it must be on my side. I'll flatten and rebuild.