IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0
290 stars 160 forks source link

Epoch 285 catalyst rewards mismatch (bug) #769

Closed xdzurman closed 2 years ago

xdzurman commented 3 years ago

There is a mismatch between cardano-db-sync catalyst rewards and cardano-cli (ledger). The db sync states 2 catalyst rewards which appeared in epoch 285. However, the sooner one of them is invisible to cardano-cli. A few examples:

https://cardanoscan.io/stakekey/05c6aa84de1fd5d5d15fa2cc0f30a5c804f52fb19725525aab54a839?tab=instantaneousrewards Here you can see the two catalyst rewards which appeared 5 days ago. AFAIK cardanoscan simply sums up the rewards minus withdrawals. The reward sum with the two catalyst rewards for this address is 2649355 Lovelaces. According to cardano-cli, "rewardAccountBalance": 2629950. The difference is 19405 Lovelace, which corresponds to the sooner catalyst reward of the two.

Same here - https://cardanoscan.io/stakekey/stake1uyzuqsnynkxkqs848y8mgylywwezrgcrmd6xmqspx4kfk7gqtlzjv?tab=instantaneousrewards Reward sum: 103826843 Cardano-cli reward: 103654676 Diff: 172167 = equal to the sooner of the two rewards

The same goes for basically any address you look up. Db sync version: 9.0.0 This caused Adalite users not being to withdraw rewards until we started filtering out the first of epoch 285's rewards.

erikd commented 3 years ago

This is almost certainly related to https://github.com/input-output-hk/cardano-db-sync/issues/696 .

db-sync version 9.0.* and 10.0.* are known to have issues with rewards, and these are unlikely to be fixed in time for 11.0.0.

xdzurman commented 3 years ago

This was actually caused by faulty catalyst rewards, where in epoch 285, two catalyst rewards showed up for all participants, but the first of the two did not make it through to node and cli, but it appeared in db-sync, which threw balances off. Samuel Leathers can tell you more if interested.

erikd commented 3 years ago

I think I have a fix for #696 (still under test). @xdzurman do you have an specific address with a reward amount for epoch 285 that I can test?

erikd commented 3 years ago

two catalyst rewards showed up for all participants, but the first of the two did not make it through to node and cli, but it appeared in db-sync,

The node uses the same code as db-sync, so how could a reward payment make through to db-sync but not the node ?

xdzurman commented 3 years ago

I think I have a fix for #696 (still under test). @xdzurman do you have an specific address with a reward amount for epoch 285 that I can test?

Let's say stake1u9schrkl2ed8daympxqxd38z72kx00y69ky8udd7g4v53kcp5kkyg. The cardano-cli reward amount was

cardano-cli query stake-address-info --mainnet --address stake1u9schrkl2ed8daympxqxd38z72kx00y69ky8udd7g4v53kcp5kkyg
[
    {
        "address": "stake1u9schrkl2ed8daympxqxd38z72kx00y69ky8udd7g4v53kcp5kkyg",
        "rewardAccountBalance": 10750235,
        "delegation": "pool1j6490a90qefvsqxdurf99z9ld50rtthaz9ms6c8xh78s52veuq2"
    }
]

But if you calculate the rewards from db-sync, the outstanding reward balance is 10793839. The difference is 43604, which corresponds to the first of the two catalyst rewards for epoch 285.

xdzurman commented 3 years ago

two catalyst rewards showed up for all participants, but the first of the two did not make it through to node and cli, but it appeared in db-sync,

The node uses the same code as db-sync, so how could a reward payment make through to db-sync but not the node ?

Quoting Samuel Leathers:

okay, so @jared.corduan had a late night debugging and we see what the issue is. There's actually multiple, most are my fault, one was a misunderstanding in how MIR's worked. To start off, first mistake I made was copy of wrong rewards distributed. My general QA for testing the json file is I check number of lines in it, and then post it to the channel for others to review. No one caught that the total rewards amount was wrong. In the future, I have a script now that sums the rewards and outputs a total value being distributed. Second was I executed the script with wrong pot. This was because of a MIR we did from reserves to account for a bug in rewards calculation prior to allegra. I used the same script, and didn't switch it back from reserves -> treasury. Third is where things get really hairy. I made the false assumption multiple MIR's would accumulate, but in fact they replace (but only prior to alonzo, they accumulate post-alonzo). @jared.corduan figured this out last night comparing the output of his stake address tool to rewards total balance. So essentially, everyone's been paid the second MIR but not the first. I'll be formalizing this into a proper issue, but lots of lessons learned on this one. If we'd like to have a call to discuss it, we can bring @jared.corduan in to explain technically what's happening if this isn't enough detail.

erikd commented 3 years ago

@xdzurman Unfortunately, your cardano-cli query returns an rewardAccountBalance which of course changes over time.

db-sync on the other hand stores credits to the reward account in the reward table and debits in the withdrawal table.

On current master (after merging of the reward validation) I get:

select reward.* from reward inner join stake_address on reward.addr_id = stake_address.id
    where stake_address.view = 'stake1u9schrkl2ed8daympxqxd38z72kx00y69ky8udd7g4v53kcp5kkyg'
    order by earned_epoch ;
    id    | addr_id |   type   | amount  | pool_id | earned_epoch | spendable_epoch 
----------+---------+----------+---------+---------+--------------+-----------------
 13040354 | 1895674 | member   |  353659 |    7699 |          273 |             275
 13651004 | 1895674 | member   |  238593 |    7699 |          274 |             276
 14271526 | 1895674 | member   |  940700 |    7699 |          275 |             277
 14900823 | 1895674 | member   | 1352813 |    7699 |          276 |             278
 15547990 | 1895674 | treasury |   39108 |         |          277 |             279
 15547989 | 1895674 | member   |  493762 |    7699 |          277 |             279
 16210475 | 1895674 | member   | 1371629 |    7699 |          278 |             280
 16867777 | 1895674 | member   | 1236043 |    7699 |          279 |             281
 17533273 | 1895674 | member   | 1004487 |    7699 |          280 |             282
 18204762 | 1895674 | member   | 1112734 |    7699 |          281 |             283
 18883060 | 1895674 | member   |  895610 |    7699 |          282 |             284
 19567131 | 1895674 | member   |  851842 |    7699 |          283 |             285
 20268768 | 1895674 | member   |  845867 |    7699 |          284 |             286
 20268769 | 1895674 | reserves |   13388 |         |          284 |             286
 20986850 | 1895674 | member   |  973891 |    7699 |          285 |             287
 21694133 | 1895674 | member   |  225910 |    7699 |          286 |             288
 23036583 | 1895674 | member   |  518683 |    7699 |          287 |             289
(17 rows)

and

select withdrawal.* from withdrawal inner join stake_address on withdrawal.addr_id = stake_address.id
    where stake_address.view = 'stake1u9schrkl2ed8daympxqxd38z72kx00y69ky8udd7g4v53kcp5kkyg';
 id | addr_id | amount | tx_id | redeemer_id 
----+---------+--------+-------+-------------
(0 rows)

Does that seem correct?

xdzurman commented 3 years ago

Yes, this seems correct. Until now, we queried ITN and catalyst rewards from reserve and treasury tables, but this solves it. If you check rewards for the same address from the reserve table, you can see two entries:

cardano=# select * from reserve where addr_id =1610222;
  id   | addr_id | cert_index | amount |  tx_id   
-------+---------+------------+--------+----------
 54814 | 1610222 |          0 |  19405 | 10923521
 86899 | 1610222 |          0 |   5958 | 10991467

one of which should not be there. That was the whole issue.

xdzurman commented 3 years ago

I think I have a fix for #696 (still under test). @xdzurman do you have an specific address with a reward amount for epoch 285 that I can test?

Also, can you check rewards for stake1ux025zswj96df8esu78wqdhvfc56yg9yzztwhj4hgkp0rqswd7rl7? In db sync 11, all rewards from before epoch 257 are ignored.

Rewards from db sync 9.0.0:

    id    | addr_id | amount | epoch_no | pool_id | block_id 
----------+---------+--------+----------+---------+----------
  1523654 |  303094 |   6724 |      237 |     454 |  5150301
  1620412 |  303094 |   3403 |      238 |    4040 |  5171559
  1721179 |  303094 |   3492 |      239 |    4040 |  5192987
  1825341 |  303094 |   2644 |      240 |    4040 |  5214386
  1936927 |  303094 |   8057 |      241 |    4040 |  5235804
  2055509 |  303094 |   7383 |      242 |    4040 |  5257390
  2183876 |  303094 |   6234 |      243 |    4040 |  5278881
  2321976 |  303094 |   7963 |      244 |    4040 |  5300366
  2468914 |  303094 |   4472 |      245 |    4040 |  5321693
  2619851 |  303094 |   5521 |      246 |    4040 |  5343201
  2783288 |  303094 |   7723 |      247 |    9788 |  5364557
  2960581 |  303094 |   5974 |      248 |    9788 |  5385933
  3159241 |  303094 |   6625 |      249 |    9788 |  5406924
  3378432 |  303094 |   4698 |      250 |    9788 |  5428248
  3621517 |  303094 |   7799 |      251 |    9788 |  5449333
  3885633 |  303094 |     20 |      252 |     454 |  5470255
  4165967 |  303094 |     26 |      253 |    6678 |  5490968
  4461010 |  303094 |     32 |      254 |    6678 |  5512319
  5814262 |  303094 |   3618 |      257 |    7717 |  5576454
  6570265 |  303094 |   1866 |      259 |     454 |  5619113
  6966660 |  303094 |   1264 |      260 |     454 |  5640553
  7376790 |  303094 |    853 |      261 |     454 |  5662037
  7798795 |  303094 |   1477 |      262 |     454 |  5683430
  8232488 |  303094 |    781 |      263 |     454 |  5704839
  8677097 |  303094 |   1033 |      264 |    8956 |  5726575
 14997015 |  303094 |   1521 |      271 |     454 |  5878959
  9594796 |  303094 |   1151 |      265 |    8956 |  5748284
 10066694 |  303094 |    948 |      266 |    8956 |  5770278
 10557469 |  303094 |    627 |      267 |     454 |  5791997
 15586281 |  303094 |      9 |      272 |     454 |  5900696
 11602292 |  303094 |    706 |      268 |     454 |  5813820
 12139677 |  303094 |    719 |      269 |     454 |  5835493
 13257765 |  303094 |    615 |      270 |     454 |  5857193
 16795160 |  303094 |     11 |      273 |     454 |  5922535
 17406737 |  303094 |     10 |      274 |     454 |  5944326
 18655144 |  303094 |     12 |      275 |     454 |  5965898
 19919270 |  303094 |     10 |      276 |     454 |  5987504
 20559745 |  303094 |     13 |      277 |     454 |  6009190
 21210116 |  303094 |     12 |      278 |     454 |  6030813
 21869264 |  303094 |      8 |      279 |     454 |  6052318
 22534301 |  303094 |     10 |      280 |     454 |  6074054
 23206785 |  303094 |      9 |      281 |     454 |  6095819
 23885506 |  303094 |     12 |      282 |     454 |  6117580
 24568320 |  303094 |      8 |      283 |     454 |  6139413
 25259671 |  303094 |      8 |      284 |     454 |  6160945
 25958140 |  303094 |      6 |      285 |     454 |  6182770
 26667651 |  303094 |     13 |      286 |     454 |  6204160
 28118640 |  303094 |     10 |      287 |     454 |  6225656

Rewards from db sync 11.0.0

    id    | addr_id |  type  | amount | pool_id | earned_epoch | spendable_epoch 
----------+---------+--------+--------+---------+--------------+-----------------
  5512286 |  312446 | member |   3618 |    7717 |          257 |             259
  6269795 |  312446 | member |   1866 |     454 |          259 |             261
  6666441 |  312446 | member |   1264 |     454 |          260 |             262
  7075874 |  312446 | member |    853 |     454 |          261 |             263
  7498675 |  312446 | member |   1477 |     454 |          262 |             264
  7932392 |  312446 | member |    781 |     454 |          263 |             265
  8377630 |  312446 | member |   1033 |    8956 |          264 |             266
  8835356 |  312446 | member |   1151 |    8956 |          265 |             267
  9307180 |  312446 | member |    948 |    8956 |          266 |             268
  9798989 |  312446 | member |    627 |     454 |          267 |             269
 10316273 |  312446 | member |    706 |     454 |          268 |             270
 10853716 |  312446 | member |    719 |     454 |          269 |             271
 11409671 |  312446 | member |    615 |     454 |          270 |             272
 11984714 |  312446 | member |   1521 |     454 |          271 |             273
 12575182 |  312446 | member |      9 |     454 |          272 |             274
 13177526 |  312446 | member |     11 |     454 |          273 |             275
 13790404 |  312446 | member |     10 |     454 |          274 |             276
 14412845 |  312446 | member |     12 |     454 |          275 |             277
 15043802 |  312446 | member |     10 |     454 |          276 |             278
 15698747 |  312446 | member |     13 |     454 |          277 |             279
 16357527 |  312446 | member |     12 |     454 |          278 |             280
 17016870 |  312446 | member |      8 |     454 |          279 |             281
 17683530 |  312446 | member |     10 |     454 |          280 |             282
 18356209 |  312446 | member |      9 |     454 |          281 |             283
 19035558 |  312446 | member |     12 |     454 |          282 |             284
 19719657 |  312446 | member |      8 |     454 |          283 |             285
 20430683 |  312446 | member |      8 |     454 |          284 |             286
 21142437 |  312446 | member |      6 |     454 |          285 |             287
 21851842 |  312446 | member |     13 |     454 |          286 |             288
 22575761 |  312446 | member |     10 |     454 |          287 |             289

Could the fix be patch released until the hard fork?

erikd commented 3 years ago

@xdzurman how do you know db-sync-9 was correct? I strongly suspect it was not.

xdzurman commented 3 years ago

That is the version everyone rolled back to after 10.0.0 introduced rewards issues. It has been deployed to production for a long time and stood the test of time (except a minor tweak regarding epoch 285's catalyst rewards).

sorki commented 3 years ago

Running master against testnet, there are couple of these as well and one needs to disable panicAbort to be able to fully sync from scratch

[db-sync-node:Warning:77] [2021-09-08 21:35:31.63 UTC] validateEpochRewards: rewards earned in epoch 84 expected total of 158768.12877 ADA but got 158468.12877 ADA
[db-sync-node:Warning:77] [2021-09-08 21:48:27.42 UTC] validateEpochRewards: rewards earned in epoch 107 expected total of 523029.532638 ADA but got 523129.532638 ADA
erikd commented 3 years ago

@xdzurman

That is the version everyone rolled back to after 10.0.0 introduced rewards issues. It has been deployed to production for a long time and stood the test of time (except a minor tweak regarding epoch 285's catalyst rewards).

It may well have stood the test of time, but it may still be wrong.

erikd commented 3 years ago

Running master against testnet, there are couple of these as well and one needs to disable panicAbort to be able to fully sync from scratch

If the panicAbout is triggered, that is an indication of a bug. That is why it is a panicAbort and not just a info level log message.

sorki commented 3 years ago

Running master against testnet, there are couple of these as well and one needs to disable panicAbort to be able to fully sync from scratch

If the panicAbout is triggered, that is an indication of a bug. That is why it is a panicAbort and not just a info level log message.

Created #805

erikd commented 3 years ago

That is the version everyone rolled back to after 10.0.0 introduced rewards issues. It has been deployed to production for a long time and stood the test of time (except a minor tweak regarding epoch 285's catalyst rewards).

Yes, they may have rolled back to 9.0.0 thinking it was correct, but I am 100% sure it was not and has never been correct.

erikd commented 3 years ago

@xdzurman

select hash_raw from stake_address where view = 'stake1ux025zswj96df8esu78wqdhvfc56yg9yzztwhj4hgkp0rqswd7rl7' ; 
                           hash_raw                           
--------------------------------------------------------------
 \xe19eaa0a0e9174d49f30e78ee036ec4e29a220a41096ebcab74582f182
(1 row)

and then using the stake history tool in the node respository:

./stake-credential-history -c configuration/cardano/mainnet-config.json -s state-node-mainnet/node.socket \
       -t e19eaa0a0e9174d49f30e78ee036ec4e29a220a41096ebcab74582f182 --check-point-size 100000

shows that yes indeed, rewards for this address should start in epoch 238.

I think this is basically a duplicate of #769 .

mmahut commented 3 years ago

That is the version everyone rolled back to after 10.0.0 introduced rewards issues. It has been deployed to production for a long time and stood the test of time (except a minor tweak regarding epoch 285's catalyst rewards).

Yes, they may have rolled back to 9.0.0 thinking it was correct, but I am 100% sure it was not and has never been correct.

I do not think that is the case. I mean, at least it was much more "correct" (like a fraction of a percent might have been incorrect). Otherwise we would see a lot more failed reward withdrawal transactions in Yoroi, as we would construct the transaction with incorrect reward in place and get rejected by the ledger. With 9.0.0, I really hardly recall any of it.

And as far as I know this has been the case with Adalite too, as they have upgraded from 9.0.0.

xdzurman commented 3 years ago

Same here. Users of adalite were always able to withdraw their funds while we were using 9.0.0 (we compute the reward amounts as a sum of all types of rewards minus withdrawals). This always worked, except the ones that retired their pools and got the deposit back, for which I created an issue more that 9 months ago. This issue (catalyst's epoch 285 double rewards) was the second major issue, but apart from that version 9.0.0 was correct (based on the Adalite's historical user logs and reported issues).

However, when switching to db synx 10.0.0, a huge portion of users had wrong rewards. These issues are still unresolved in 11.0.0, which forces us to show warning banners to our users that their reward history is most probably wrong. We also had to resort to ignore the reward sum from db sync and query it via ogmios (bloating our backends just for this one piece of information) so that the users can at least withdraw their funds. Ever since having to move from db sync 9.0.0, we are impatiently pleading for fixes to the major issues, so that we don't have to compromise the truthfulness of the data we serve to our users.

kderme commented 2 years ago

Hopefully this is fixed on https://github.com/input-output-hk/cardano-db-sync/releases/tag/13.0.0. Feel free to reopen if the issue persists. Most of the work for rewards is not done in leger through ledger events.