Joystream / joystream

Joystream Monorepo
http://www.joystream.org
GNU General Public License v3.0

Nara upgrade checklist #5064

Closed by kdembler 5 months ago

kdembler commented 7 months ago

This document describes the plan and a checklist of what needs to happen to properly execute the Nara upgrade.

Before submitting proposal

First approval term

Second approval term

After upgrade


mochet commented 7 months ago

Just one small thing to note is that any proposals in-flight will be cancelled at the time of the runtime upgrade.

freakstatic commented 7 months ago

@kdembler the CRT pallet will be unfrozen by default

Looks pretty complete, good work 👍 Should we add some upload tests on Gleev and tests of CRT features (like buying/selling tokens) after disabling maintenance mode, to make sure it's all working as expected?

freakstatic commented 7 months ago

Just one small thing to note is that any proposals in-flight will be cancelled at the time of the runtime upgrade.

For the runtime upgrade proposal, it seems it goes to a "dormant" state after it gets the council approval.

mnaamani commented 7 months ago

When we reference the "commit" for the runtime: given that there could be other, non-runtime-related changes, let us be clearer that it is the "runtime code shasum" produced by this script: https://github.com/Joystream/joystream/blob/nara/scripts/runtime-code-shasum.sh
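To illustrate why a content hash is a more precise reference than a commit, here is a minimal sketch. It is not the logic of `runtime-code-shasum.sh`, whose exact inputs are not described in this thread. It assumes a hypothetical list of runtime-relevant paths (`runtime_paths`) and hashes their contents in a stable order, so changes outside those paths leave the hash unchanged.

```python
import hashlib
import tempfile
from pathlib import Path


def runtime_code_hash(root: Path, runtime_paths: list[str]) -> str:
    """Hash only the files under `runtime_paths` (a hypothetical list), in sorted order."""
    digest = hashlib.sha256()
    files = []
    for rel in runtime_paths:
        target = root / rel
        if target.is_file():
            files.append(target)
        elif target.is_dir():
            files.extend(p for p in target.rglob("*") if p.is_file())
    for path in sorted(files):
        # Include the relative path so that renames change the hash too.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def demo() -> tuple[str, str, str]:
    """Build a throwaway tree and hash it before and after two kinds of change."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "runtime").mkdir()
        (root / "runtime" / "lib.rs").write_text("pub const SPEC_VERSION: u32 = 1;\n")
        (root / "README.md").write_text("docs v1\n")
        before = runtime_code_hash(root, ["runtime"])
        # A non-runtime change: the hash should stay the same.
        (root / "README.md").write_text("docs v2\n")
        after_docs_change = runtime_code_hash(root, ["runtime"])
        # A runtime change: the hash should differ.
        (root / "runtime" / "lib.rs").write_text("pub const SPEC_VERSION: u32 = 2;\n")
        after_runtime_change = runtime_code_hash(root, ["runtime"])
        return before, after_docs_change, after_runtime_change
```

Two commits that differ only in documentation would produce the same value here, which is the property that makes such a hash a clearer identifier for the runtime than a commit.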

mochet commented 7 months ago

Just one small thing to note is that any proposals in-flight will be cancelled at the time of the runtime upgrade.

For the runtime upgrade proposal, it seems it goes to a "dormant" state after it gets the council approval.

Yes, that means it is waiting for the 2nd round of council voting. It will then go into the grace period, and when it actually executes after the grace period, all other active proposals will be cancelled.
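The lifecycle described above can be sketched as a small state machine. This is a simplified model for illustration only: the status names, the `Proposal` class and the `approvals_needed` parameter are hypothetical and do not mirror the runtime's actual types.

```python
from enum import Enum, auto


class Status(Enum):
    DECIDING = auto()      # the council is voting in the current round
    DORMANT = auto()       # approved once, waiting for the next council's vote
    GRACE_PERIOD = auto()  # fully approved, waiting to execute
    EXECUTED = auto()
    CANCELLED = auto()


class Proposal:
    def __init__(self, name: str, approvals_needed: int = 1):
        self.name = name
        self.approvals_needed = approvals_needed  # hypothetical parameter
        self.approvals = 0
        self.status = Status.DECIDING

    def approve(self) -> None:
        """Record one council approval round."""
        assert self.status is Status.DECIDING
        self.approvals += 1
        if self.approvals < self.approvals_needed:
            self.status = Status.DORMANT
        else:
            self.status = Status.GRACE_PERIOD

    def next_council_elected(self) -> None:
        """A dormant proposal comes back up for voting under the new council."""
        if self.status is Status.DORMANT:
            self.status = Status.DECIDING


def execute_runtime_upgrade(upgrade: Proposal, others: list[Proposal]) -> None:
    """Execute the upgrade after its grace period and cancel all other active proposals."""
    assert upgrade.status is Status.GRACE_PERIOD
    upgrade.status = Status.EXECUTED
    for p in others:
        if p.status in (Status.DECIDING, Status.DORMANT, Status.GRACE_PERIOD):
            p.status = Status.CANCELLED
```

In this model, a runtime upgrade created with `approvals_needed=2` becomes `DORMANT` after the first `approve()`, returns to `DECIDING` after `next_council_elected()`, and reaches `GRACE_PERIOD` after the second `approve()`.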

kdembler commented 7 months ago

@freakstatic Thanks, removed point about unfreezing
@mnaamani Thanks, updated

mnaamani commented 7 months ago

Two jsgenesis-operated nodes that I will probably also need to make sure are working are the "status" server and the "faucet" server.

kdembler commented 7 months ago

Updates after testnet upgrade:

  1. We've discovered that the initial approach of timing the last approval in the 2nd round will not work, because proposals in the 2nd round also have an expiry block. Placing the last vote late enough for the proposal to execute during the revealing stage is not possible. Instead, we will use the "trigger block" functionality, which lets us set the exact execution block at proposal creation time. We have upgraded our testnet using this approach.
  2. We have tested different versions of our software/nodes during an upgrade:
    1. Both Ephesus and Nara validator nodes were able to produce blocks before and after the upgrade.
    2. Both Ephesus and Nara QNs continued to work after the upgrade. The Ephesus indexer crashed as expected, but recovered and is processing blocks.
    3. ⚠️ The Ephesus QN processor crashed once we submitted the freeze pallet proposal, as it's not recognized. This means that we should update all critical QN instances to Nara versions before the upgrade executes.
    4. Both Ephesus and Nara versions of Orion survived the upgrade, but the Ephesus Orion processor crashed once we created a new channel.
    5. Both Ephesus and Nara faucets seem to continue working fine after the upgrade. I think they both crashed during the upgrade but worked fine after a restart.
    6. Storage Squid, Colossus and Argus continue to operate normally.
    7. The status server seems to have some small issues; @DzhideX will prepare an updated Nara version.
  3. Bedeho has mentioned that his CRT pallet review efforts are not going as planned, so he's giving his green light and we should not wait for him with the upgrade.

Will update the checklist to reflect those changes.
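The reasoning in point 1 can be checked with a small arithmetic sketch. It assumes a simplified model in which a proposal executes one grace period after the final vote unless a trigger block is set. All block numbers and function names below are hypothetical, not values or APIs from the testnet.

```python
from typing import Optional


def latest_vote_driven_execution(expiry_block: int, grace_period: int) -> int:
    """Latest execution block reachable by delaying the final vote (simplified model)."""
    # The final vote cannot be placed after the proposal's expiry block.
    return expiry_block + grace_period


def execution_block(final_vote_block: int, grace_period: int,
                    trigger_block: Optional[int] = None) -> int:
    """Block at which the proposal executes in this simplified model."""
    earliest = final_vote_block + grace_period
    if trigger_block is None:
        return earliest
    # Assumption: a trigger block before the end of the grace period cannot be honoured.
    if trigger_block < earliest:
        raise ValueError("trigger block falls before the grace period ends")
    return trigger_block


def in_window(block: int, start: int, end: int) -> bool:
    """True if `block` falls inside the target window, e.g. the revealing stage."""
    return start <= block <= end
```

With a hypothetical expiry at block 1000, a grace period of 100 blocks and a target window of blocks 1500 to 1600, `latest_vote_driven_execution(1000, 100)` gives 1100, which falls short of the window. Setting `trigger_block=1550` at creation time puts execution inside the window regardless of when the final vote lands.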

kdembler commented 7 months ago

Update on wallet compatibilities:

  1. OneKey - it doesn't have Joystream metadata at all, so it works the same regardless of whether it's Ephesus or Nara - all extrinsics except the balance transfer only support blind signing.
  2. Polkadot Vault - will need a manual metadata update via QR code.
  3. Browser wallets - work fine, but need a metadata update to get extrinsic decoding. They also show the balance from the Nara RPC.

freakstatic commented 7 months ago

Great work compiling this 👍 As you mention, both faucets crashed with a closed connection, but restarting them fixed the problem and they kept working fine.
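Since a restart was enough to recover from the closed connection, one possible mitigation is to retry automatically. The sketch below is a generic illustration, not the faucet's code: `run_with_reconnect`, `max_attempts` and `delay_seconds` are hypothetical names, and it assumes the failure surfaces as a Python `ConnectionError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_reconnect(task: Callable[[], T], max_attempts: int = 3,
                       delay_seconds: float = 0.0) -> T:
    """Re-run `task` when it fails with ConnectionError, up to `max_attempts` times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except ConnectionError as err:
            last_error = err
            # Simple linear backoff before the next attempt.
            time.sleep(delay_seconds * attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

A process supervisor that restarts the service on exit would achieve the same effect without code changes.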

Some conclusions: