flowerpowerdao / power-equalizer

GNU Affero General Public License v3.0

Upgrade resource exhaustion (general advice) #13

Open letmejustputthishere opened 1 year ago

letmejustputthishere commented 1 year ago

The Internet Computer has numerous hard resource limits.

This means a very popular application can start to fail in odd ways. Most dangerous is probably running out of the cycle limit in preupgrade, but it can also be bad if some public method that loops over possibly large data starts trapping.

There is no great remedy. The best thing to do is to take stress-testing and load-testing seriously. Install the app locally on a replica configured with the same resource limits as the real Internet Computer (I believe this wasn't always the default, but it may have become the default now), create artificial data of the size one optimistically expects, and test the canister's functionality, including upgrading it. Maybe grow the test size until limits are reached, to see how much headroom there is.
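
As a rough illustration of the "grow the test size until limits are reached" idea, here is a minimal sketch. `findHeadroom` and the `tryUpgrade` callback are hypothetical names, not part of this repo; in a real run the callback would fill the canister with that many records and attempt an upgrade.

```javascript
// Hypothetical helper: probe increasing data sizes until an upgrade fails.
// `tryUpgrade(size)` is an assumed callback that fills the canister with
// `size` records, attempts an upgrade, and returns true on success.
function findHeadroom(sizes, tryUpgrade) {
  let lastOk = null;
  for (const size of sizes) {
    let ok = false;
    try {
      ok = tryUpgrade(size);
    } catch (err) {
      ok = false; // a trap or failed call counts as hitting a limit
    }
    if (!ok) {
      return { lastOk, firstFailed: size };
    }
    lastOk = size;
  }
  return { lastOk, firstFailed: null };
}

// Stand-in for a real canister: pretend upgrades fail above 250,000 records.
const headroom = findHeadroom(
  [10000, 100000, 200000, 500000, 1000000],
  (size) => size <= 250000
);
console.log(headroom);
```

The gap between `lastOk` and the size expected in production is the headroom.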

Then insert checks into the code that keep data structures well within the size that was tested to work well.
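
A minimal sketch of such a check, written in JavaScript only for illustration (the canister itself is Motoko). `BoundedLog` and the limit value are hypothetical; the real limit would come from whatever size was actually tested.

```javascript
// Assumed limit: a size that load-testing showed still upgrades cleanly.
const MAX_TESTED_ENTRIES = 2; // hypothetical, tiny value for the demo

class BoundedLog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  // Reject new entries instead of growing past the tested size.
  add(entry) {
    if (this.entries.length >= this.maxEntries) {
      throw new Error(
        `log is full (${this.maxEntries} entries); refusing to grow past the tested size`
      );
    }
    this.entries.push(entry);
    return this.entries.length;
  }
}

const log = new BoundedLog(MAX_TESTED_ENTRIES);
log.add("tx-1");
log.add("tx-2");

let rejected = false;
try {
  log.add("tx-3"); // third entry exceeds the limit
} catch (err) {
  rejected = true;
}
console.log({ size: log.entries.length, rejected });
```

Failing loudly on insert is a design choice: it keeps the canister in a state that is known to upgrade, at the cost of refusing new data.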

Especially when the canister is to hold media assets, this may become an issue.

letmejustputthishere commented 1 year ago

_saleTransactionState, transactionsState can grow indefinitely. test the canister with tons of sales to see if we can still upgrade it

ZenVoich commented 1 year ago

@letmejustputthishere I have added a script that tests upgradability for different GCs and different numbers of transactions. But I cannot test it with large sizes because my Ubuntu machine only has ~1 GB of free memory. Can you try this on your machine?

  1. Switch to test-upgrade branch
  2. Run npm run replica
  3. Run npm run test-upgrade

letmejustputthishere commented 1 year ago

that's so beautiful 😭 very well done

letmejustputthishere commented 1 year ago

we just have to make sure that getHeapSize, getMemorySize and grow never make it into production, maybe also through a flag in the Env? we could expose them in the interface, but trap straight after the call if the flag isn't set to test or something
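
A minimal sketch of that gating idea, in JavaScript for illustration only (the canister is Motoko, where this would be a trap). The `env.mode` field and the `testOnly` wrapper are assumptions; the thread only says "a flag in the Env".

```javascript
// Hypothetical wrapper: the method stays in the interface but fails
// immediately unless the (assumed) env.mode flag is set to "test".
function testOnly(env, name, fn) {
  return (...args) => {
    if (env.mode !== "test") {
      throw new Error(`${name} is only available when env.mode is "test"`);
    }
    return fn(...args);
  };
}

const heapSizeImpl = () => 1234; // stand-in for the real measurement

const testApi = {
  getHeapSize: testOnly({ mode: "test" }, "getHeapSize", heapSizeImpl),
};
const prodApi = {
  getHeapSize: testOnly({ mode: "production" }, "getHeapSize", heapSizeImpl),
};

console.log(testApi.getHeapSize());

let blocked = false;
try {
  prodApi.getHeapSize();
} catch (err) {
  blocked = true;
}
console.log({ blocked });
```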

letmejustputthishere commented 1 year ago

do we have to reinstall the canister for each iteration of the loop? or can we just incrementally grow the state?

letmejustputthishere commented 1 year ago

wouldn't it also be interesting to see the heapSize as well? and why does our memory size before the upgrade grow as well 🤔 shouldn't this only be stable memory which is not affected until we actually upgrade the canister

letmejustputthishere commented 1 year ago

[Image: Screenshot 2023-02-03 at 10 30 09]

ZenVoich commented 1 year ago

we just have to make sure that getHeapSize, getMemorySize and grow never make it into production, maybe also through a flag in the Env? we could expose them in the interface, but trap straight after the call if the flag isn't set to test or something

We can keep this code in test-upgrade branch, and merge main into it when it's necessary.

ZenVoich commented 1 year ago

do we have to reinstall the canister for each iteration of the loop? or can we just incrementally grow the state?

We incrementally grow the state for each target size, then we try to upgrade. Before each target size we reinstall the code.

letmejustputthishere commented 1 year ago

We incrementally grow the state for each target size, then we try to upgrade. Before each target size we reinstall the code.

but why do we need to reinstall? can't we just keep the state and continue growing?

ZenVoich commented 1 year ago

We incrementally grow the state for each target size, then we try to upgrade. Before each target size we reinstall the code.

but why do we need to reinstall? can't we just keep the state and continue growing?

Yes, we can. I just chose the easy way by reinstalling :)

letmejustputthishere commented 1 year ago

if it doesn't make a difference i'd rather keep the state and just continue growing it instead of reinstalling. once you get to millions of transactions it takes quite a while until the memory has been grown :D

ZenVoich commented 1 year ago

if it doesn't make a difference i'd rather keep the state and just continue growing it instead of reinstalling. once you get to millions of transactions it takes quite a while until the memory has been grown :D

Ok) Will do it

letmejustputthishere commented 1 year ago

it seems like the getMemorySize call failed as well after the upgrade. this caused the script to actually exit and not continue the loop for the other types of garbage collectors. would be nice if the script gracefully fails when the upgrade fails and just continues with the next garbage collector :)

Upgrading...
Error: Failed update call.
Caused by: Failed update call.
  The Replica returned an error: code 5, message: "Canister r7inp-6aaaa-aaaaa-aaabq-cai exceeded the instruction limit for single message execution."
node:child_process:891
    throw err;
    ^

Error: Command failed: dfx canister call force-gc-copying-gc getMemorySize
Error: Failed update call.
Caused by: Failed update call.
  The Replica returned an error: code 5, message: "Canister r7inp-6aaaa-aaaaa-aaabq-cai exceeded the instruction limit for single message execution."

ZenVoich commented 1 year ago

wouldn't it also be interesting to see the heapSize as well?

I thought heap size was included in memory size. If it is not, then I can see why memory size grows dramatically after the upgrade.

letmejustputthishere commented 1 year ago

should we try this GC as well? https://github.com/dfinity/motoko/pull/3495

letmejustputthishere commented 1 year ago

[Image: Screenshot 2023-02-03 at 11 17 15]

letmejustputthishere commented 1 year ago

[Image: Screenshot 2023-02-03 at 15 33 03]

nice !

ZenVoich commented 1 year ago

should we try this GC as well? dfinity/motoko#3495

Added

it seems like the getMemorySize call failed as well after the upgrade. this caused the script to actually exit and not continue the loop for the other types of garbage collectors. would be nice if the script gracefully fails when the upgrade fails and just continues with the next garbage collector :)

Wrapped in try/catch
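
For reference, a minimal sketch of what "fail gracefully and continue with the next garbage collector" can look like. This is not the actual script: `runAll` and the second canister name are hypothetical, and `exec` is injected so the example runs without dfx. In a real script one would presumably pass `execSync` from `node:child_process`, which throws when the command fails.

```javascript
// Run the same dfx call against each GC variant; record failures
// instead of letting one failed call abort the whole loop.
function runAll(canisterNames, exec) {
  const results = [];
  for (const name of canisterNames) {
    const command = `dfx canister call ${name} getMemorySize`;
    try {
      const output = exec(command);
      results.push({ name, ok: true, output: String(output).trim() });
    } catch (err) {
      results.push({ name, ok: false, error: err.message });
    }
  }
  return results;
}

// Fake exec for the demo: the first canister hits the instruction limit.
const fakeExec = (command) => {
  if (command.includes("force-gc-copying-gc")) {
    throw new Error("exceeded the instruction limit for single message execution");
  }
  return "(1_000 : nat)\n"; // made-up output, for illustration only
};

const gcResults = runAll(
  ["force-gc-copying-gc", "hypothetical-other-gc"], // second name is invented
  fakeExec
);
console.log(gcResults);
```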

do we have to reinstall the canister for each iteration of the loop? or can we just incrementally grow the state?

Added a reinstallEach setting (set it to false for incremental growth)
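
A rough sketch of how the two modes differ, assuming ascending target sizes. `planSteps` and the step shapes are hypothetical and only model the order of operations, not the real script.

```javascript
// Build the list of actions for a test run. With reinstallEach = false
// the state is kept between targets and only the difference is grown.
function planSteps(targetSizes, reinstallEach) {
  const steps = [];
  let current = 0; // number of records currently in the canister
  for (const target of targetSizes) {
    if (reinstallEach) {
      steps.push({ action: "reinstall" });
      current = 0; // reinstalling wipes the state
    }
    if (target > current) {
      steps.push({ action: "grow", by: target - current });
      current = target;
    }
    steps.push({ action: "upgrade", size: current });
  }
  return steps;
}

// Total number of records that have to be generated for a plan.
const totalGrown = (steps) =>
  steps
    .filter((step) => step.action === "grow")
    .reduce((sum, step) => sum + step.by, 0);

const incremental = planSteps([1000, 5000], false);
const reinstalling = planSteps([1000, 5000], true);
console.log(totalGrown(incremental), totalGrown(reinstalling));
```

With many large targets the incremental plan generates each record only once, which is where the time saving comes from.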

letmejustputthishere commented 1 year ago

fyi @ZenVoich https://forum.dfinity.org/t/prim-rts-heap-size-and-prim-rts-memory-size/18429/4