demergent-labs / kybra

Python CDK for the Internet Computer
MIT License

Retrieve errors from init/post_upgrade #417

Closed by lastmjs 7 months ago

lastmjs commented 1 year ago

Because of our new chunk-uploading post_install installation process, we no longer receive any indication of errors in the init or post_upgrade of our Kybra canisters. The cross-canister call that the kybra_deployer canister uses to call install_code with the actual application Wasm is a notify, not a full request/response. This is because the Wasm binary is swapped out during the call, so when the response arrives it attempts to call a basically random function on the new Wasm. This caused undefined behavior and very scary intermittent test failures. Thus, we use notify.

There are two main drawbacks to this currently.

  1. We receive no indication of errors in the init/post_upgrade process.
  2. Our post_install process does not wait until the canister is fully initialized, i.e. until the entire init/post_upgrade function invocation has completed. Instead it returns early, so we must introduce some kind of delay into the post_install process to account for this. Right now we just use five seconds, which works for all of our tests.
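
To make the two drawbacks concrete, here is a toy model in plain Python. It is only a sketch and not Kybra, kybra_deployer, or Internet Computer code: `notify`, `call`, `failing_init`, and `post_install` are hypothetical names, a thread stands in for the install_code call, and the delay is shortened from five seconds so the example runs quickly.

```python
import threading
import time


def notify(fn):
    """Fire-and-forget: start fn and return immediately (toy stand-in for a notify)."""
    def run():
        try:
            fn()
        except Exception:
            # Nobody is waiting for a response, so the error is silently dropped.
            pass

    threading.Thread(target=run).start()


def call(fn):
    """Request/response: block until fn finishes; errors propagate to the caller."""
    return fn()


def failing_init():
    """Hypothetical init/post_upgrade that takes a moment and then fails."""
    time.sleep(0.05)
    raise RuntimeError("init trapped")


def post_install(init, delay_seconds):
    """Toy post_install: notify, then wait a fixed delay and hope init is done."""
    notify(init)
    time.sleep(delay_seconds)  # stands in for the fixed five-second delay
    return "post_install finished"


if __name__ == "__main__":
    # Drawback 1: the failure in init never reaches the caller.
    print(post_install(failing_init, 0.1))

    # With a full request/response the same failure would be visible.
    try:
        call(failing_init)
    except RuntimeError as error:
        print(f"call surfaced the error: {error}")
```

In this model `post_install` reports success even though `failing_init` raised, and it only appears to work because the fixed delay happens to be longer than the init.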

There are a few possible solutions to this problem:

  1. Wait for dfx to implement chunked Wasm binary uploading, so that we can get rid of our post_install process entirely (at least ignoring the need for the Python stdlib). Uploading the Python stdlib may still be an issue, as we may not want to recompile it into the main binary every time.
  2. Try some hacks, such as seeing if we can figure out which index a function has in Rust on each compile. If we know that index, we may be able to put a function at the same index in the app canister, and have the response to the install_code call hit that function.
  3. Wait for some other solution from DFINITY that lets us set another callback function for the response.

lastmjs commented 1 year ago

We will wait for default or named cross-canister call callbacks at the protocol level to get a proper solution to this. For now, the errors should be displayed locally, just not in production.

lastmjs commented 7 months ago

dfx now supports larger Wasm binaries, so this issue is resolved.