FluffyLabs / pvm-debugger


Supporting external PVMs #81

Closed tomusdrw closed 3 weeks ago

tomusdrw commented 1 month ago

We would love to support PVM implementations other than typeberry.

If you'd like to be listed in the select box on the PVM disassembler page, please add a comment in this issue.

The idea is to compile the PVM to WASM (if possible) and expose a common interface that is yet to be fully defined (we are open to discussion).

The proposed interface for now is:

interface Pvm {
  /**
   * Re-initialize the PVM with given PVM program for JAM in Standard Program Initialisation container format.
   * 
   * This function is optional. The support is indicated in the metadata.
   */
  resetJAM(program: Uint8Array, gas: i64): void;
  /**
   * Re-initialize the PVM with given generic PVM program.
   *
   * This function is optional. The support is indicated in the metadata.
   * 
   * Note: memory initialisation is deliberately missing for now. A good format for this is TBD.
   */
  resetGeneric(program: Uint8Array, registers: Uint8Array, gas: i64): void;
  /**
   * Re-initialize the PVM with given PVM program in PolkaVM container.
   * 
   * This function is optional. The support is indicated in the metadata.
   */
  resetPolkaVM(program: Uint8Array, gas: i64): void;
  /**
   * Returns the current program counter of PVM
   */
  getProgramCounter(): u32;
  /**
   * Get the current status of PVM
   */
  getStatus(): u8;
  /**
   * Return gas left.
   */
  getGasLeft(): i64;
  /**
   * Return registers dump.
   * 
   * We expect 13 values, 4 bytes each, representing the state of all registers as a single byte array.
   */
  getRegisters(): Uint8Array;
  /**
   * Perform a single step of PVM execution.
   *
   * Returns false when the machine cannot make any more progress (i.e. it halted, panicked, or went out of gas). 
   */
  nextStep(): boolean;
  /**
   * Returns a fixed-length page of memory.
   * 
   * It's up to the implementation to decide if this is going to return just a single memory cell,
   * or a page of some specific size.
   * The page sizes should always be the same though (i.e. the UI will assume that if page 0 has size `N`
   * every other page has the same size).
   */
  getPageDump(pageIndex: number): Uint8Array;
}
type Metadata = {
  name: string;
  version: string;
  capabilities: {
     resetJAM: boolean;
     resetPolkaVM: boolean;
     resetGeneric: boolean;
  },
  wasmBlobUrl: string
}
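
As a rough illustration of how a UI might drive the interface above, the sketch below steps a toy in-memory machine until `nextStep()` returns false and then decodes the register dump. `SteppablePvm`, `CountdownPvm`, `runToCompletion`, and `decodeRegisters` are hypothetical names that are not part of the proposal. Gas is modelled as a plain number for brevity, and the little-endian byte order of the registers is an assumption, since the interface does not specify one.

```typescript
// Hypothetical subset of the proposed interface; gas is a plain number here for brevity.
interface SteppablePvm {
  getProgramCounter(): number;
  getGasLeft(): number;
  getRegisters(): Uint8Array;
  nextStep(): boolean;
}

const REGISTER_COUNT = 13;
const REGISTER_BYTES = 4;

// Toy stand-in for a real PVM: every step costs one unit of gas and increments register 0.
class CountdownPvm implements SteppablePvm {
  private pc = 0;
  private r0 = 0;
  private gas: number;

  constructor(gas: number) {
    this.gas = gas;
  }

  getProgramCounter(): number {
    return this.pc;
  }

  getGasLeft(): number {
    return this.gas;
  }

  getRegisters(): Uint8Array {
    // 13 registers x 4 bytes; little-endian is an assumption of this sketch.
    const dump = new Uint8Array(REGISTER_COUNT * REGISTER_BYTES);
    new DataView(dump.buffer).setUint32(0, this.r0, true);
    return dump;
  }

  nextStep(): boolean {
    if (this.gas <= 0) return false; // already out of gas
    this.gas -= 1;
    this.pc += 1;
    this.r0 += 1;
    return this.gas > 0; // false once no further progress is possible
  }
}

// Step until the machine reports it cannot progress, or a safety cap is hit.
// Returns the number of nextStep() calls that were made.
function runToCompletion(pvm: SteppablePvm, maxSteps: number): number {
  let steps = 0;
  while (steps < maxSteps) {
    steps += 1;
    if (!pvm.nextStep()) break;
  }
  return steps;
}

// Decode the register dump into 13 unsigned 32-bit values.
function decodeRegisters(dump: Uint8Array): number[] {
  const expected = REGISTER_COUNT * REGISTER_BYTES;
  if (dump.length !== expected) {
    throw new Error(`expected ${expected} bytes, got ${dump.length}`);
  }
  const view = new DataView(dump.buffer, dump.byteOffset, dump.byteLength);
  const registers: number[] = [];
  for (let i = 0; i < REGISTER_COUNT; i++) {
    registers.push(view.getUint32(i * REGISTER_BYTES, true));
  }
  return registers;
}
```

The step cap in `runToCompletion` is there so that a misbehaving implementation which never returns false cannot freeze the UI.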

I imagine that the teams will provide a URL for the JSON file with metadata. That file will be fetched by the UI at startup to decouple the deployment process of PVM implementations from that of the UI.
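
Since the metadata file would come from a third-party URL, the UI would probably want to validate it before use. The following is a minimal sketch of such a check against the `Metadata` shape above; `parseMetadata` and `isRecord` are hypothetical helpers, and the error messages are illustrative only.

```typescript
type Metadata = {
  name: string;
  version: string;
  capabilities: {
    resetJAM: boolean;
    resetPolkaVM: boolean;
    resetGeneric: boolean;
  };
  wasmBlobUrl: string;
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validate untrusted JSON text before the UI relies on it.
function parseMetadata(json: string): Metadata {
  const data: unknown = JSON.parse(json);
  if (!isRecord(data)) {
    throw new Error("metadata must be a JSON object");
  }
  const { name, version, capabilities, wasmBlobUrl } = data;
  if (typeof name !== "string" || typeof version !== "string" || typeof wasmBlobUrl !== "string") {
    throw new Error("name, version and wasmBlobUrl must be strings");
  }
  if (!isRecord(capabilities)) {
    throw new Error("capabilities must be an object");
  }
  const { resetJAM, resetPolkaVM, resetGeneric } = capabilities;
  if (
    typeof resetJAM !== "boolean" ||
    typeof resetPolkaVM !== "boolean" ||
    typeof resetGeneric !== "boolean"
  ) {
    throw new Error("capability flags must be booleans");
  }
  return { name, version, capabilities: { resetJAM, resetPolkaVM, resetGeneric }, wasmBlobUrl };
}
```

In the browser, the input string could come from something like `fetch(url).then((response) => response.text())`. Keeping the parsing separate from the network call makes it easy to test.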

tomusdrw commented 1 month ago

An example Rust version of this API can be found here: https://github.com/FluffyLabs/pvm-shell/blob/main/src/lib.rs#L54

I think we might consider changing it to something like:

const pointer = newPvm(program, registers, gas);
const gasLeft = getGasLeft(pointer);

We should also consider returning pointers to WASM memory for getRegisters and especially getPageDump to avoid passing too much data between WASM and the browser.
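
To make the pointer idea concrete, here is a small sketch of how the browser side might read bytes through a pointer into WASM linear memory without copying. `LinearMemory`, `viewWasmBytes`, and `copyWasmBytes` are hypothetical names; `LinearMemory` is a structural stand-in that a `WebAssembly.Memory` instance would satisfy. How a PVM would report the pointer and length is not defined in this thread and is left out.

```typescript
// Anything exposing a `buffer`, such as a WebAssembly.Memory instance.
interface LinearMemory {
  readonly buffer: ArrayBuffer;
}

// Build a zero-copy view over `length` bytes of linear memory starting at `pointer`.
function viewWasmBytes(memory: LinearMemory, pointer: number, length: number): Uint8Array {
  if (pointer < 0 || length < 0 || pointer + length > memory.buffer.byteLength) {
    throw new RangeError("pointer/length outside of linear memory");
  }
  // The view aliases memory.buffer, so later writes by the WASM side are visible through it.
  return new Uint8Array(memory.buffer, pointer, length);
}

// Copy the bytes out when the data must stay stable across further WASM calls.
function copyWasmBytes(memory: LinearMemory, pointer: number, length: number): Uint8Array {
  return viewWasmBytes(memory, pointer, length).slice();
}
```

One caveat with the zero-copy variant: when a `WebAssembly.Memory` grows, its previous `buffer` is detached, so views should be re-created after any call that might allocate.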

koute commented 1 month ago

This is pretty cool!

I would be interested in officially adding my PolkaVM to it.

API-wise, maybe you could also take a look at my API for inspiration.

Few notes:

tomusdrw commented 1 month ago

Hey @koute! Thanks for the write-up.

It would be great to support PolkaVM and all of the other possible use cases you've mentioned. Let me address some things specifically.

  1. Your API definitely looks like something we would strive for long-term. Initially my goal was to make the API as minimal as possible to make it easier for external teams to get integrated. I think we can even consider multiple levels of integration, where some PVMs would support just the basic API and some others a more complex one (I think we will end up with some metadata file about a particular PVM).
  2. Thank you for pointing out the different possible PVM blobs. I think it would be great to support all of them in the UI, however I agree that we don't necessarily need all PVM interpreters to support all of them (again something that could land in metadata).
  3. Supporting debug info would be amazing, I'd say that if the tool proves to be useful we would be happy to take a shot at implementing that support. We lack a bit of knowledge and experience on this one though, so any pointers would be great.
  4. The vision of being able to debug Solidity code right from your browser is staggering. For this to be actually useful I think we would need to be able to emulate different execution environments coming with their own set of host functions (and for instance having an in-browser storage that can be populated from on-chain data).

I've updated the interface code to encompass the different blob kinds. The API is currently well suited for wasm_bindgen-like output, but I think it is going to become a bit more raw (i.e. passing pointers instead of Uint8Array) to make it easier for other implementations. I'm planning to get in touch with teams writing the JAM PVM in Go to figure out what output they can provide.
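
One way the UI could use the per-blob capability flags is to filter the PVM list by the kind of program that was loaded. The sketch below assumes the three flags from the metadata proposal; the `BlobKind` labels and the `supportedPvms` helper are hypothetical.

```typescript
// Hypothetical labels for the three container formats discussed in this issue.
type BlobKind = "jam-spi" | "generic" | "polkavm";

type Capabilities = {
  resetJAM: boolean;
  resetPolkaVM: boolean;
  resetGeneric: boolean;
};

type PvmEntry = { name: string; capabilities: Capabilities };

// Which capability flag a PVM needs in order to load a given blob kind.
const REQUIRED_CAPABILITY: Record<BlobKind, keyof Capabilities> = {
  "jam-spi": "resetJAM",
  generic: "resetGeneric",
  polkavm: "resetPolkaVM",
};

// Names of the PVMs that declare support for the given blob kind.
function supportedPvms(pvms: PvmEntry[], kind: BlobKind): string[] {
  const flag = REQUIRED_CAPABILITY[kind];
  return pvms.filter((pvm) => pvm.capabilities[flag]).map((pvm) => pvm.name);
}
```

With a table like `REQUIRED_CAPABILITY`, adding another container format later would mean one new label and one new flag rather than changes scattered across the UI.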

koute commented 1 month ago

> 3. Supporting debug info would be amazing, I'd say that if the tool proves to be useful we would be happy to take a shot at implementing that support. We lack a bit of knowledge and experience on this one though, so any pointers would be great.

In general the easiest thing here would most likely be to piggyback on PolkaVM's crates compiled to WASM (at the very least until I can get the PolkaVM program blob format somewhat standardized like it is for WASM).

The two main types of interest are ProgramBlob and ProgramParts. These are essentially equivalent, except that a ProgramParts is just a ProgramBlob split into parts.

So, the bare minimum to do to be able to load PolkaVM blobs would be something like this:

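use wasm_bindgen::prelude::*;

// Extract the raw PVM code blob (code plus jump table) from a PolkaVM program blob.
#[wasm_bindgen]
pub fn polkavm_to_code_blob(raw_blob: Vec<u8>) -> Vec<u8> {
    // Panics on a malformed blob; a real integration would surface the error instead.
    let parts = polkavm::ProgramParts::from_bytes(&raw_blob).unwrap();
    parts.code_and_jump_table.to_vec()
}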

This will give you a raw PVM code blob which you can already ingest.

Now, to get debug info working you'd have to use ProgramBlob::parse or ProgramBlob::from_parts to create a ProgramBlob, keep the ProgramBlob around, and then use get_debug_line_program. You give the function a program counter/byte offset into the code, and it returns an iterator which produces FrameInfo structs, which in turn tell you the function name and/or the source path/line that the given piece of code comes from. (So if you display the source code side-by-side, you can use this to make a source-level debugger.)

Currently the debug info support is limited to extracting the locations of the code in the original sources, but I'm also planning to add support for getting backtraces, for reading/writing local variables, etc. (Basically I want to support a full-blown, rich debugging experience.)
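
On the UI side, the result of such a lookup could be cached per program counter and rendered next to the disassembly. The sketch below is only an illustration under assumptions: `SourceFrame`, `FrameLookup`, `memoizeFrameLookup`, and `formatFrame` are hypothetical names, and the `lookup` function stands in for whatever WASM export would wrap the debug-info query. None of it mirrors PolkaVM's actual types.

```typescript
// Hypothetical UI-side shape for one frame; fields are optional because debug info may be partial.
type SourceFrame = { functionName?: string; path?: string; line?: number };

// Stand-in for a WASM-exported function that maps a program counter to its frames.
type FrameLookup = (programCounter: number) => SourceFrame[];

// Cache results so that stepping back and forth does not repeat the lookup.
function memoizeFrameLookup(lookup: FrameLookup): FrameLookup {
  const cache = new Map<number, SourceFrame[]>();
  return (programCounter: number): SourceFrame[] => {
    const cached = cache.get(programCounter);
    if (cached !== undefined) return cached;
    const frames = lookup(programCounter);
    cache.set(programCounter, frames);
    return frames;
  };
}

// Render one frame as "function (path:line)", omitting whatever is missing.
function formatFrame(frame: SourceFrame): string {
  const name = frame.functionName ?? "<unknown function>";
  if (frame.path === undefined) return name;
  const location = frame.line === undefined ? frame.path : `${frame.path}:${frame.line}`;
  return `${name} (${location})`;
}
```

Caching is safe here because the mapping depends only on the loaded program blob, so the cache would need to be dropped whenever a new program is loaded.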

tomusdrw commented 3 weeks ago

It's now possible to load a wasm-bindgen-compatible WASM blob (either via a URL pointing to the metadata JSON or via direct upload of the WASM file): #94.

We've also added PolkaVM to the dropdown list as one of the default choices: #99. PolkaVM is compiled from https://github.com/tomusdrw/polkavm/blob/master/pvm-shell/src/lib.rs. The original, koute's PolkaVM, is a submodule in that repo and we just plug it into the pvm-shell API.

I've extracted support for debug symbols to a separate issue: #100.