Kuadrant / wasm-shim

A Proxy-Wasm module allowing communication to Authorino and Limitador.
Apache License 2.0
5 stars 5 forks source link

refactor operation dispatcher #127

Open eguzki opened 3 weeks ago

eguzki commented 3 weeks ago

The current implementation works (famous last words). However, it does not fit well with the lifetime of the tasks, which may (or may not) run (one or multiple) GRPC requests.

Description of the workflow

The high level structure

The GRPC operation


pub enum OperationResult {  
    Done,  
    Wait,    
    Failed,   
}                  

trait GRPCOperation {
       // Will start the operation task 
       // The return can be either Done, Wait or Failed
       fn start(&self) -> OperationResult;

       // Will resume operation task
       // The return can be either Done, Wait or Failed
       fn on_grpc_response(&mut self, token_id: u32, status_code: u32, resp_size: usize) -> OperationResult;
}

The GRPC operations are started and then they can run N GRPC requests. Each time a GRPC request is done, the operation will be waiting for response that will be signaled with the on_grpc_response.

The operation result has the following meaning:

When something failed on the operation work, the operation should check FailureMode. If failure mode is deny, then the operation should send HTTP response self.send_http_response(status_code: 500, headers: vec![], body: Some(b"Internal Server Error.\n")) or whatever it needs to be send. For instance, on over the limit in a rate limiting action, it should send hostcalls::send_http_response(status_code: 429, response_headers, body: Some(b"Too Many Requests\n")). Then return OperationResult::Failed

On the other hand, if failure mode is allow, the operation should decide what to do. It can either be OperationResult::Failed or OperationResult::Done, but never OperationResult::Wait

The operation dispatcher


pub enum RunAction {  
    DoneOK,  
    DoneFailed,  
    Wait,
}                  

trait OperationDispatcher {
       // Will run the list of operations
       // On RunAction::DoneOK, the operation dispatcher is signaling it is done and does not expect to be called again in `run`
       // On RunAction::DoneFailed, the operation dispatcher is signaling it is done and the workflow failed and does not expect to be called again in `run`
       // On RunAction::Wait, the operation dispatcher is signaling it is NOT done and does expect to be called again in `run`
       fn run(&mut self) -> RunAction;

       // Will resume the work of one of the GRPC operations
       // It does not return anything. The dispatcher expects to be called on the "run" method again.
       fn on_grpc_response(&mut self, token_id: u32, status_code: u32, resp_size: usize);
}

From the Wasm HTTPContext, on http_request_headers, the operation dispatcher should be populated with the action sets and those which conditions do not apply, then filtered out. Then, the run method should be called and map RunAction to proxy_wasm::types::Action. The mapping should be straightforward

From the Wasm HTTPContext, on on_grpc_call_response, the on_grpc_response method should be called. That should update the internal status of the current operation waiting for GRPC response. Then, resume dispatcher work running again run method. From on_grpc_call_response, it is needed to know whether the dispatcher is done and then call self.resume_http_request() method. This is what RunAction will tell.

When run method it called, internally, the operations will be executed. When one of the operations returns "Fail" state, from the dispatcher perspective, the entire workflow should be halted and the return value should be "RunAction::DoneFailed. When ALL the operations are done and the queue is empty, the return value should be "RunAction::DoneOK. When one of the operations return OperationResult::Wait, the dispatcher should return RunAction::Wait.

When on_grpc_response is called, the dispatcher knows which operation is waiting (the first of the list) and can call its on_grpc_response method. Then. wait run method to be called.

Operation dispatcher state machine

graph TD;
    INIT-->WAIT;
    WAIT-->WAIT;
    INIT-->OK;
    WAIT-->OK;
    INIT-->ERROR;
    WAIT-->ERROR;
eguzki commented 3 weeks ago

Not for today. I would love your thoughts @didierofrivia

didierofrivia commented 3 weeks ago

I think at the moment there's no reason for considering a refactor... specially after https://github.com/Kuadrant/wasm-shim/pull/125 that is solving the use cases you presented with the e2e tests. However, it will be definitely good to review not only the OperationDispatcher but the whole of the wasm shim.... starting from it's name. We might need to present real use cases and/or performance/complexity tests in order to not be biased by our assumptions on hypothetical use cases and workflows. However, these kind of analyses are good and we could revisit it after GA in order to assess its viability 👍🏼

eguzki commented 1 day ago

Should fix https://github.com/Kuadrant/wasm-shim/issues/149