Async Route feature with Callback and Server-Sent Events (SSE) support

mdaneri commented 3 months ago

This commit introduces the Async Route feature for Pode, including Callback and Server-Sent Events (SSE) support for improved asynchronous communication.

Benefits:

Improved Responsiveness: Async Routes enable your Pode application to handle multiple requests concurrently, reducing response times and improving overall system responsiveness.
Scalability: By creating independent runspace pools, you can efficiently manage resources and scale your application to handle increased loads or complex tasks.
Enhanced Security: With Pode's security features integrated with Async Routes, you can ensure that only authorized users have access to sensitive information and operations.
Flexible Task Management: Async Routes provide a unified interface for managing asynchronous tasks, allowing you to easily create, stop, and query running tasks, or receive callbacks from them.

New Features:

Async Routes: Implemented the Async Route feature to handle asynchronous operations more efficiently.
New Public Functions: Introduced new public functions in the Async.ps1 file to support the Async Route feature. These functions facilitate the creation, management, and execution of asynchronous tasks within Pode, providing developers with more flexibility and control over async operations.

New Functions:

Add-PodeAsyncGetRoute: Creates a route to get the status and details of an asynchronous task, supporting different methods for task ID retrieval (Cookie, Header, Path, Query) and various response types (JSON, XML, YAML). Integrates with OpenAPI documentation.
Add-PodeAsyncStopRoute: Adds a route to stop an asynchronous task, supporting task ID retrieval methods and response types. Integrates with OpenAPI documentation.
Add-PodeAsyncQueryRoute: Adds a route for querying task information based on specified parameters. Supports multiple content types for both requests and responses and can generate OpenAPI documentation.
Set-PodeAsyncRoute: Defines routes in Pode for asynchronous execution with runspace management, supporting response types (JSON, XML, YAML), callback functionality, and SSE. Each route creates an independent runspace pool, configurable with a minimum and maximum number of simultaneous runspaces.
Get-PodeQueryAsyncRouteOperation: Acts as a public interface for searching asynchronous Pode route operations based on specified query conditions.
Get-PodeAsyncRouteOperation: Fetches details of an asynchronous Pode route operation by its ID.
Stop-PodeAsyncRouteOperation: Aborts a specific asynchronous Pode route operation by its ID, setting its state to 'Aborted' and disposing of the associated runspace.
Test-PodeAsyncRouteOperation: Checks if a specific asynchronous Pode route operation exists by its ID, returning a boolean value.
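The task-ID lookup that Add-PodeAsyncGetRoute and Add-PodeAsyncStopRoute perform can be pictured as a switch on the configured location (Cookie, Header, Path, Query). This is a minimal sketch, not Pode's code. `Get-TaskIdFromRequest` is a hypothetical helper, and the `$Request` hashtable with `Cookies`, `Headers`, `Parameters` and `Query` keys is a hypothetical stand-in for the incoming request.

```powershell
# Hypothetical helper: resolve a task ID from the location chosen with -In.
# The $Request hashtable shape is an assumption made for illustration only.
function Get-TaskIdFromRequest {
    param(
        [Parameter(Mandatory)][hashtable]$Request,
        [Parameter(Mandatory)][ValidateSet('Cookie', 'Header', 'Path', 'Query')][string]$In,
        [Parameter(Mandatory)][string]$TaskIdName
    )
    # Map each -In value to the request collection that would hold the ID.
    $source = switch ($In) {
        'Cookie' { $Request.Cookies }
        'Header' { $Request.Headers }
        'Path'   { $Request.Parameters }
        'Query'  { $Request.Query }
    }
    if ($null -eq $source) { return $null }
    # PowerShell hashtables are case-insensitive, which suits header names.
    return $source[$TaskIdName]
}

$request = @{
    Headers    = @{ 'myTaskId' = '6a1d' }
    Query      = @{}
    Parameters = @{}
    Cookies    = @{}
}
Get-TaskIdFromRequest -Request $request -In Header -TaskIdName 'myTaskId'
```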

Features

Independent Runspace Pools:

Configurable Runspaces: Each async route creates an independent runspace pool that is configurable with a minimum and maximum number of simultaneous runspaces, allowing for efficient resource management and scalability.
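A per-route pool is built on PowerShell's runspace pools. The sketch below uses only the .NET `RunspacePool` API to show how the maximum size caps concurrency. It is not Pode's implementation, and `$MinThreads`/`$MaxThreads` are illustrative variable names.

```powershell
# Illustrative pool bounds; an async route would take these from its configuration.
$MinThreads = 1
$MaxThreads = 2

$pool = [runspacefactory]::CreateRunspacePool($MinThreads, $MaxThreads)
$pool.Open()
$timer = [System.Diagnostics.Stopwatch]::StartNew()

# Queue five jobs; at most $MaxThreads run at once, the rest wait in the pool.
$jobs = foreach ($i in 1..5) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({
        param($Id)
        Start-Sleep -Milliseconds 200
        "job $Id done"
    }).AddArgument($i)
    [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

# Wait for every job, collect its output, and release resources.
$results = foreach ($job in $jobs) {
    $job.PowerShell.EndInvoke($job.Handle)
    $job.PowerShell.Dispose()
}
$timer.Stop()
$pool.Close()
$pool.Dispose()

$results
"Elapsed: $($timer.ElapsedMilliseconds) ms"
```

Five 200 ms jobs on a pool capped at two runspaces need at least three rounds, so the elapsed time cannot fall much below 600 ms.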

Security:

Pode Security: All async route operations are subject to Pode security, ensuring that any task operation complies with defined authentication and authorization rules.

Callback Support:

Callback Functionality: Supports including callback functionality for routes, with options for URL, content type, HTTP method, and header fields. Callback support is integrated with OpenAPI definitions to provide detailed route information and response schemas.
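Conceptually, the callback options (URL, content type, HTTP method, header fields) are combined into one outbound HTTP request when the task finishes. This is a minimal sketch that assumes a JSON payload. `New-CallbackRequest` and the `X-Task-Id` header are hypothetical. The function returns a splat for the built-in `Invoke-RestMethod`, and the actual call is commented out so the sketch runs without a listener.

```powershell
# Hypothetical helper: turn callback options plus the task result into request parameters.
function New-CallbackRequest {
    param(
        [Parameter(Mandatory)][string]$Url,
        [ValidateSet('POST', 'PUT', 'PATCH')][string]$Method = 'POST',
        [string]$ContentType = 'application/json',
        [hashtable]$Headers = @{},
        [Parameter(Mandatory)][hashtable]$Result
    )
    # Only JSON is serialized in this sketch; other types would need their own converter.
    if ($ContentType -ne 'application/json') {
        throw "Content type '$ContentType' is not handled by this sketch."
    }
    return @{
        Uri         = $Url
        Method      = $Method
        ContentType = $ContentType
        Headers     = $Headers
        Body        = ($Result | ConvertTo-Json -Depth 10 -Compress)
    }
}

$options = @{
    Url     = 'http://localhost:8080/receive/callback'
    Headers = @{ 'X-Task-Id' = '6a1d' }   # hypothetical header carrying the task ID
    Result  = @{ State = 'Completed'; InnerValue = 'something' }
}
$request = New-CallbackRequest @options

# Sending it is a single call (commented out so the sketch needs no listener):
# Invoke-RestMethod @request
$request.Body
```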

SSE Support:

SSE Support: Added Server-Sent Events support so that clients, such as browsers, can receive real-time updates from async routes without polling.
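On the wire, SSE is a `text/event-stream` response made of `field: value` lines, and a blank line ends each event (as defined in the WHATWG HTML standard). The sketch below only formats such an event. `ConvertTo-SseMessage` is a hypothetical helper, not a Pode function, and the event name `taskUpdate` is illustrative.

```powershell
# Hypothetical helper: format one Server-Sent Event in text/event-stream syntax.
function ConvertTo-SseMessage {
    param(
        [Parameter(Mandatory)][string]$Data,
        [string]$EventName,
        [string]$Id
    )
    $lines = [System.Collections.Generic.List[string]]::new()
    if ($Id) { $lines.Add("id: $Id") }
    if ($EventName) { $lines.Add("event: $EventName") }
    # A multi-line payload must be split into one 'data:' field per line.
    foreach ($line in ($Data -split "`r?`n")) { $lines.Add("data: $line") }
    # Each field ends with LF, and an empty line terminates the event.
    return ($lines -join "`n") + "`n`n"
}

$payload = [ordered]@{ State = 'Completed'; Progress = 100 } | ConvertTo-Json -Compress
ConvertTo-SseMessage -Data $payload -EventName 'taskUpdate' -Id '42'
```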

Tests:

Created a complex Pester test with merged authentication to ensure the robustness of the Async Route implementation. This test highlights potential challenges with PowerShell 5.x compatibility.
Added a stress test script, Async-Computing.ps1, which evaluates the performance with 100 concurrent runspaces and over 250 parallel requests, ensuring the feature can handle high load scenarios.

Example Usage:

Add-PodeAsyncGetRoute:

Add-PodeAsyncGetRoute -Path '/task' -ResponseContentType 'application/json', 'application/yaml' -In Path -Authentication 'MergedAuth' -Access 'MergedAccess' -Group 'Software' -TaskIdName 'myTaskId'

Add-PodeAsyncStopRoute:

Add-PodeAsyncStopRoute -Path '/task' -ResponseContentType 'application/json', 'application/yaml' -In Query -Authentication 'MergedAuth' -Access 'MergedAccess' -Group 'Software' -TaskIdName 'myTaskId'

Add-PodeAsyncQueryRoute:

Add-PodeAsyncQueryRoute -Path '/task' -ResponseContentType 'application/json', 'application/yaml' -In Query -Authentication 'MergedAuth' -Access 'MergedAccess' -Group 'Software' -TaskIdName 'myTaskId'

Set-PodeAsyncRoute:

Add-PodeRoute -PassThru -Method Put -Path '/auth/asyncUsing' -Authentication 'MergedAuth' -Access 'MergedAccess' -Group 'Software' -ScriptBlock {
    return @{ InnerValue = 'something' }
} | Set-PodeAsyncRoute -ResponseContentType 'application/json', 'application/yaml' -Callback -PassThru -CallbackSendResult -Timeout 300 |
    Set-PodeOARequest -RequestBody (
        New-PodeOARequestBody -Content @{ 'application/json' = (New-PodeOAStringProperty -Name 'callbackUrl' -Format Uri -Object -Example 'http://localhost:8080/receive/callback') }
    )

Other functions:

Get-PodeQueryAsyncRouteOperation
Get-PodeAsyncRouteOperation
Stop-PodeAsyncRouteOperation
Test-PodeAsyncRouteOperation

These functions are the internal, in-code equivalents of the route operations. The only difference is that no security checks are applied. Their main purpose is to manipulate the internal state of the async routes.

Badgerati commented 3 months ago

Hey! I managed to get time to review 😄

I would recommend raising an Issue first for larger feature work, so it can be discussed before diving into the solution 😜

If I'm right, this is a wrapper for Routes, which sets the Route logic as an async task with optional routes for info retrieval?

If so, I'd recommend the following:

Re-use the existing Task functionality, adding any needed improvements there, rather than having 2 identical Task systems - plus they'll benefit from further improvements to Tasks in the future. If people need more threads, then Set-PodeTaskConcurrency is available to increase the amount used.
- There's a ticket I have to improve Tasks, so if this needs to be done lemme know as I can look at this first instead of the locale work for Pode.Web.
Use a single Runspace Pool rather than multiple, even for the re-using of Tasks above, otherwise it will get resource heavy quickly.
In the Async.ps1 file, keep the function naming consistent with the other files, for example:
- Add-PodeQueryTaskRoute > Add-PodeAsyncQueryRoute
- Add-PodeStopTaskRoute > Add-PodeAsyncStopRoute
- Add-PodeGetTaskRoute > Add-PodeAsyncGetRoute
- Set-PodeRouteAsync > Set-PodeAsyncRoute
Have a separate Remove-PodeAsyncRoute which calls Remove-PodeRoute, but also Remove-PodeTask - moving any runspace clean-up into here.
For Set-PodeAsyncRoute it might be an idea to allow a "WebhookUrl" to be supplied, so when the Task the Route invokes completes it could send a callout to a webhook.
For Get-PodeUserRequest, there's already Get-PodeHeader, so I'd just split it up into Get-PodeData, Get-PodeQuery, and Get-PodeParameter, each with a -Name parameter, and place them in the public Utilities.ps1 file.

mdaneri commented 3 months ago

Regarding the Get-PodeData, Get-PodeQuery, etc., I agree with your suggestion. I'll create a PR for this enhancement.

For the function names, I agree that your suggested names are better. I hadn't spent much time on naming, so this feedback is helpful.

However, I have a question about the need for a separate Remove-PodeAsyncRoute. Remove-PodeRoute seems to cover the functionality since you cannot remove an async route without removing the route itself.

On the topic of merging the functionality with PodeTask, I have some reservations. PodeTask serves a very specific purpose that doesn't align perfectly with async REST calls. The latest commit includes an option to specify the maximum number of threads that each route can execute, which is a crucial feature. Some routes cannot be run concurrently or must have a limited number of concurrent executions due to the heaviness of the process.

mdaneri commented 3 months ago

Regarding the webhook: in this context, a callback is the appropriate outbound method. I'm going to extend the callback already in place to support the complete callback semantics described at https://swagger.io/docs/specification/callbacks/
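In OpenAPI, a callback URL is usually written as a runtime expression that points into the original request, for example `{$request.body#/callbackUrl}` for the `callbackUrl` property in the Set-PodeAsyncRoute example. This is a minimal sketch that handles only the `$request.body#<JSON Pointer>` form. `Resolve-RuntimeExpression` is hypothetical, and the pointer is walked following RFC 6901.

```powershell
# Hypothetical resolver for '{$request.body#/json/pointer}' runtime expressions only.
function Resolve-RuntimeExpression {
    param(
        [Parameter(Mandatory)][string]$Expression,
        [Parameter(Mandatory)][string]$RequestBodyJson
    )
    $m = [regex]::Match($Expression, '^\{\$request\.body#(?<pointer>/.*)\}$')
    if (-not $m.Success) { throw "Unsupported runtime expression: $Expression" }

    $node = $RequestBodyJson | ConvertFrom-Json
    # Walk the JSON Pointer (RFC 6901): split on '/', then unescape '~1' -> '/' and '~0' -> '~'.
    foreach ($token in $m.Groups['pointer'].Value.Substring(1).Split('/')) {
        $key = $token.Replace('~1', '/').Replace('~0', '~')
        if ($node -is [array]) { $node = $node[[int]$key] } else { $node = $node.$key }
    }
    return $node
}

$body = '{ "callbackUrl": "http://localhost:8080/receive/callback" }'
Resolve-RuntimeExpression -Expression '{$request.body#/callbackUrl}' -RequestBodyJson $body
```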

mdaneri commented 3 months ago

Another part I still need to implement is security. So far anyone can see everything; I need to use the roles and groups to limit access to the async results.
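One way to apply roles or groups to async results is to filter the stored operations against the caller's groups before returning them. This is a minimal sketch under assumptions. `Select-AuthorizedOperation` is hypothetical, each operation is assumed to carry a `Permission.Groups` list, and an empty list is treated as unrestricted. A real design might choose the opposite default.

```powershell
# Hypothetical filter: keep operations whose permitted groups intersect the caller's groups.
# Assumes each operation exposes Permission.Groups; an empty list is treated as unrestricted.
function Select-AuthorizedOperation {
    param(
        [Parameter(Mandatory)][object[]]$Operation,
        [string[]]$UserGroups = @()
    )
    $Operation | Where-Object {
        $allowed = @($_.Permission.Groups | Where-Object { $_ })
        ($allowed.Count -eq 0) -or (@($allowed | Where-Object { $UserGroups -contains $_ }).Count -gt 0)
    }
}

$operations = @(
    [pscustomobject]@{ Id = 'a'; Permission = @{ Groups = @('Software') } }
    [pscustomobject]@{ Id = 'b'; Permission = @{ Groups = @('Admins') } }
    [pscustomobject]@{ Id = 'c'; Permission = @{ Groups = @() } }
)
(Select-AuthorizedOperation -Operation $operations -UserGroups 'Software').Id
```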

Badgerati commented 3 months ago

specify the maximum number of threads that each route can execute, which is a crucial feature. Some routes cannot be run concurrently or must have a limited number of concurrent executions due to the heaviness of the process

I feel this could also be achieved with Tasks, on top of #1037. There's a need there to specify that certain tasks should only be run sequentially, and here we have a need to limit how many instances of a Task run concurrently, even forcing sequential runs at times. The two could probably be solved with the same solution, enabling the requirement here and enhancing Tasks at the same time:

Tasks when created and invoked, by default, use a global runspace pool; they can run concurrently, and have no concurrency limit. This'll be the default/common format for Tasks, and for Async Routes invoking them as well.

Then 2 potential options:

We introduce a new -Isolated switch on Add-PodeTask (and Set-PodeAsyncRoute), this enables a ParameterSet with advanced functionality to control threading - in this case likely just -MaxThreads for now, and setting this to 1 forces sequential processing only. If MaxThreads isn't supplied then the internal default of $PodeContext.Threads.Tasks is used.
- Tasks with -Isolated create a separate Runspace Pool. Having this switch makes it safer so that people don't accidentally create a mass amount of Runspace Pools.

For example:

# global
Add-PodeRoute ... | Set-PodeAsyncRoute -ResponseContentType Json

# isolated and sequential
Add-PodeRoute ... | Set-PodeAsyncRoute -ResponseContentType Json -Isolated -MaxThreads 1
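The claim that a maximum of 1 forces sequential processing follows from how a runspace pool with a maximum size of one behaves: queued work waits until the single runspace is free. This sketch demonstrates that behaviour with the plain .NET API, not with Pode's proposed `-Isolated` implementation. It records start and end times and checks for overlaps.

```powershell
# A dedicated pool whose maximum size is 1: queued work must wait for the single runspace.
$pool = [runspacefactory]::CreateRunspacePool(1, 1)
$pool.Open()

$jobs = foreach ($i in 1..3) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({
        param($Id)
        $start = [datetime]::UtcNow
        Start-Sleep -Milliseconds 100
        [pscustomobject]@{ Id = $Id; Start = $start; End = [datetime]::UtcNow }
    }).AddArgument($i)
    [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

$runs = foreach ($job in $jobs) {
    $job.PowerShell.EndInvoke($job.Handle)
    $job.PowerShell.Dispose()
}
$pool.Close()
$pool.Dispose()

# Sequential means no run starts before the previous one (by start time) has finished.
$ordered = @($runs | Sort-Object Start)
$overlaps = 0
for ($k = 1; $k -lt $ordered.Count; $k++) {
    if ($ordered[$k].Start -lt $ordered[$k - 1].End) { $overlaps++ }
}
"Overlapping runs: $overlaps"
```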

or,

This is one I've thought about in the past: we'd have an Add-PodeRunspacePool public function which lets people specify a -Name and a -MaxThreads.
- On Add-PodeTask and Set-PodeAsyncRoute there's a new -RunspacePoolName. If this is supplied then the Tasks run on the specified pool, if not passed then they run on the global Task pool. (It'll likely need protection to stop people running Tasks on other internal pools, hah)
- This would also allow for an isolated pool for multiple select Tasks/Route Tasks to run on - rather than 1 to 1.

For example:

# global
Add-PodeRoute ... | Set-PodeAsyncRoute -ResponseContentType Json

# isolated and sequential
Add-PodeRunspacePool -Name 'CustomPool' -MaxThreads 1
Add-PodeRoute ... | Set-PodeAsyncRoute -ResponseContentType Json -RunspacePoolName 'CustomPool'
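The named-pool option amounts to a registry keyed by name, with the global Task pool as the fallback and internal names protected. This is a minimal sketch. `Register-NamedRunspacePool`, `Resolve-TaskRunspacePool`, and the reserved pool names are hypothetical. They only mirror the proposed `Add-PodeRunspacePool` / `-RunspacePoolName` behaviour.

```powershell
# Hypothetical registry of runspace pools keyed by name.
$script:RunspacePools = @{}
# Hypothetical names of internal pools that user code must not claim.
$script:ReservedPoolNames = @('Tasks', 'Web', 'Schedules')

# The shared global task pool is created internally, bypassing the reservation check.
$script:RunspacePools['Tasks'] = [runspacefactory]::CreateRunspacePool(1, 5)
$script:RunspacePools['Tasks'].Open()

function Register-NamedRunspacePool {
    param(
        [Parameter(Mandatory)][string]$Name,
        [ValidateRange(1, 1000)][int]$MaxThreads = 2   # illustrative bounds
    )
    if ($script:ReservedPoolNames -contains $Name) { throw "'$Name' is reserved for internal use." }
    if ($script:RunspacePools.ContainsKey($Name)) { throw "Runspace pool '$Name' already exists." }
    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads)
    $pool.Open()
    $script:RunspacePools[$Name] = $pool
}

function Resolve-TaskRunspacePool {
    param([string]$RunspacePoolName)
    # No name supplied: run on the shared global task pool.
    if ([string]::IsNullOrEmpty($RunspacePoolName)) { $RunspacePoolName = 'Tasks' }
    if (-not $script:RunspacePools.ContainsKey($RunspacePoolName)) {
        throw "Unknown runspace pool '$RunspacePoolName'."
    }
    return $script:RunspacePools[$RunspacePoolName]
}

# Mirrors the 'CustomPool' example: one isolated, sequential pool.
Register-NamedRunspacePool -Name 'CustomPool' -MaxThreads 1
(Resolve-TaskRunspacePool -RunspacePoolName 'CustomPool').GetMaxRunspaces()
```

Because several Tasks or async Routes can name the same pool, isolation no longer has to be one pool per route.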

This way we don't have duplicated logic, and improve Tasks all around.

mdaneri commented 3 months ago

I like the idea of the isolated parameter. But to be honest, I don't see a problem with having 1000 runspaces. A runspace uses no resources other than a small quantity of memory.

In a Pode project, I'm not expecting to see 1000 async routes; if that's the case, I doubt that all of them are used simultaneously. In the end, the number of running threads is the only thing that matters

At the moment, the way it works is like this:

Add-PodeRoute ... | Set-PodeAsyncRoute -ResponseContentType Json -MaxThreads 2

As for the idea of using the same code for Task and Async, I'm only partially convinced it's feasible without compromising the compatibility with the current API.

ConvertTo-PodeEnhancedScriptBlock does all the magic by injecting the user code inside the "async" envelope, which is completely different from how the PodeTasks are managed. The only similar thing is Start-PodeAsyncRoutesHousekeeper, but I want to find a way to remove it. I was thinking of using an individual scheduler to clean up each async process.
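The "envelope" idea wraps the user's route logic in code that owns the task state. It can be illustrated with a closure. This is a minimal in-process sketch and does not show how ConvertTo-PodeEnhancedScriptBlock is written. `New-AsyncEnvelope` and the operation fields (`State`, `Result`, `Error`, timestamps) are hypothetical, and the sketch does not cover moving the envelope into another runspace.

```powershell
# Hypothetical wrapper: the returned script block records state, timing and errors around the user code.
function New-AsyncEnvelope {
    param([Parameter(Mandatory)][scriptblock]$UserScript)
    return {
        param([hashtable]$Operation, [hashtable]$Parameters = @{})
        $Operation.State = 'Running'
        $Operation.StartingTime = [datetime]::UtcNow
        try {
            # Run the user's route logic with the supplied named parameters.
            $Operation.Result = & $UserScript @Parameters
            $Operation.State = 'Completed'
        }
        catch {
            $Operation.State = 'Failed'
            $Operation.Error = $_.Exception.Message
        }
        finally {
            $Operation.CompletedTime = [datetime]::UtcNow
        }
    }.GetNewClosure()
}

$envelope = New-AsyncEnvelope -UserScript { param($Name) @{ InnerValue = "hello $Name" } }
$operation = @{ Id = [guid]::NewGuid().ToString(); State = 'NotStarted' }
& $envelope -Operation $operation -Parameters @{ Name = 'Pode' }
$operation.State              # -> Completed
$operation.Result.InnerValue  # -> hello Pode
```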

mdaneri commented 3 months ago

The callback implementation is complete. Now only the security part and the -Isolated switch are missing.

mdaneri commented 3 months ago

I’m looking at how to integrate SSE. It’s a very useful feature when you’re making an async call from a browser.

mdaneri commented 2 months ago

Documentation is done. The only parts missing are the SSE documentation and some minor fixes to the OpenAPI definition.

Badgerati commented 2 months ago

I'm back from holiday, so I'll begin reviewing this one and the Runspace one as soon as I can :)

mdaneri commented 2 months ago

Runspace is simple: there are just 2 functions to make debugging easier, and a small document that explains them.

Badgerati commented 2 months ago

Hey @mdaneri, I'm gradually getting through the review, just a slow one atm! Please try not to commit anything to the PR while I go through, as it'll confuse the ongoing review 😄 I'm hoping to finish the rest of the review this week.

While going through the work, the ContentType parameters reminded me of a feature I was toying with a few months back which might actually help out a lot here. I've been mapping the idea to the work here, and so far it seems like a good match. When I get a chance I'll write it up, but in short it's an alternative to the Write-PodeXResponse functions and the way ContentTypes are figured out, respecting the Accept header more, similar to how you have here.
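Respecting the Accept header essentially means ranking the client's media ranges and picking the first one the route can produce. This is a simplified sketch, and `Select-ResponseContentType` is hypothetical, not the design mentioned above. It honours q-values and `type/*` wildcards, but it breaks ties by header order rather than by the specificity rules in RFC 9110.

```powershell
# Hypothetical helper: choose a response content type from an Accept header.
# $Supported is listed in server preference order; its first entry is the default.
function Select-ResponseContentType {
    param(
        [string]$Accept,
        [Parameter(Mandatory)][string[]]$Supported
    )
    if ([string]::IsNullOrWhiteSpace($Accept)) { return $Supported[0] }

    # Parse "type/subtype;q=0.8" entries, keeping header order for tie-breaking.
    $index = 0
    $ranges = foreach ($part in $Accept.Split(',')) {
        $pieces = @($part.Split(';') | ForEach-Object { $_.Trim() })
        $quality = 1.0
        foreach ($param in ($pieces | Select-Object -Skip 1)) {
            if ($param -match '^q=(?<q>[0-9.]+)$') { $quality = [double]$Matches['q'] }
        }
        [pscustomobject]@{ Range = $pieces[0].ToLowerInvariant(); Quality = $quality; Index = $index }
        $index++
    }

    # Highest quality first; q=0 means "not acceptable".
    $sorted = $ranges | Where-Object Quality -gt 0 |
        Sort-Object -Property @{ Expression = 'Quality'; Descending = $true }, @{ Expression = 'Index'; Descending = $false }
    foreach ($range in $sorted) {
        foreach ($type in $Supported) {
            $candidate = $type.ToLowerInvariant()
            $isMatch = ($range.Range -eq '*/*') -or ($range.Range -eq $candidate)
            if (-not $isMatch -and $range.Range.EndsWith('/*')) {
                # 'application/*' matches any 'application/...' type.
                $isMatch = $candidate.StartsWith($range.Range.Substring(0, $range.Range.Length - 1))
            }
            if ($isMatch) { return $type }
        }
    }
    return $null  # nothing acceptable; a server would typically answer 406 Not Acceptable
}

Select-ResponseContentType -Accept 'application/yaml;q=0.9, application/json' -Supported 'application/json', 'application/yaml'
# -> application/json
```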

mdaneri commented 2 months ago

I was thinking of making a small change, but I can postpone it to the next release. In this implementation, when you query for async tasks, there is no limit to the number of objects you can get back. I was thinking of adding a configurable limit, defaulting to 100.
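A configurable cap could be applied as the last step of the query. This is a minimal sketch. `$DefaultQueryLimit`, `Select-LimitedOperation`, and the `CreationTime` property are hypothetical, and returning the newest operations first is a design assumption.

```powershell
# Hypothetical server-wide default; a query could override it per request.
$DefaultQueryLimit = 100

function Select-LimitedOperation {
    param(
        [object[]]$Operation,
        [ValidateRange(1, 10000)][int]$Limit = $DefaultQueryLimit   # illustrative upper bound
    )
    # Newest first, then truncate, so a capped response still shows the most recent work.
    $Operation | Sort-Object -Property CreationTime -Descending | Select-Object -First $Limit
}

$operations = 1..250 | ForEach-Object {
    [pscustomobject]@{ Id = $_; CreationTime = ([datetime]'2024-01-01').AddMinutes($_) }
}
(Select-LimitedOperation -Operation $operations).Count   # -> 100
```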
