Closed JanisErdmanis closed 7 months ago
Hi @JanisErdmanis,
Thanks for the detailed write up, you bring up a lot of good points. Since needing access to the same global state is a hard requirement, it does limit the options a bit.
For now I'd recommend going with option 1, where you use the router
function to split up your app into two main sections (internal and public). This will allow your app to access the same global memory while properly segmenting the api to protect it form unauthenticated users.
Routers help keep your code DRY, by having a single place to configure whole sections of your api. The router is just a function, so it can be exported and shared across multiple files.
It's worth mentioning that you can't nest routers within routers at the moment. If you need more fine grained control I'd recommend just defining another router function with a mostly similar prefix
using Oxygen
using HTTP
function AdminMidleware(handler::Function)
function(req::HTTP.Request)
# do some authentication here
handler(req)
end
end
# all "admin" routes will be prefixed with "/internal" and have the AdminMidleware applied to each request
admin = router("/internal", tags=["internal"], middleware=[AdminMidleware])
"""
Define the internal routes
"""
@get admin("/") function()
text("welcome to the online version of the app")
end
@get admin("/custom") function()
text("Here's a custom admin endpoint")
end
"""
Define the public facing routes
"""
@get "/public" function()
text("welcom to the public version of the app")
end
serve()
You're absolutely right, this package leans super heavily on the internal states for the simple api. The global state has to be stored somewhere, but I like the idea of having a macro so something to allow for multiple instances of Oxygen. Granted, that's going to take a lot of work to rework things to be more composable.
Moving forward:
router
nestingI have some other ideas in the road map that I believe will benefit from these changes
The AdminMiddleware looks interesting. I could block requests for the public API going in there. However, in my situation, the admin API also serves the webpage and thus, putting internal
GET
in front of every request is visible in the address bar. It could be solved by modifying request targets in the middleware, which is something I have yet to consider.
For now, I have found that:
include(Base.find_package("Oxygen"))
using .Oxygen
works reasonably well. However, I don’t want to keep Oxygen dependencies in my Project.toml
and for the past few hours, I have been trying to find a workaround to switch the project load to the Oxygen dependencies and switch back, but I haven’t figured out that yet.
I suggest that we tackle the task of moving out global states incrementally. My proposition is to first move all global states to the Oxygen.jl
scope and add corresponding methods that interact with that state there. We can start by moving the ROUTER
outside with corresponding macro definitions. Internally, we should parameterise as many methods as possible with the router.
Then we could move on to the autodoc.jl
module, focusing on mountedfolders
, taggedroutes
, custommiddleware
. Other variables look like defaults and could be later added as keyword arguments for the serve method. Perhaps repeat tasks
and cronjobs
as well as timers
could already work without refactoring. Can you have a look at whether it could be so? This could significantly reduce the effort involved.
From metrics.jl
, only a history
object would need to be put in the global scope. Meanwhile, from streamutil.jl
, parametrisation with HANDLER
seems easy. So, in total, there seem to be the following variables that need to be taken into most outside layer:
ROUTER
from core.jl
server
from core.jl
mountedfolders
from autodoc.jl
taggedroutes
from autodoc.jl
custommiddleware
from autodoc.jl
history
from metrics.jl
HANDLER
from streamutil.jl
Do you have any thoughts on if this list is exhaustive?
I could do the implementation of parameterisations for each global variable individually followed with corresponding pull requests. Would you be open to accepting such contributions that involve parameterizing internal methods and relocating those globals to the topmost layer where the exported convenience methods also would be added? I believe that with a bit of guidance, I could successfully execute this plan within a week or at most two.
Yeah, I'd be open to accepting these PR with this caveat:
I'd like to keep the current API the same or easy to migrate. The simplicity is the biggest selling point, and I'm hesitant to completely rewrite the internals to make it more composable if it sacrifices ease of use.
Ideally, I'd like a hybrid approach where the internals are composable while the external API easy to use (don't have to worry about declaring routers).
Before either of us put any serious time into this rework. Let's work backward and share some examples of what the finished product would look like and function.
Could you share a simple example of what the API would look like for a basic server?
Could you share a simple example of what the API would look like for a basic server?
The exported API would not change after the changes, and only the internal API would be affected. The @oxidise ROUTER
would be entirely optional and only useful for situations like mine where one needs two different routers or would like a bit cleaner approach when defining the router in the module. The API would remain the same concerning exported methods. There are a few details on how to expose the internal state of the oxidised router, which we can discuss when we get to this point.
Implementing the @oxidise
macro can be made relatively simple. Let's say that the exported convenience methods of Oxygen could be defined in the methods.jl
file included at the topmost scope. The macro in the expanded form then would only do the following:
using Oxygen
using HTTP
router = HTTP.Router()
@oxidise router
### Macro expansion
const ROUTER = Ref(router)
const server = ...
const mountedfolders = ...
const taggedroutes = ...
const custommiddleware = ...
const history = ...
const HANDLER = ...
include(joinpath(dirname(Base.find_package("Oxygen")), "methods.jl"))
###
@get "/test" function(req::Request)
return "Hello World!"
end
serve(router, port=2341) # `serve(port=2341)` also still works, but would refer to the internal router then.
But I just realised it is a little more complicated than that. Here, the methods.jl
would require explicit referencing with an Oxygen
module in front for the right side. Thus, the macro would need to load the methods.jl
into AST and add Oxygen
to every method available within the topmost Oxygen scope. Nevertheless, this is totally doable.
An advantage of this approach is that it doesn't add complexity to the codebase. The only difference is that internal methods are parameterised, while the exported methods are defined in methods.jl
, which binds them to the globals at the highest level.
So I was doing a little work on this idea, and kinda stumbled into a solution. While we were talking about the API and what it would look like going forward I wanted to take another stab at emulating the FastAPI interface from python.
I had some trouble at first, but eventually was able to dynamically generate a module and dump all of the exported symbols from core.jl into the newly created module. Julia really tries hard to prevent people from doing this, but when done it ends up perfectly sandboxing the package.
Down below I have an example where I create two completely separate instances with their own unique globals, routers and functions. It uses the exact same default API that oxygen uses and integrating it was a breeze.
The best part is that this doesn't introduce any changes to the current API and all this functionality is exposed through a single function.
Here's my branch: feature/multi-instance
Here's the file I added to make this possible: https://github.com/OxygenFramework/Oxygen.jl/blob/feature/multi-instance/src/instances.jl
module MultiInstanceDemo
include("../src/Oxygen.jl")
using .Oxygen
using HTTP
# Setup the first app
app1 = oxidize()
app1.get("/") do
"welcome to server #1"
end
app1.@get "/subtract/{a}/{b}" function(req, a::Int, b::Int)
a - b
end
# Setup the second app
app2 = oxidize()
app2.get("/") do
"welcome to server #2"
end
app2.@get "/add/{a}/{b}" function(req, a::Int, b::Int)
a + b
end
# start both servers together
app1.serve(async=true)
app2.serve(port=8081)
# clean it up
app1.terminate()
app2.terminate()
end
And here's a video of me running it:
I will have a look at the implementation a bit later. I do, however, think that using app1
and app2
in front of methods and emulating OOP is not an idiomatic way to do it in Julia. Julia has an excellent namespace system, so I suggest using a following API instead, which accomplishes the same goal:
module MultiInstanceDemo
module APP1
# Setup the first app
using Oxygen
using HTTP
@oxidise
get("/") do
"welcome to server #1"
end
@get "/subtract/{a}/{b}" function(req, a::Int, b::Int)
a - b
end
end
module APP2
# Setup the second app
using Oxygen
using HTTP
@oxidise
get("/") do
"welcome to server #2"
end
@get "/add/{a}/{b}" function(req, a::Int, b::Int)
a + b
end
end
# start both servers together
APP1.serve(async=true)
APP2.serve(port=8081)
# clean it up
APP1.terminate()
APP2.terminate()
end
I looked into implementation and I find to be the same as doing the following:
module app1
include(Base.find_package("Oxygen"))
import .Oxygen: get, @get # and all other symbols
end
module app2
include(Base.find_package("Oxygen"))
import .Oxygen: get, @get
end
app1.get("/") do
"welcome to server #1"
end
app1.@get "/subtract/{a}/{b}" function(req, a::Int, b::Int)
a - b
end
# Setup the second app
app2 = oxidize()
app2.get("/") do
"welcome to server #2"
end
app2.@get "/add/{a}/{b}" function(req, a::Int, b::Int)
a + b
end
# start both servers together
app1.serve(async=true)
app2.serve(port=8081)
# clean it up
app1.terminate()
app2.terminate()
This approach suffers from the need to include all Oxygen dependencies in the app project. Isn't the same issue present in your approach?
That's close, but not the same. In this approach, we're including all the files at runtime and reinitializing everything (new internal globals for each module).
I've since made some changes so that the generated modules look similar like this:
module app1
include("Oxygen.jl") # the absolute path to this file is used
using .Oxygen
end
module app2
include("Oxygen.jl") # the absolute path to this file is used
using .Oxygen
end
To your last point, I believe this is the case. Each generated module has to be initialized to prevent them from using the same internals.
I did find a much cleaner way to generate the modules like the example below, but this has the same issue we're encountering now where the same internal globals are getting used between instances.
module app2
using Oxygen
end
Regardless, I don't see this getting in the way of your planned refactor. If anything they'll complement each other, and allow people to define servers either statically or dynamically since the external API won't change. I'd also change the name of this function so that it has a different one than the one you're planning to avoid confusion.
I'm excited about this approach because it allows people to dynamically create and stop instances (in the same parent module). I had some other ideas around performance and introducing the ability to easily spin up and manage multiple instances of oxygen, which now became much more plausible.
Being able to encapsulate them dynamically opens up a bunch of new possibilities.
Ok. I will start with parametrising the internal code with the ROUTER
and bringing it out at the topmost scope. This will likely determine if I will be able to accomplish the goal and will give a glimpse of how the code will look after the refactoring is done.
In retrospect, I have already thought that all globals could be systematically injected as the first argument in the following approach:
function some_internal_function(ctx::@NamedTupple{router::Router, server::Server}, arg1, arg2)
# Some internal code
end
which has the benefit of not polluting internal interfaces and also allows this refactor to be done incrementally.
I made good progress yesterday and managed to put the router at the top scope and run all tests. I forked and added a new branch, feature/oxidise
, available here. I had only an option to push the commit to master. Perhaps you could create a branch where I could push commits.
I had a lot of warnings about overwritten methods during tests. Thus, I fixed those by changing the way the module is imported to a more orthodox approach. However, in the process, I broke templatingtests.jl
. I noticed you were using Requires
; I thought it got deprecated in favour of the new Julia extension system. Perhaps you can comment on those things for context.
I prefer to use capital case for global variables, much like you have used for ROUTER
and HANDLER
; thus, I plan to capitalise all other global variables as well if you have no objections here. I also did not extend Core
functions but referred to them explicitly. This has the benefit that it is actually very easy to make the oxidise macro, which in the expanded form now looks like could be:
using Oxygen: Core
const ROUTER = ...
const server = ...
const mountedfolders = ...
const taggedroutes = ...
const custommiddleware = ...
const history = ...
const HANDLER = ...
include(joinpath(dirname(Base.find_package("Oxygen")), "methods.jl"))
Thanks for the update! below are my responses:
I have finished refactoring and implemented the @oxidise
method. It works as follows:
module Example
using Oxygen; @oxidise
@get "/inside" function(req::Request)
return "Hello from Inside"
end
end
and allows the use of the module in the following manner:
using Example
Example.serve(port=3424)
The @oxidise
macro is fully optional, and after the refactor, all tests pass. I have deprecated enabledocs
, disabledocs
and isdocsenabled
, which now show deprecation warnings pointing to the docs
keyword in the serve
function.
During the refactoring, I have improved tests with the following changes:
using Oxygen
where previously a local import was used;show_errors
keyword argument in serve
, which passes it down to DefaultSerializer
;internalrequest
with @suppress
macro at the call site when errors are expected;Apart from refactoring globals into a context object, I have fixed the duplicate include
statements for Util
. I also added 'HTTP.Request` to the export list for Oxygen.
The history object at the global scope could, in a practical sense, be created within the serve
method at runtime. I, however, did not made such a change as it is also passed to internalrequest
function. I would suggest that internalrequest
exported to the user by default work with an empty history object, and only if it is explicitly passed to internalrequest
would it be used instead. Would you find that change appropriate?
The remaining globals are those that are used at runtime. I have sketched that these could be created within a serve method with the following struct:
struct Runtime
run::Ref{Bool}
jobs::Set
timers::Vector{Timer}
history::CircularDeque{HTTPTransaction}}(CircularDeque{HTTPTransaction}(1_000_000)
streamhandler::Union{Nothing, StreamUtil.Handler}
end
which is then passed downstream. It could be returned with the serve
method for async=true
along the HTTP.Server in the following struct:
struct Service
context::Context
runtime::Runtime
server::HTTP.Server
end
To the service object, we could attach terminate
, stopcronjobs
, startcronjobs
, clearcronjobs
, resetcronstate
, stoptasks
, starttasks
so those would not need to be defined in methods.jl
.
Thanks for all your hard work, I've been checking out the individual commits you've been pumping out all weekend.
To your question around internalrequest()
function, I think what you described makes sense internally. As long as the publicly exported interface doesn't allow people to inject or modify the history object. The metrics
keyword should be an all-or-nothing kind of flag that determines if we maintain a history or not.
Breaking down the Context object into smaller structs is a great idea. Creating a monolith would make it more difficult to modify and grow later down the line. Techinically, all the values are used at runtime, but I see your point. Here's another structure we could think about.
struct Service end
struct Documenation end
struct CronManagement end
struct RepeatTasks end
struct Context
service::Service
docs::Documenation
cron::CronManagement
tasks::RepeatTasks
end
You don't have to follow these naming conventions exactly, but It'd be helpful to have high cohesion between related variables & functionality. When having big context objects it's important that we know exactly where the right variables are located so that there's no confusion ( it also makes it very readable )
I've also enabled running the ci/cd pipeline for this pr, so you should be able to adjust the code to improve the coverage. It's currently failing for julia version 1.6.6 but this compat entry was from when I first started the project. Feel free to adjust the target version to 1.6 to let it automatically select the latest TLS version.
I updated the pull request and tested that the code works with Julia 1.6. Code coverage has decreased only slightly, and it is only because I had to modify code that had not been covered before, where we can look into streamutil.jl
To your question around internalrequest() function, I think what you described makes sense internally. As long as the publicly exported interface doesn't allow people to inject or modify the history object. The metrics keyword should be an all-or-nothing kind of flag that determines if we maintain a history or not.
You may have misunderstood me. What I am proposing is the following change:
internalrequest(req::Oxygen.Request; middleware::Vector=[], metrics::Bool=true, serialize::Bool=true, catch_errors=true) = Oxygen.Core.internalrequest(CONTEXT[], HISTORY[], req; middleware, metrics, serialize, catch_errors)
To a modified public API as follows:
function internalrequest(req::Oxygen.Request; middleware::Vector=[], metrics::Bool=false, serialize::Bool=true, catch_errors=true, history::Union{Nothing, History}=nothing)
if metrics && isnothing(history)
error("History object needs to be specified")
else
history = History(0)
end
return Oxygen.Core.internalrequest(CONTEXT[], history, req; middleware, metrics, serialize, catch_errors)
end
Would you find such a change appropriate?
This is necessary as I would like to instantiate the history struct in the serve method at runtime. This is desirable because it reduces the number of global variables at the global scope while separating compile time constants (those listed in the Context) and those populated during a serve
method during runtime.
The distinction between Context
and Runtime
is important because it lets one quickly glance through the codebase to see whether a particular function is run at runtime or compile time. When a function accepts a runtime object, one treats the context object as immutable, leading to safer and more predictable code. Explicit immutability, where possible, is a great thing.
At the moment, I don’t see the Context
object as defined to be excessively large for outweighing the benefit of accessing a variable directly as ctx. docspath
vs ctx.docs.docspath
. It is pretty clear from naming of the variables what are their purpose in the codebase. Additional documentation can be provided for more clarity. When browsing the codebase, the developer can find all variables from ctx
by simply searching in the editor. KISS.
EDIT: If one introduces documentation, the resulting Context
struct be:
struct Context
router::Router
mountedfolders::Set{String}
custommiddleware::Dict{String, Tuple}
repeattasks::Vector
job_definitions::Set
cronjobs::Vector
docs::Documentation
end
which could make Context
simpler while keeping the separation of compile time constants and runtime variables. An alternative I think is feasible is to move docspath
and schemapath
as keyword arguments to serve. I imagine it could be possible to inject middleware that redirects the /docs
path to a separate router created within the serve method at runtime.
I've updated your branch with a couple of commits. Currently, I have some work in a local branch were I've migrated the cron module to use the new Context struct. The default module is working and tests are passing - but the @cron macro isn't defined when trying to use the new @oxidise macro.
I have this code in a separate local branch since these changes are breaking the new macro. Feel free to try migrating the cron stuff as well, or let me know if you want me to push directly to your branch.
Currently, I have some work in a local branch were I've migrated the cron module to use the new Context struct
Putting jobs
and run
into Context
object will require you to add startcronjobs
and stopcronjobs
at methods.jl
. Nevertheless, I urge you to consider the following organisation of variables:
struct Context
router::Router
docs::DocsContext
cron::CronContext
tasks::TasksContext
end
struct Runtime
cron::CronRuntime
tasks::TasksRuntime
history::History
streamhandler::Union{Nothing, StreamUtil.Handler}
end
struct Service
context::Context
runtime::Runtime
server::HTTP.Server
end
where the Context
is initialised at compile time and Runtime
is populated at runtime. The Service
is returned from a serve method:
serve(ctx::Context, kwargs...) -> Service
serve(ctx::Context, kwargs...) -> Service
Then, one can attach to the objects the following methods:
startcronjobs(::Union{CronContext, Context, Service}) -> CronRuntime
stopcronjobs(::Union{CronRuntime, Runtime, Service})
clearcronjobs((::Union{CronContext, Context, Service})
starttasks(::Union{TasksContext, Context, Service}) -> TasksRuntime
stoptasks(::Union{TasksRuntime, Runtime, Service})
terminate(::Union{CronRuntime, TasksRuntime, Runtime, Service})
resetstate(::Union{Runtime, CronRuntime, TasksRuntime, Service})
This clearly separates concerns and promotes immutability. It does not mean that the API needs to change in a breaking way; both options could be supported and eventually see if the Julia community prefers using existing API coupled to a global state or uses these API functions with return value from serve
.
I encountered a particular issue yesterday where the separation was important. When the serve
method is called repeatedly from a compiled module, resetting the compile-time constants should not be possible because otherwise, you end up with a clean state, and there is no use to that. This was easy to fix with the following change:
if (@__MODULE__) == Oxygen
CONTEXT[] = Oxygen.Core.Context()
end
If all variables were being put in Context
instead, we would have to go through every context field and reset them individually, knowing which are created at runtime and which are at compile time.
Ok great, it looks like we're on the same page. I didn't reorganize the variables into those categories based on an older comment. I really like how it keeps things tight and organized. I won't be able to work on those changes till later tonight, so you don't have to worry about any rogue commits impacting your progress.
Next Steps
Let me know if you can think of anything else missing from this list
Finish migrating the cron and tasks globals
I have an evening free. I will try to work on this.
Sounds good, let me know what's still pending (if you commit anything that's still a work in progress) and I can take a look at it later tonight to keep things moving.
I have found three additional things which would be great to do with this refactoring session:
docspath
and schemapath
as keyword arguments to serve;cronjobs
out by adding the job_definitions
during register
;The last two tasks are needed to make Context
object constant, which is necessary to avoid potential bugs when the server is restarted within the same Julia session.
I am going to sleep; you can work on the remaining tasks now.
So, I have come to the conclusion of the refactoring. The last changes have moved docspath
and schemapath
out to serve
as keyword arguments and now registers documentation to a separate router at runtime, which is coupled to the main router with a middleware. This change also prevents HTTP.register
from modifying already present routes, and thus, when restarting a serve
, no longer warnings by HTTP.serve!
are shown, allowing to refactor out Suppressor. This is great because now the user gets a warning if it has duplicate routes registered, preventing confusion.
The second change I made was refactoring out cronjobs
from the Context
and making registration of cron jobs at the route registration time. The result is that the Context
is now fully immutable during the runtime; thus, the service would work exactly the same when restarted. Both of these changes required splitting the register function to add direct methods that work with the router, where I introduced parse_route
and parse_func_params
.
Yesterday, I finished the migration of cron and tasks globals, which are stored at the service object created during runtime. I also cleaned up the global scope. Now there are only two globals, CONTEXT
and SERVICE
. From the latter one, the history object is taken when it is available, and this is how internalrequest
works. No tests were broken with this change. Also, I made it so that internal request by default has metrics=false
and docs=false
as now there are docs middleware instantiation involved, which in most instances can be skipped.
I also noticed a peculiar behaviour in the error handling in the middleware. When an error is thrown by the router, it is handled by the serialiser, but if it is thrown by a middleware, it is handled by middleware if metrics=true
. Is this intended?
I have no more changes planned. I added a simple test for @oxidise
macro. So, I see that the only task remaining is writing documentation for the macro. This can also mention the workflow of working with Revise, as I described in #112. A small announcement on discourse may also be suitable as the tool has become more versatile, and the Revise support is neat.
It would be cool if we could merge it on master, as incorporating merges with this refactor will be super painful. A release of a new version may be delayed for a little to see if there are any incurring bugs.
To conclude, I am very glad that this package had such detailed test cases. It was a pleasure to do this refactor and be able to find the bugs exactly when they occur!
Awesome, I'm looking forward to reviewing and merging these changes in.
Regarding this point:
I also noticed a peculiar behaviour in the error handling in the middleware. When an error is thrown by the router, it is handled by the serialiser, but if it is thrown by a middleware, it is handled by middleware if metrics=true. Is this intended?
Not really, at the moment any errors thrown inside a middleware function aren't caught on purpose. I think this is just a byproduct of the Metrics being at the "top" of the middleware chain and including some error handling. I'd like people to manually handle their errors for middleware functions since adding any default behavior can mess with certain flows.
I have a project that requires two distinct instances for Oxygen - one for public API and another for admin access, which must be served only on a localhost. I can't combine them because public API is by default a plain HTTP, which is not secured for serving the admin panel. Therefore, I need a second access point.
The problem is that Oxygen has a single internal global state, but I need two separate states for different ports. To illustrate the issue, let's consider a
ModuleA
that is defined as:The Oxygen in the runtime has only a single state, as shown in this example:
where
ROUTER
now has both inside and outside routes attached to it.I cannot run those two routes on separate processes as I need to access the same global state, which requires a lock for certain operations. The actor model is not suitable here as the concurrency is simple, but communication is complex.
I can think of two ways to tackle the issue. One option is to only serve routes with a matching tag. However, I haven't explored this approach in depth yet. The downside of this option is that all the routes would need to be tagged, which would make the code less legible.
Another approach I would much prefer is that the state could be initialised locally with a macro along with stateful operations. From the API standpoint, that could work as follows:
where the
@oxidise
macro defines a custom@put
,@post
,@get
,serve
etc. methods for theROUTER
along with the additional hidden state. This could work without issues as the methods would overshadow those exported by Oxygen. This would also address the issue #115 properly.I spent a few hours attempting to refactor Oxygen by limiting the usage of global state only to exported methods. The idea was to inject the global state into subsequent methods, making it easier to supersede them with a macro. Unfortunately, I got overwhelmed by the amount of work needed here, as the codebase contains a significant amount of global variables. The major challenge I faced was the
autodoc.jl
custommiddleware
variable, which is referenced by thehasmiddleware
method. I suspect that there are more couplings with theautodoc.jl
library like this one.I also noticed that you have
include(“util.jl”)
in multiple files. This is a little bad as the compiler needs to compile every of those methods separately. Alternatively, you can putinclude(“util.jl”)
in the top moduleOxygen.jl
and then refer to it internally withusing ..Util
.Anyway, this is a cool project, and many find it useful as I do. I just wish it would have been organised in a layered way where globals would have been put at the outmost layer, and thus, the state initialisation could be simulated with a macro where needed.