Closed ndortega closed 4 months ago
Attention: Patch coverage is 98.11321% with 1 line in your changes missing coverage. Please review.
Project coverage is 99.20%. Comparing base (73a8f0a) to head (40700e8). Report is 1 commit behind head on master.
Files | Patch % | Lines |
---|---|---|
src/core.jl | 97.43% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
Hi @JanisErdmanis,
I have tested both approaches locally on my laptop with Julia 1.6 and 1.10 and found either the same or better performance with these changes. The results almost feel too good to be true, which makes me a bit hesitant.
What do you think?
Hi @ndortega
Sorry, I did not see the message. I am unfamiliar with the parallelism internals and don't yet use them; thus, I can't comment on the performance. But I am happy that there is such a feature and may find it helpful in the future. For now, I can only offer some ideas to consider:
- `serve` and `serveparallel` have very similar functional structures and almost the same kwargs. Perhaps it would be wise to combine them into a single `serve` where parallelism can be enabled with a keyword argument.
- Both call `startserver`, to which a closure is also passed. I would suggest putting the `startserver` function body into `serve`.
- I am rather busy and can't help with coding.
EDIT: I think I am able to grasp how parallelism is implemented now. I find it concerning that there is shared mutable state between threads, which is introduced by the metrics history object. A simple fix would be to create a history object for each thread and then aggregate them into a single history object, which can be shown on the metrics page. But perhaps that is unnecessary for `CircularDeque`.
Hi @JanisErdmanis,
I'm glad you brought up that point about the shared History object. I've used a `ReentrantLock` to make writing and reading from the history object completely threadsafe. That being said, I'll need to look into memoizing the metrics outputs, since these calculations can take a while when there are hundreds of thousands of requests and we call the metrics endpoint on an interval.
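The lock-guarded pattern described here could look roughly like the sketch below. All of the names (`HistoryEntry`, `History`, `push_entry!`, `snapshot`) are hypothetical stand-ins, not Oxygen's actual internals, and a plain `Vector` stands in for the `CircularDeque` mentioned above:

```julia
# Illustrative sketch of a metrics history guarded by a ReentrantLock.
# Names are hypothetical, not Oxygen's actual internals.

struct HistoryEntry
    timestamp::Float64
    duration_ms::Float64
end

struct History
    lock::ReentrantLock
    entries::Vector{HistoryEntry}
end

History() = History(ReentrantLock(), HistoryEntry[])

# Every write from a request-handling task takes the lock first.
function push_entry!(h::History, duration_ms::Real)
    lock(h.lock) do
        push!(h.entries, HistoryEntry(time(), duration_ms))
    end
end

# Reads (e.g. the /metrics endpoint) take the same lock, so they never
# observe a partially updated vector.
snapshot(h::History) = lock(() -> copy(h.entries), h.lock)

h = History()
Threads.@threads for i in 1:100
    push_entry!(h, i)
end
```

The trade-off is exactly the one discussed in this thread: correctness is easy, but every request contends on the same lock.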
And I agree with your second point, with these latest changes the functions are basically identical to one another and could be merged into a single one
> That being said, I'll need to look into memoizing the metrics outputs since these calculations can take a while when there are hundreds of thousands of requests and we call the metrics endpoint on an interval
You could look into implementing a function `merge!(history, thread_1_history)`, which perhaps can be made fast. I don't think there should be concern about making the metrics dashboard fast yet, as this could complicate matters.
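The per-thread aggregation idea could be sketched as follows. `ThreadHistory` and its fields are illustrative names only; the real history type would be whatever Oxygen uses internally:

```julia
# Sketch: each thread records into its own history, and merge! folds
# them into one aggregate, so the hot path needs no locking at all.
# ThreadHistory is a hypothetical stand-in for Oxygen's history type.

struct ThreadHistory
    durations::Vector{Float64}
end
ThreadHistory() = ThreadHistory(Float64[])

function Base.merge!(dest::ThreadHistory, src::ThreadHistory)
    append!(dest.durations, src.durations)
    return dest
end

# One history per thread; no shared mutable state between them.
histories = [ThreadHistory() for _ in 1:Threads.nthreads()]
Threads.@threads for i in 1:1000
    push!(histories[Threads.threadid()].durations, Float64(i))
end

# Aggregate on a single thread when the metrics page is rendered.
total = ThreadHistory()
for h in histories
    merge!(total, h)
end
```

`append!` on contiguous vectors is cheap, so the aggregation cost is paid only when the metrics page is actually viewed.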
An alternative to using a `ReentrantLock`, which may imply switching between threads, is to run the metrics on a single thread only and do the parallelization after that. Now your handler is schematically `parallelizer(metrics(router))`, but it could be beneficial to rewrite it as `metrics(parallelizer(router))`, as then you would no longer need a `ReentrantLock`, which should be faster since it does not need synchronization between tasks.
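A toy version of this composition might look like the following, where `router`, `parallelizer`, and `metrics` are stand-ins for Oxygen's actual middleware layers:

```julia
# Sketch of the metrics(parallelizer(router)) composition. All three
# names are illustrative stand-ins, not Oxygen's actual API.

router(req) = "response for $req"            # innermost handler

# Wrap a handler so each request body runs on its own task.
parallelizer(handler) = req -> fetch(Threads.@spawn handler(req))

# Wrap a handler so a timing is recorded around each call. Because
# this layer sits OUTSIDE the parallelizer, the push! below always
# runs on the task that invoked the composed handler, so no lock is
# needed as long as a single accept loop drives it.
function metrics(handler, history::Vector{Float64})
    return req -> begin
        t0 = time()
        resp = handler(req)
        push!(history, (time() - t0) * 1000)
        resp
    end
end

history = Float64[]
app = metrics(parallelizer(router), history)
```

Note that `fetch` makes the metrics layer wait for the spawned task, which foreshadows the blocking issue raised later in this thread.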
@JanisErdmanis
I liked the idea of dropping the `ReentrantLock` and writing it like `metrics(parallelizer(router))`. I refactored the code to not use any locks and only excluded the docs and metrics middleware from getting wrapped with the parallel handler.
Now that the main thread is used exclusively for the docs & metrics middleware, I had to make the metrics endpoint asynchronous in order to prevent the app from halting when viewing the metrics under heavy load. Even after these changes, I was surprised to see a 2x slowdown in RPS on my laptop between this approach and the `ReentrantLock` approach.
If you're curious and want to view this code, let me know and I'll push that branch.
> I was surprised to see a 2x slowdown
This is unexpected. I wonder if it could be due to tasks also being scheduled on the main thread which could slow it down. Push the code on the branch, I may have a look.
Sure thing, here's the branch I was working out of: https://github.com/OxygenFramework/Oxygen.jl/tree/feature/multi-threading-experimental-changes
Yeah. I suspect that the performance drop is due to tasks being spawned on the primary thread. The additional computation for metrics may have affected some fairness policy in the scheduler, so the main thread ends up hosting other threads' tasks.
You may find it interesting to explore ThreadPools.jl.
EDIT: The issue is that the metrics middleware waits for the response (`response = handler(req)`) and thus is blocked from accepting new requests.
Exactly, it won't be worth making this refactor until I speed up the metrics calculations. This shouldn't be too difficult since the helper utils are all very similar.
Out of the 7 metrics returned:
Possible Next Steps:
Perhaps you can also consider using a channel where requests are put by the main thread and then processed by each individual thread, returning an `HTTPResponse` object on another channel which is then routed back to users. Though I don't know how to do the latter. The benefit of this approach is that it would delegate task scheduling to Julia. But then again, the throughput depends on a single history-recording object and acquiring a lock for the channel...
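The channel pipeline suggested here could be sketched as below. The payload types, channel sizes, and the id-based routing back to callers are all illustrative assumptions, not Oxygen code:

```julia
# Sketch of the channel-based design: the main task puts requests on
# one channel, worker tasks process them and put responses on a reply
# channel, and responses are matched back to callers by request id.

requests  = Channel{Tuple{Int,String}}(64)   # (request id, payload)
responses = Channel{Tuple{Int,String}}(64)

# One worker task per thread; each pulls work off the shared channel.
workers = [Threads.@spawn begin
    for (id, payload) in requests
        put!(responses, (id, "handled: " * payload))
    end
end for _ in 1:Threads.nthreads()]

# The main task enqueues requests...
n = 10
for i in 1:n
    put!(requests, (i, "req$i"))
end
close(requests)  # lets the worker loops terminate after draining

# ...and collects the responses, routing them back by id.
results = Dict{Int,String}()
for _ in 1:n
    id, resp = take!(responses)
    results[id] = resp
end
foreach(wait, workers)
```

`Channel` already synchronizes `put!`/`take!` internally, so the per-request lock contention this thread worries about moves into the channel itself.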
I think the design decision here needs to be supported by benchmarks. One way is to check how many requests per second it takes to saturate HTTP without metrics and with metrics. If the saturation point with metrics is sufficient, then the history can be shared by every thread and used together with a lock as it is now. Otherwise, it needs to be put within each system thread's event loop to improve throughput.
I found a 10-15% performance gain by removing the current queue + multithreading approach and leaning harder on the scheduler built into Julia, using a cooperative multitasking strategy.
Additionally, this required so little code that I was able to completely remove the StreamUtil module, some state variables, and several type definitions in favor of a single function, which makes it much easier to maintain.
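The "lean on Julia's scheduler" strategy can be sketched in a few lines: spawn one task per request and let the cooperative scheduler multiplex them, instead of maintaining a hand-rolled queue of worker threads. `handle` and `serve_batch` below are hypothetical names used only for illustration:

```julia
# Sketch of cooperative multitasking via the built-in scheduler.
# `handle` stands in for a request handler; `serve_batch` for the
# serving loop. Neither is Oxygen's actual API.

handle(req) = (sleep(0.001); "ok: $req")   # simulated I/O-bound work

function serve_batch(reqs)
    # One task per request; the scheduler spreads them across however
    # many threads Julia was started with, and yields at I/O points.
    tasks = [Threads.@spawn handle(r) for r in reqs]
    return fetch.(tasks)
end

out = serve_batch(["a", "b", "c"])
```

Because the tasks yield while sleeping (as real handlers yield on socket I/O), the batch completes in roughly the time of one request rather than three, even on a single thread.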