Closed ivan-safonov closed 5 years ago
Do you have some more details on that? Something like the actual timeout message you are seeing?
@mneudert
At this moment it falls with logs like:
[error] #PID<0.9719.0> running Test.Endpoint terminated
Server: 0.0.0.0:4000 (http)
Request: GET /test
** (exit) exited in: :gen_server.call(:ua_inspector_pool, {:checkout, #Reference<0.4279952074.2567176194.71000>, true}, 5000)
** (EXIT) time out
To reproduce it I made ~7k requests async with task for parsing user agents.
That message looks like an empty worker pool to me.
I suggest you try one of two possible solutions:
UAInspector.Parser.parse("your user agent")
. This should execute the lookup inside your own process. Please be aware that (at least for the moment) this call should be treated as internal and might change unexpectedly, so definitely include several tests to ensure you don't break your application on an update!config :ua_inspector, pool: [max_overflow: 100, size: 1000]
might help you. I can't give you any actual numbers but seeing you have a testing environment for the overload you should be able to get a good estimation on what might/should work. See the poolboy documentation for available options, they are passed unmodified.Both of these should ease the problems you are seeing as long as the real issue is not down at the storage process. There are calls taking place to lookup an ets table identifier (currently not using an outside callable named table) but I don't expect those to issue the message you are seeing.
@mneudert
Thanks a lot!
After changing pool configuration seems logs are much better. But sometimes there are errors also.
The first option doesn't make sense, because we'll need to write our own GenServer in that case. But it will be the same as yours.
The first option eliminates the need for a GenServer as it performs the lookup inside your process currently communicating with the lookup pool. The GenServer itself carries no state so it is only an artificial boundary used for logical separation and more reliable/expectable behaviour. Removing that boundary might work for you but also introduce other unexpected problems...
Btw, how to you test that overload scenario?
Is it "complicated" or as simple as something like this:
for i <- 1..10_000 do
Task.async(fn -> UAInspector.parse("some user agent") end)
end
|> Enum.map(&Task.await/1)
The problem with that is that if your load is high enough there are far too many processes to execute within the (yet to be made configurable) 5 second timeout. If you happen to drain all the resources you have then skipping the pool might be the only way to handle the pressure...
Note: I assume you are already reducing the number of parsing requests by only doing it once for every new session and not on every single request. If there is no session you might want to keep the parsing results in a separate cache to not look up the exact same user agent multiple times.
Thanks for explaining, now we solved this problem just changing UAInspector.parse
to UAInspector.parse_client
. We don't need to save any info if it is a bot.
UPD: Not solved. There are errors sometimes.
Btw, how to you test that overload scenario?
It's really "complicated" way to test, but we isolated it from other processes (as possible).
About cache: yes, we see that it's needed and will add it to project. (We thought it already made inside UAInspector)
Closing due to age and as the removal of process pooling in v1.0.0
should have "some time ago" solved this issue.
Get Timeout of GenServer error when handling large traffic (thousands of queries). Without UAInspector.parse/1 it works fine.