elixir-inspector / ua_inspector

User agent parser library
Apache License 2.0
125 stars 23 forks source link

Crash on large traffic #11

Closed ivan-safonov closed 5 years ago

ivan-safonov commented 7 years ago

Get Timeout of GenServer error when handling large traffic (thousands of queries). Without UAInspector.parse/1 it works fine.

mneudert commented 7 years ago

Do you have some more details on that? Something like the actual timeout message you are seeing?

general-CbIC commented 7 years ago

@mneudert

At this moment it falls with logs like:

[error] #PID<0.9719.0> running Test.Endpoint terminated
Server: 0.0.0.0:4000 (http)
Request: GET /test
** (exit) exited in: :gen_server.call(:ua_inspector_pool, {:checkout, #Reference<0.4279952074.2567176194.71000>, true}, 5000)
    ** (EXIT) time out

To reproduce it I made ~7k requests async with task for parsing user agents.

mneudert commented 7 years ago

That message looks like an empty worker pool to me.

I suggest you try one of two possible solutions:

  1. Circumvent pooling by directly calling UAInspector.Parser.parse("your user agent"). This should execute the lookup inside your own process. Please be aware that (at least for the moment) this call should be treated as internal and might change unexpectedly, so definitely include several tests to ensure you don't break your application on an update!
  2. Pump up the pool worker numbers (you indirectly made me aware of a chunk of missing documentation on that...). Adding config :ua_inspector, pool: [max_overflow: 100, size: 1000] might help you. I can't give you any actual numbers but seeing you have a testing environment for the overload you should be able to get a good estimation on what might/should work. See the poolboy documentation for available options, they are passed unmodified.

Both of these should ease the problems you are seeing as long as the real issue is not down at the storage process. There are calls taking place to lookup an ets table identifier (currently not using an outside callable named table) but I don't expect those to issue the message you are seeing.

general-CbIC commented 7 years ago

@mneudert

Thanks a lot!

After changing pool configuration seems logs are much better. But sometimes there are errors also.

The first option doesn't make sense, because we'll need to write our own GenServer in that case. But it will be the same as yours.

mneudert commented 7 years ago

The first option eliminates the need for a GenServer as it performs the lookup inside your process currently communicating with the lookup pool. The GenServer itself carries no state so it is only an artificial boundary used for logical separation and more reliable/expectable behaviour. Removing that boundary might work for you but also introduce other unexpected problems...


Btw, how to you test that overload scenario?

Is it "complicated" or as simple as something like this:

for i <- 1..10_000 do
  Task.async(fn -> UAInspector.parse("some user agent") end)
end
|> Enum.map(&Task.await/1)

The problem with that is that if your load is high enough there are far too many processes to execute within the (yet to be made configurable) 5 second timeout. If you happen to drain all the resources you have then skipping the pool might be the only way to handle the pressure...

Note: I assume you are already reducing the number of parsing requests by only doing it once for every new session and not on every single request. If there is no session you might want to keep the parsing results in a separate cache to not look up the exact same user agent multiple times.

general-CbIC commented 7 years ago

Thanks for explaining, now we solved this problem just changing UAInspector.parse to UAInspector.parse_client. We don't need to save any info if it is a bot.

UPD: Not solved. There are errors sometimes.


Btw, how to you test that overload scenario?

It's really "complicated" way to test, but we isolated it from other processes (as possible).


About cache: yes, we see that it's needed and will add it to project. (We thought it already made inside UAInspector)

mneudert commented 5 years ago

Closing due to age and as the removal of process pooling in v1.0.0 should have "some time ago" solved this issue.