certifi / erlang-certifi

SSL Certificates for Erlang

cacerts vs cacertfile #19

Closed Hinidu closed 6 years ago

Hinidu commented 7 years ago

Hello! Some time ago I started to use ExAws with hackney to make many concurrent long-polling requests to AWS. I found that it consumes about 3 MB per request, which I think is quite a lot. So I started to search for the source of the problem. Today I found that the problem was in SSL verification. I ran the same test using hackney and httpc and got very similar results: about 200 MB for 500 concurrent requests when I provide a path to cacertfile (downloaded from certifi.io and put into the priv directory), and about 1.5 GB for the same test when I provide cacerts with the value from certifi.cacerts() instead of cacertfile.

I tried to find the cause of this strange behavior, because intuitively certifi.cacerts(), preloaded and parsed at compile time, looks more optimal. Here's what I found in the OTP source code: 1) We choose one option or the other here. 2) We already do some work if we got cacerts. 3) We call add_trusted_certs, 4) which is a simple proxy. 5) Then here we get the biggest difference: for cacerts we transform them again, but for cacertfile we use a cache keyed on the filename.

I haven't thought through different possible solutions yet, but the simplest that comes to mind is: 1) Ship cacerts.pem from certifi.io and provide a function to get the path to that file. 2) Use cacertfile instead of cacerts in hackney and other client libraries.

What do you think about that? Thank you in advance.
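For reference, the two configurations being compared could be sketched like this. The dummy PEM below just stands in for the real cacerts.pem bundle (its DER payload is meaningless); :public_key.pem_decode/1 is the stdlib call that turns a PEM binary into the kind of DER entry list that the cacerts option expects.

```elixir
# Dummy PEM block standing in for the bundled cacerts.pem;
# the base64 payload is arbitrary bytes, not a real certificate.
pem = """
-----BEGIN CERTIFICATE-----
AAAA
-----END CERTIFICATE-----
"""

# :public_key.pem_decode/1 splits a PEM binary into entries of the form
# {type, der_binary, :not_encrypted} without ASN.1-decoding the payload.
entries = :public_key.pem_decode(pem)
ders = for {:Certificate, der, :not_encrypted} <- entries, do: der

# Option 1: point ssl at a file; OTP caches the decoded result by filename.
_opts_file = [cacertfile: "priv/cacerts.pem"]

# Option 2: pass the decoded DER list directly; there is no filename to
# cache by, which is the difference this issue is about.
_opts_terms = [cacerts: ders]

IO.inspect(length(ders))
```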

Hinidu commented 7 years ago

It looks like python's implementation works exactly this way:

>>> import certifi

>>> certifi.where()
'/usr/local/lib/python2.7/site-packages/certifi/cacert.pem'

And there are no other functions at all.

Hinidu commented 7 years ago

Hello! Could anybody provide some feedback on this issue? I think I will make a PR soon.

ericmj commented 7 years ago

Wouldn't it be better to improve the caching in OTP to also cache for terms instead of just files?

benoitc commented 7 years ago

I think that would be the best, @ericmj. In the meantime we can offer the two solutions, I guess. The only issue is that it would make the package bigger.

Hinidu commented 7 years ago

Wouldn't it be better to improve the caching in OTP to also cache for terms instead of just files?

@ericmj I'm not sure. I'm not very deep into Erlang, but I think that a cache on terms like that can't be good from a performance point of view. When we want to check whether a term is present in the cache, we should at least calculate its hash and then compare the entire term with the term found in the cache. But these terms are quite big (cacerts.pem from this repository is 289 KB), so I'm not sure that such a cache can be fast enough. I assume that immutability will help when the terms are equal, because the VM can compare term references before doing a full comparison, but it won't help when the terms are different, or equal but held by different references. Actually, I think the absence of a cache on terms may be intentional for that reason.
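The comparison-cost argument above can be illustrated with a quick sketch (arbitrary sizes and iteration counts, not the benchmark from this thread): comparing two large, reference-distinct binaries the size of cacerts.pem against comparing a short filename string.

```elixir
# Two equal ~289 KB binaries with different references, so equality
# must byte-compare the whole content, as a term cache would.
big  = :binary.copy(<<0>>, 289 * 1024)
big2 = :binary.copy(<<0>>, 289 * 1024)
small = "priv/cacerts.pem"

# Time 1000 full comparisons of the big terms.
{t_big, true} =
  :timer.tc(fn -> Enum.all?(1..1_000, fn _ -> big == big2 end) end)

# Time 1000 comparisons of the short filename.
{t_small, true} =
  :timer.tc(fn -> Enum.all?(1..1_000, fn _ -> small == "priv/cacerts.pem" end) end)

IO.puts("big-term compares:  #{t_big} µs / 1000 ops")
IO.puts("filename compares: #{t_small} µs / 1000 ops")
```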

In the meantime we can offer the two solutions, I guess.

It would be great! I will try to make a PR tomorrow.

The only issue is that it would make the package bigger.

Perhaps it could be a major update with only the file option, to avoid that? Do many people use this library directly (not through hackney or another HTTP client)?

ericmj commented 7 years ago

Difficult to say if it would be slow without measuring.

I'm not a fan of doubling the size of the package though.

Hinidu commented 7 years ago

@ericmj I've made a microbenchmark using a public ETS table with the set data structure, just like in ssl_pkix_db.erl. That's the output of mix bench:

Settings:
  duration:      1.0 s

## CacheBench
[20:23:21] 1/4: failed filename lookup
[20:23:30] 2/4: failed term lookup
[20:23:33] 3/4: successful filename lookup
[20:23:34] 4/4: successful term lookup

Finished in 14.44 seconds

## CacheBench
benchmark name              iterations   average time 
failed filename lookup       100000000   0.08 µs/op
successful filename lookup    10000000   0.13 µs/op
failed term lookup               20000   90.67 µs/op
successful term lookup           10000   102.44 µs/op

100 µs is perhaps still fast enough, taking into account that there are many other operations that need to be done to establish an SSL connection, but it means that we will never be able to create more than 10,000 connections per second (at least on my machine). And I don't see a reason to sacrifice these performance gains just to optimize a once-per-application-lifetime operation like reading and parsing the cacerts.pem file.
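The benchmark module itself isn't shown in the thread, but a minimal sketch along the same lines might look like this, with a large binary standing in for the decoded cacerts term (names and sizes here are illustrative, not the original bench code):

```elixir
# Public ETS set, like the one used by OTP's PEM cache.
table = :ets.new(:pem_cache, [:set, :public])

filename = "priv/cacerts.pem"
# ~289 KB binary standing in for the decoded cacerts term.
big_term = :binary.copy(<<1>>, 289 * 1024)

:ets.insert(table, {filename, :cached_by_file})
:ets.insert(table, {big_term, :cached_by_term})

# Looking up by a short filename key only hashes a few bytes...
{t_file, [{^filename, :cached_by_file}]} =
  :timer.tc(fn -> :ets.lookup(table, filename) end)

# ...while looking up by the whole term must hash (and on collision
# compare) hundreds of kilobytes.
{t_term, [{^big_term, :cached_by_term}]} =
  :timer.tc(fn -> :ets.lookup(table, big_term) end)

IO.puts("filename lookup: #{t_file} µs")
IO.puts("term lookup:     #{t_term} µs")
```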

Hinidu commented 7 years ago

Forgot to mention that I'm using Elixir 1.5.0 and Erlang/OTP 20 on a Core i5-4590 @ 3.30 GHz for benchmarking.

Hinidu commented 7 years ago

I understand that +250 KB could be a problem for embedded use. But to avoid it we can just release version 3.0 without the old way of getting cacerts, and the package size will remain almost the same.

benoitc commented 7 years ago

Another way to think of it: why not cache the term directly on load? I wonder if that's easily feasible.

ericmj commented 7 years ago

I think it would be better to only include the file in 3.0 than to include both now.

@benoitc Do you mean cache on load in OTP?

benoitc commented 7 years ago

@ericmj yes

Hinidu commented 7 years ago

@benoitc sorry, I'm not sure I understand what you mean. OTP caches terms which it loads from files itself. Caching user-provided terms is much more expensive, as I stated above.

benoitc commented 7 years ago

@Hinidu I mean that what you have included are the PEMs directly as binaries. If, on load, a gen_server decodes them and places them in ETS like is done with the file, I don't see why it wouldn't work. And maybe it's worth delivering the plumbing for it.

Hinidu commented 7 years ago

@benoitc it would work, but it would be slow, because we need a key to look up in the ETS table, and if that key is the entire cacerts term (which is quite big) then the lookup is much slower than a lookup by filename.

ericmj commented 7 years ago

@Hinidu You don't have to look it up with the term as the key. If OTP had an API for adding new terms to the cache, you could also choose the key in that API.
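A hypothetical sketch of the API shape being described here: OTP does not provide this, and the CertCache module with all its function names is invented purely for illustration. The point is that the caller supplies a cheap key (an atom), so the cache never has to hash the large term itself.

```elixir
# Hypothetical cache API where the caller chooses the key (not an OTP API).
defmodule CertCache do
  def new, do: :ets.new(:cert_cache, [:set, :public, read_concurrency: true])

  # The caller supplies a small key, so lookup cost is independent of
  # how large the cached term is.
  def put(table, key, certs) when is_atom(key),
    do: :ets.insert(table, {key, certs})

  def get(table, key) do
    case :ets.lookup(table, key) do
      [{^key, certs}] -> {:ok, certs}
      [] -> :error
    end
  end
end

table = CertCache.new()
# Stand-in DER entry; a real caller would pass certifi's decoded list.
CertCache.put(table, :certifi, [<<48, 130>>])
{:ok, _certs} = CertCache.get(table, :certifi)
```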

benoitc commented 7 years ago

Just had a quick look at the ssl application, and maybe just filling the cache via the API in

https://github.com/erlang/otp/blob/master/lib/ssl/src/ssl_pem_cache.erl

is enough?

Hinidu commented 7 years ago

@ericmj OK, it would be possible, but who would call that API? Would certifi.cacerts be converted to a call to a gen_server that adds cacerts to the cache in its init? Or would users of certifi handle it themselves? @benoitc I only see insert/1, which takes a filename.

benoitc commented 7 years ago

@Hinidu sorry, I haven't been precise. But if you look at the insert code you will see:

https://github.com/erlang/otp/blob/master/lib/ssl/src/ssl_pem_cache.erl#L104

I.e. it passes a filename and its decoded content. I guess we could probably pass any content to it directly, even a filtered one.

ericmj commented 7 years ago

@Hinidu Why does it have to go through a gen_server? It can be a function that certifi provides.

Hinidu commented 7 years ago

@benoitc I don't know whether it's OK to use the private API of ssl_pem_cache. @ericmj it was just a thought on how to do that without changing the certifi API, so every user would get better performance without any modifications to their code. But of course we can add another function for that. I see these problems with the "another function" approach: 1) Users would have to take care of it themselves (perhaps it could be easily handled in hackney at application startup; in that case this problem doesn't bother me too much). 2) We would need to use a private API or patch OTP itself, so end users wouldn't get the change soon enough, and certifi would only work with the latest version of OTP (or with whatever version of OTP has that private API, if we choose the first option).

The problems with the "file" approach: 1) It would have to be 3.0. 2) We lose a little bit of performance on the very first request.

I still think the "file" approach is better, but maybe I'm missing something important. I can try to elaborate on my arguments if I'm not being clear enough.

Hinidu commented 7 years ago

Looks like we can't hide the act of adding the PEM to the cache in hackney, because it is optional and somebody could use hackney with their own certificates. Perhaps it could be controlled by a configuration option of hackney or certifi. So that's one more reason why I think that just providing a file would be better.

benoitc commented 7 years ago

@Hinidu which version of Erlang are you using, anyway?

It seems odd to me that the actual result is not shared between the processes. There are indeed 2 original reasons/goals for using a precompiled file (like the G package does too):

The compiled terms : https://github.com/certifi/erlang-certifi/blob/master/src/certifi_pt.erl#L15

are identical to those cached in ETS: https://github.com/erlang/otp/blob/master/lib/ssl/src/ssl_pem_cache.erl#L99

Caching the term should actually make no performance difference, since it's the same as what OTP does when reading the file and caching its content. In theory it should even be faster and consume less memory, since it should be shared across processes. Passing data from ETS to a process actually copies the content to the requesting process.

Maybe ssl_pkix_db.erl is doing an extra step. I will need to check.

Nothing prevents us from shipping the file as a supplement anyway, though in theory reusing a precompiled term should be more efficient...

Hinidu commented 7 years ago

@Hinidu which version of Erlang are you using, anyway?

@benoitc I'm using OTP 20.

like the G package does too

What is the G package? It's hard to google it...

Caching the term should actually make no performance difference, since it's the same as what OTP does when reading the file and caching its content. In theory it should even be faster and consume less memory, since it should be shared across processes. Passing data from ETS to a process actually copies the content to the requesting process. Maybe ssl_pkix_db.erl is doing an extra step. I will need to check.

You can follow the 5 steps in my original message. I skipped some initial steps, from hackney to ssl to tls_connection to ssl_connection and then to ssl_config. I will try to post the whole "stacktrace" later to show how it works.

Nothing prevents us from shipping the file as a supplement

Great! I will try to make a PR now.

anyway, though in theory reusing a precompiled term should be more efficient...

That's the best thing about this issue: it's counterintuitive. I've made a simple benchmark:

defmodule HackneyBench.Application do
  @url "https://google.com"

  use Application

  def start(_type, _args) do
    Enum.each(1..500, fn _ -> spawn_link(&cacerts/0) end)
    # Enum.each(1..500, fn _ -> spawn_link(&cacertfile/0) end)
    Supervisor.start_link([], strategy: :one_for_one, name: C1Accounts.Supervisor)
  end

  def cacerts() do
    {:ok, _status, _, _} = :hackney.request(
      :get, @url, [], <<>>, [with_body: true, pool: false, ssl_options: [cacerts: :certifi.cacerts()]])
    cacerts()
  end

  def cacertfile() do
    {:ok, _status, _, _} = :hackney.request(
      :get, @url, [], <<>>, [with_body: true, pool: false, ssl_options: [cacertfile: "priv/cacerts.pem"]])
    cacertfile()
  end
end

With hackney 1.9.0, if I run an application with 500 workers making concurrent requests forever with cacertfile specified, I get less than 200 MB of memory consumption and about 75% CPU load. If I do the same with cacerts from certifi specified, I get about 1 GB of memory consumption and about 95% CPU load.

Hinidu commented 7 years ago

I've made the pull request: https://github.com/certifi/erlang-certifi/pull/22