elli-lib / elli

Simple, robust and performant Erlang web server
https://github.com/elli-lib/elli/blob/develop/doc/README.md
MIT License
314 stars 37 forks source link

elli_tcp: cast 'accepted' AFTER TLS handshake succeeded #90

Closed sg2342 closed 3 years ago

sg2342 commented 3 years ago

fix #84

coveralls commented 3 years ago

Coverage Status

Coverage decreased (-0.03%) to 75.787% when pulling c919553bf1e214df69e564342e95c09e663f73a7 on sg2342:fix_tls_accept_logic into 968afee385f46c053fd46858713261db7dc03af4 on elli-lib:main.

coveralls commented 3 years ago

Coverage Status

Coverage increased (+0.7%) to 76.471% when pulling d6f6d5049cc46622699abf78060848dc155c8e9f on sg2342:fix_tls_accept_logic into 968afee385f46c053fd46858713261db7dc03af4 on elli-lib:main.

yurrriq commented 3 years ago

Thanks. This looks good to me. Would you be willing to add a test or two? Ideally they would fail before this patch and succeed after.

sg2342 commented 3 years ago

Thanks. This looks good to me. Would you be willing to add a test or two? Ideally they would fail before this patch and succeed after.

Not done yet. I have to find a solution for OTP-20 and older where ssl:connect/3 in acceptor_leak_regression succeeds despite the verify and verify_fun options.

sg2342 commented 3 years ago

OTP-20 and older reuse TLS sessions even when the (verify) connect options have changed. Since other successful tests might leave a reusable TLS session, one has to set reuse_sessions false to provoke the CLIENT ALERT needed to trigger the bug.

elli_ssl_tests:acceptor_leak_regression/0 will succeed (on all supported OTP releases) with the fixed elli_tcp accept logic. It will fail (on all supported OTP releases) without the fix.

BTW: rebar3 does start ssl by default which is why the application:start/1 and application:stop/1 calls in elli_ssl_tests are unnecessary. Removing them reduces the error_logger output of elli_ssl_tests from

=INFO REPORT==== 30-Sep-2020::23:02:01 ===
TLS client: In state certify at ssl_handshake.erl:1293 generated CLIENT ALERT: Fatal - Bad Certificate

=ERROR REPORT==== 30-Sep-2020::23:02:01 ===
Elli request (pid #Port<0.72536>) unexpectedly crashed:
shutdown

=ERROR REPORT==== 30-Sep-2020::23:02:01 ===
Elli request (pid <0.1649.0>) unexpectedly crashed:
{shutdown,{gen_statem,call,[<0.1687.0>,{recv,0,60000},infinity]}}

=INFO REPORT==== 30-Sep-2020::23:02:01 ===
TLS server: In state certify received CLIENT ALERT: Fatal - Bad Certificate

=INFO REPORT==== 30-Sep-2020::23:02:01 ===
    application: ssl
    exited: stopped
    type: temporary

=INFO REPORT==== 30-Sep-2020::23:02:01 ===
    application: public_key
    exited: stopped
    type: temporary

=INFO REPORT==== 30-Sep-2020::23:02:01 ===
    application: crypto
    exited: stopped
    type: temporary

to

=NOTICE REPORT==== 30-Sep-2020::23:21:13.058984 ===
TLS client: In state wait_cert_cr at ssl_handshake.erl:1948 generated CLIENT ALERT: Fatal - Bad Certificate

=NOTICE REPORT==== 30-Sep-2020::23:21:13.059186 ===
TLS server: In state wait_finished received CLIENT ALERT: Fatal - Bad Certificate
yurrriq commented 3 years ago

BTW: rebar3 does start ssl by default which is why the application:start/1 and application:stop/1 calls in elli_ssl_tests are unnecessary. Removing them reduces the error_logger output of elli_ssl_tests

Maybe we should prefer application:ensure_started/1 and remove the application:stop/1 calls.

yurrriq commented 3 years ago

I say we merge this and patch the application start/stop shortly after. I'll add this to my Wednesday evening session too heh

yurrriq commented 3 years ago

This looks good to me, and I've confirmed elli_ssl_tests:acceptor_leak_regression/0 fails on the main branch, and succeeds with c919553bf1e214df69e564342e95c09e663f73a7.

Given that rebar3 starts crypto, asn1, public_key, and ssl, perhaps we ought not bother starting (and stopping) them.

What do you think, @tsloughter?

tsloughter commented 3 years ago

Using ensure_all_started and not stopping those particular apps is probably a good idea.

I've wanted to move tests to Common Test since I think it has much clearer semantics for all this (and better error messages for where a failure happened in the tests).. but it would be a lot of work probably.

tsloughter commented 3 years ago

Maybe I'll start with adding a common test suite and over time cases can be moved to it.

yurrriq commented 3 years ago

Merging this for now then, and we can handle the move to CT in another (series of) PR(s). Thanks, @sg2342!