dylex / postgresql-typed

Haskell PostgreSQL library with compile-time type inference
http://hackage.haskell.org/package/postgresql-typed
Other
83 stars 12 forks source link

Add TLS support (#17) #18

Closed albertov closed 5 years ago

albertov commented 5 years ago

This PR implements preliminary TLS support using the tls library (related to #17)

It's preliminary in the sense that I haven't tested it against a real database yet, just experiments issuing the SSLRequest message, performing the SSL handshake and retrieving the server certificate using a test script with the tls library. Will test this tomorrow. There's also no server certificate validation (TBD)

Opening the PR to gather feedback as I'm not sure it's the right approach since:

These issues could be fixed with a contextToHandle :: TLS.Context -> IOMode -> Handle function (similar to socketToHandle) but that requires implementing IODevice TLS.Context and BufferedIO TLS.Context instances to use GHC.IO.Handle mkDuplexHandle on the TLS.Context. I see no way of implementing these correctly without a non-blocking API to read/write on the TLS.Context so I think hiding the implementation under the Handle will just complicate things.

dylex commented 5 years ago

Thanks for this -- it's looking great so far. Switching to Socket is definitely a good thing. I'll try to do some tests on this as well and look a bit more closely later.

Someone explicitly requested non-blocking pgGetNotification at some point. I'd be inclined to leave it in and just have it throw an error for TLS.

dylex commented 5 years ago

Basic, non-tls tests seem to pass fine so far, and based on rough benchmarking I don't see much performance difference -- if anything it's faster without the handle overhead. I do want to look more at the buffering/flushing but I suspect any issue there can be dealt with by tweaking tcp socket options.

albertov commented 5 years ago

Someone explicitly requested non-blocking pgGetNotification at some point. I'd be inclined to leave it in and just have it throw an error for TLS.

Makes sense. I'll look into doing non-blocking reads on a Socket to bring back pgGetNotifications. I couldn't find a way to do it using functions from the network package (please correct me if i'm wrong) so I might need to use a Handle as it was done before for non-TLS. Anyway, will investigate to see if we can keep the Socket as you mentioned it may have a slightly lower overhead

dylex commented 5 years ago

Ah, if it requires going back to Handle that's not worth it, but I don't think it does. The low-level recv call can definitely do non-blocking with the MSG_DONTWAIT or O_NONBLOCK options. I'll look into it -- you can just leave the Socket implementation. (Worst case could use withFdSocket with a simple poll/select.)

albertov commented 5 years ago

I've just tested TLS against our postgres DB at google cloud with a SELECT 1 and it worked! :tada: This is no guarantee that nothing's broken though so some integration tests are in order. How would you recommend implementing them? I was thinking of modifying test/Connect.hs so more parameters can overrided through the environment so the same existing suite can be run against a TLS-enabled DB manually. Sounds good? Or maybe run the suite twice, once with TLS and one without? (this would require a DB with TLS configured).

The low-level recv call can definitely do non-blocking with the MSG_DONTWAIT or O_NONBLOCK options

I was aware of that, in fact network always opens sockets with O_NONBLOCK (else they would block the whole process or would require forkOSed threads) but the blocking happens at the IO manager level. Unfortunately network doesn't seem to provide an API for non-blocking recv (it didn't as of 2010 https://mail.haskell.org/pipermail/haskell-cafe/2010-August/082727.html). I'm sure it can be implemented somehow though (after all, non-blocking reads can be done through a Handle which wraps a socket) but I'm not sure if the magic I believe it'll require belongs here (maybe it does, your call ;) )

dylex commented 5 years ago

More environment settings for running the tests sounds good. (I often just hard-code changes because I'm lazy.) Theoretically stack supports --test-arguments too but that wouldn't work for the template stuff anyway.

Yeah, I feel like I've been down this non-blocking rabbit hole before, too. If you just leave the non-blocking stuff commented out for now I'll think about it and try to figure out if anyone's using it too.

albertov commented 5 years ago

pgGetNotifications are back with a bit of FFI in b45d379

albertov commented 5 years ago

@dylex , I've changed how the db connection is configured in tests to allow enabling TLS and different validation modes (no certificate validation, just validate the certificate is signed by the root CA and also validate the FQHN). I've tested it against a TLS enabled postgresql server and everything seems to work fine so I think this is ready for review. BTW, I have some nix code lying around to create an ephemeral postgresql database while building and running tests for a Haskell package. I can use it to set up integration tests for postgresql-typed and run them on, eg, CircleCI. Would you like another PR for that?

dylex commented 5 years ago

Thanks, I'll try to take a look soon (sorry, busy week).

More automated testing would be great -- I'd certainly take another PR.

dylex commented 5 years ago

This all looks great, and thanks for figuring out the non-blocking recv. I've done a bit more testing without tls and it all looks good.

The one remaining question in my mind is whether tls should be an optional (but default on) build flag. I know the code is littered with ifdefs already for such things, most of which are probably unnecessary, but I like to avoid making people pull in a lot of extra dependencies if they don't need them. I'm happy to do the work if you think that's reasonable, but also generally ready to merge otherwise.

albertov commented 5 years ago

Makes sense to make TLS support optional as it brings lots of dependencies. I'll sprinkle the #ifdefs today. I'm also going to add the nix CI stuff in this PR to make sure all combinations of compiled TLS support and TLS on/off are tested, can split it up in two PRs later if you want

albertov commented 5 years ago

Made TLS optional at compile time, added integration tests with nix and configured CI on Travis. Used travis since it has a "nix" builder pre-configured. I have a docker image to build nix stuff on CircleCI that could configured instead but decided on Travis to reduce the number of moving parts, can change this if needed...

One thing I've noticed when running the TLS tests (with nix-build nix/release.nix -A postgresql-typed) is that they run much slower than the non-TLS ones (runnable with nix-build nix/release.nix -A postgresql-typed-notls). I suspect there's a pgFlush missing somewhere but not familiar enough with PG's wire protocol to confirm (in a reasonable amount of time). The DB server stderr is dumped to the same terminal tests run... Maybe this gives a clue?

Test suite hdbc: RUNNING...
ERROR:  syntax error at or near "INVALID" at character 30
STATEMENT:  CREATE TABLE valid1 (a int); INVALID
(... pauses here for quite a bit ...)
ERROR:  column "invalidcol" does not exist at character 8
STATEMENT:  SELECT invalidcol FROM hdbctest2
(... pauses here for quite a bit ...)
Test suite hdbc: PASS
Test suite logged to: dist/test/postgresql-typed-0.5.3.0-hdbc.log
albertov commented 5 years ago

For the record, first passing build at CI: https://travis-ci.org/Feeld/postgresql-typed/builds/535728362

dylex commented 5 years ago

I've been looking at the performance some. I don't think it's specific to TLS, at least from what I see. For me, running the tests with a local socket takes about 3.3s, with TCP to localhost takes about 70s, and TCP with TLS takes about 78s. For comparison, on the current master, local socket takes 3.3s and TCP is 3.5s. So, most of the introduced overhead probably comes from the switch from Handle to Socket, for some reason. Running with PG_DEBUG=1 and looking at timestamps (which I should probably commit) looks like the delay is throughout the connection, but mainly around Sync. I'll keep poking at the non-TLS version and see what I can do, which will presumably help TLS too. (It isn't clear to me whether your nix tests are using sockets or localhost.)

dylex commented 5 years ago

Ah, it's an easy fix, which I've now pushed to your branch (hope you don't mind): TCP_NODELAY. Now TCP is back to 3.5s, and TLS is 5.5~6s, which seems like reasonable overhead to me.

I'll set up travis soon and GTM.

albertov commented 5 years ago

Great, thanks for finding and fixing the issue! Let me know if there's anything else I need to do before merging