Closed: AlexeyRaga closed this issue 7 months ago.
In fact not every single request fails. It works for a while (sometimes 10 seconds, sometimes 5 minutes) and then crashes with this error.
What I am doing is copying messages from one SQS queue to another; sometimes I manage to copy many, sometimes it fails almost immediately.
I'm having TLS errors as well:
```
TransportError (TlsExceptionHostPort (Terminated True "received fatal error: BadRecordMac" (Error_Protocol ("remote side fatal error",True,BadRecordMac))) "ec2.us-west-2.amazonaws.com" 443)
```
Same here, quite frequently with S3:
```
TransportError (TlsExceptionHostPort (Terminated True "received fatal error: BadRecordMac" (Error_Protocol ("remote side fatal error",True,BadRecordMac))) "s3-eu-west-1.amazonaws.com" 443)
```
For the record: I do not think both errors are necessarily related.
@AlexeyRaga: is https://github.com/haskell-works/sqs-resurrector the application you are running? I'll get it set up and see if I can reproduce.
@kim Both errors are coming from `hs-tls`; the first is reminiscent of a cipher error - I haven't seen the second before. Any suggestions on how to reproduce a minimal example? Are any particular types of request failing?
I am basically fetching objects from S3, massaging them and sticking them into DynamoDB. Since this is all a `conduit` pipeline, it may very well be that the time between successive requests to S3 exceeds 5 seconds -- which we have previously found to be the timeout after which S3 closes idle connections. This leads me to believe that this is just a different incarnation of a problem we have seen before: `http-client`'s `Manager` has a hardcoded idle limit of 30 seconds, so it may use a connection that is already closed by the remote side. The exception can just be caught on the application layer and the request retried, which will eject the bad connection from the pool and create a fresh one.
Admittedly rather brittle, but the best `http-client` could do is to allow users to configure the idle timeout. But that doesn't buy us anything if we use the same `Manager` for different AWS services, which likely have different timeouts.
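The catch-and-retry approach described above can be sketched with the `retry` package. This is a minimal sketch, not the thread's actual code; it assumes amazonka-1.x, where transport failures surface as the `TransportError` constructor of amazonka's `Error` type:

```haskell
import Control.Monad.Catch (Handler (..))
import Control.Retry (constantDelay, limitRetries, recovering)
import Network.AWS.Types (Error (..))

-- Retry an AWS action whenever the transport layer fails, e.g. because
-- http-client handed us a connection the remote side had already closed.
-- Retrying ejects the bad connection from the pool and opens a fresh one.
retryTransport :: IO a -> IO a
retryTransport action =
  recovering
    (constantDelay 100000 <> limitRetries 5)  -- 100 ms between attempts, 5 retries
    [\_ -> Handler $ \e -> pure (isTransport e)]
    (const action)
  where
    isTransport :: Error -> Bool
    isTransport (TransportError _) = True
    isTransport _                  = False
```

The handler predicate only retries transport-level failures; serialization and service errors are rethrown immediately, since retrying those would just repeat the same failure.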
This looks related to this http-client error: https://github.com/snoyberg/http-client/pull/226 (see also https://github.com/erikd-ambiata/test-warp-wai/issues/1); we had been seeing similar issues regularly on the corresponding versions.
@kim @markhibberd Were you able to solve or work around this issue? It has been a pain in the neck recently :(
@AlexeyRaga Our current approach is just pinning `connection == 0.2.5` and making sure to increase the default amazonka retry policy, which is a bit light - https://github.com/ambiata/mismi/blob/master/mismi-cli/main/s3.hs#L111 is an example of doing so for S3 (just swap the S3 service for SQS in your example). If you use Stackage you may be out of luck though; no idea if you will be able to work around it.
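Raising the retry policy can be sketched with amazonka's lenses; this is an assumption-laden sketch based on the mismi example linked above, using the amazonka-1.x names `configure`, `serviceRetry`, and `retryAttempts`:

```haskell
import Control.Lens (over, (.~))
import Network.AWS (Env, configure)
import Network.AWS.S3 (s3)
import Network.AWS.Types (retryAttempts, serviceRetry)

-- Bump the retry attempts for S3 from the default to 10 on a given Env.
-- Swap in the `sqs` service descriptor for SQS.
withMoreRetries :: Env -> Env
withMoreRetries = configure (over serviceRetry (retryAttempts .~ 10) s3)
```

Since `configure` is per-service, each AWS service used through the same `Env` needs its own override.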
@AlexeyRaga Same here: just retry
There appears to be a fix in the newest `tls` version which may address some of the above problems, but I suspect not all.
The `amazonka/amazonka.cabal` file in `develop` now requires `tls >= 1.3.9`, and the non-GHC8 stack configuration is in the process of being updated.
I'm actually not convinced this 'fix' is a good idea: if it doesn't correct all of the issues, it precludes the ability of a downstream user to pick `connection == 0.2.5`. Will consider; please share any thoughts.
Reverting. :disappointed: I'll leave it up to the downstream user to constrain to `tls >= 1.3.9`.
Marking as a "post 2.0" release, since we'll want to see how things behave after everything lands in `develop`. Note that `tls` is up to `1.5.x` now.
I'm going to close this off. `connection` is up to `0.3.1`; `tls` is up to `2.0.2` on Hackage and `1.8.0` on nixpkgs master. I don't think anyone will need/want/be able to pin to >=7-year-old versions of stuff with amazonka-2.x.
I wasn't able to reopen issue #269, so I am creating a new one. I am still having this issue with the newest `tls-1.3.8`: