Closed rnijveld closed 4 years ago
Can you explain a bit what happens in the case of this failure with your fix? Does the request get retried properly? I would worry that we may be swallowing an error that would cause data not to be written - if a data file cannot be written as expected, then crashing would preferable.
Curl will still pick up on the failed write using their own error handling (also see https://curl.haxx.se/mail/lib-2010-11/0009.html). Normally it has signal handling to catch the SIGPIPE signal and continue with normal error handling, but as signal handling is disabled in entwine we can't use that. But it still has its own internal signal handling, as it will still see the PIPE error code from the eventual send
call.
The code in arbiter::http::Curl::perform
won't get a CURLE_OK
return code (I think it should return CURLE_SEND_ERROR
instead looking at the error handling code). This in turn will set the httpCode to 500 and cause ok()
to return false on the response, causing an exception. entwine::ensurePut
will then just keep retrying until a successful put request was completed. I haven't seen any missing data in our pointcloud on gcloud after using this patch either, so that gives me some more confidence that this works as intended.
I'm not sure what the best practice for Windows would be though, as it doesn't have the same signal handling, it might not throw any signal on windows, so I think it would be fine to only add the sigpipe ignore code on platforms that support it.
We've noticed that when running builds commands on a GCS output sometimes TLS connections will fail, and the next write will cause a SIGPIPE. Curl is configured to not handle any signals, causing entwine to crash. We applied this patch to prevent crashing, but I'm not sure if this is the best way to handle it.