dolthub / dolthub-issues

Issues for dolthub.com
https://dolthub.com

Large CSV downloads time out #303

Closed noamross closed 1 year ago

noamross commented 1 year ago

Downloading large CSVs, whether via the browser or programmatically with an API token, times out for large tables:

 curl -s https://www.dolthub.com/csv/dolthub/museum-collections/main/objects | pv > objects.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9348M    0 9348M    0     0  10.2M      0 --:--:--  0:15:13 --:--:-- 7274k
curl: (18) transfer closed with outstanding read data remaining

This behavior occurs via the browser or the command line, on private or public repositories.

I realize this might be expected behavior due to limits, but it is not documented at https://docs.dolthub.com/concepts/dolthub/api. If it is a limit, the workaround may be to make paginated API calls and convert the JSON to CSV as necessary, or to fully clone the database. That said, the timeout is a barrier to using DoltHub to distribute data to collaborators or more broadly. (We discovered this after pointing a colleague to our DoltHub repo to share our data.)
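The paginated workaround mentioned above could be sketched roughly as below. Note this is a hypothetical sketch: the endpoint path, query parameter, and the assumption that the response carries a `rows` list of objects should all be checked against the DoltHub API docs before relying on it.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint shape; verify the actual DoltHub SQL API path,
# parameters, and response fields against the documentation.
API = "https://www.dolthub.com/api/v1alpha1/{owner}/{repo}/{branch}"

def fetch_page(owner, repo, branch, table, limit, offset):
    """Fetch one page of rows, assuming the response carries a 'rows' list."""
    query = f"SELECT * FROM `{table}` LIMIT {limit} OFFSET {offset}"
    url = (API.format(owner=owner, repo=repo, branch=branch)
           + "?q=" + urllib.parse.quote(query))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("rows", [])

def rows_to_csv(rows):
    """Convert a list of row dicts to CSV text; header comes from the first row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

One caveat: `LIMIT ... OFFSET ...` pagination gets slow on very large tables, since each page re-scans the skipped rows; keyset pagination (`WHERE id > last_seen_id`) is usually a better fit for a multi-gigabyte export.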

Interestingly, the ZIP file of the museums database, which is smaller (1.6 GB) since this database contains only one table, downloaded fine. So perhaps this is a total-time or size limit rather than a query limit, and could be addressed by providing compressed CSVs?

cc @emmamendelsohn

reltuk commented 1 year ago

@noamross Thanks for this bug report. We were able to reproduce the observed behavior.

We found a stream idle timeout combined with some fairly aggressive response buffering in our routing infrastructure. The end result is that a large portion of the response can get buffered and then streamed to the requesting client over time; meanwhile the upstream, which has already sent that portion of the response and is waiting for the flow-control window to open back up, hits the stream idle timeout.

We have reduced the internal buffering, which was not intended behavior, and we have increased the stream idle timeout.

Things work better for me now in my local testing.

It's worth noting, regarding this: our infrastructure doesn't have very lenient connection draining policies during things like deployments, so it's still possible a request that runs for 15 minutes will see spurious disconnects. But they definitely shouldn't be deterministic now.

Maybe a feature request would be to add exporting a table as CSV to S3, which could be done in parallel and would provide a resumable download URL. Another option would be adding functionality to the dolt CLI to dump a table in CSV form directly from a DoltHub repository; that could also support resumption.
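Until something like that exists, a client-side mitigation for spurious disconnects is to resume the transfer with an HTTP Range request. A minimal sketch, assuming the server honors `Range` headers (which is not confirmed for this endpoint):

```python
import os
import urllib.request

def range_header(existing_bytes):
    """Build a Range header asking for the bytes we don't have yet."""
    return {"Range": f"bytes={existing_bytes}-"}

def resume_download(url, path, chunk=1 << 20):
    """Append the remainder of `url` to `path`, resuming from its current size.

    Assumes the server supports Range requests; if it doesn't, the response
    will start from byte 0 and the file must be restarted from scratch.
    """
    start = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url, headers=range_header(start))
    with urllib.request.urlopen(req) as resp, open(path, "ab") as out:
        while True:
            block = resp.read(chunk)
            if not block:
                break
            out.write(block)
```

In practice this would be wrapped in a retry loop, calling `resume_download` again after each disconnect until the file reaches the expected size.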

For now I will close this issue, but feel free to re-open if you still see this behavior.