datahubio / datahub-v2-pm

Project management (issues only)

Cannot upload lots of files in parallel to rawstore #212

Closed anuveyatsu closed 6 years ago

anuveyatsu commented 6 years ago

I have a dataset with 52 resources and when I push it to datahub, it fails at some point with the following error:

Error uploading to rawstore for csv/4_rankings/rankings_2012.csv with code 400 reason <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>65FBF32C03C38720</RequestId><HostId>wf4Ns2frsq8hYufSmQEFSRbyFnuoy17PhuwH172tRWQOuaYuxjIlzh9Ong+6nO4RarX6X6RbtbM=</HostId></Error>

My guess is that it's due to a limit on the number of files we can upload at the same time.
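If the guess above is right, one common workaround is to cap how many uploads run at once and retry the ones that time out. The following is only a minimal sketch of that idea, not the actual datahub client code: `upload_all`, `upload_one`, `max_workers`, `retries` and `backoff` are all hypothetical names, and `upload_one(path)` is assumed to raise an exception when an upload fails (for example on a `RequestTimeout` response).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload_one, max_workers=4, retries=3, backoff=0.5):
    """Upload every path with at most `max_workers` uploads in flight.

    `upload_one(path)` is a hypothetical callable that uploads one file and
    raises an exception on failure. Returns a dict mapping each path to None
    on success, or to the last exception if every attempt failed.
    """

    def attempt(path):
        last = None
        for i in range(retries):
            try:
                upload_one(path)
                return None
            except Exception as exc:  # a real client would catch narrower errors
                last = exc
                if i < retries - 1:
                    # Wait longer after each failed attempt (exponential backoff).
                    time.sleep(backoff * 2 ** i)
        return last

    # The pool size is what bounds the number of simultaneous uploads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, paths))
    return dict(zip(paths, results))
```

With something like this, a 52-resource dataset would be pushed a few files at a time, and the returned dict shows which files (if any) still failed after all retries.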

Acceptance criteria

Tasks

zelima commented 6 years ago

@anuveyatsu how can I reproduce this? Is that dataset publicly available, e.g. on GitHub?

anuveyatsu commented 6 years ago

@zelima here you go https://github.com/anuveyatsu/atp-world-tour-tennis-data

anuveyatsu commented 6 years ago

@zelima but all the files are probably already on S3, so it wouldn't upload them again

zelima commented 6 years ago

Closing as FIXED. We've upgraded the clusters and nodes for our services, so this should not be an issue anymore. Feel free to reopen if this comes up again.