Use the Python Client to stream large data sets, e.g. a data set with 10M records
🤔 Expected Behavior
The Python Client does not raise a `Connection broken` error when streaming a large data set.
😯 Current Behavior
The Python Client raises a `Connection broken` error when streaming a large data set.
💁 Possible Solution
Increase the timeout? As a workaround, I used curl to stream the large data set.
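Until the client handles this, the curl workaround can be approximated in Python by writing the response body straight to disk in chunks. This is a minimal sketch using only the standard library; `download_to_file` is a hypothetical helper, not part of the Tamr Python Client, and the records URL and any authentication header would be the same ones passed to curl (neither is shown in this issue). Note that `timeout` here limits each blocking socket operation, not the whole transfer, so raising it is not by itself a guaranteed fix if the connection is closed on the server side.

```python
import urllib.request
from pathlib import Path
from typing import Dict, Optional


def download_to_file(
    url: str,
    dest: Path,
    headers: Optional[Dict[str, str]] = None,
    timeout: float = 600.0,
    chunk_size: int = 1 << 20,
) -> int:
    """Stream the body at `url` into `dest` in chunks and return the byte count.

    Hypothetical helper (not part of the Tamr Python Client). `timeout` is
    passed to urlopen and limits each blocking socket operation, not the
    whole transfer.
    """
    request = urllib.request.Request(url, headers=headers or {})
    written = 0
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Content-Length may be absent, e.g. for chunked (streamed) responses.
        expected = response.headers.get("Content-Length")
        with open(dest, "wb") as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:  # empty bytes means the body is exhausted
                    break
                out.write(chunk)
                written += len(chunk)
    # Fail loudly instead of keeping a silently truncated file.
    if expected is not None and written != int(expected):
        raise IOError(f"incomplete download: {written} of {expected} bytes")
    return written
```

For example, `download_to_file(records_url, Path("records.json"), headers=auth_headers)`, where `records_url` and `auth_headers` are placeholders for the values used with curl. When the server sends a `Content-Length`, the helper also checks that every byte arrived, so a truncated download fails loudly instead of leaving a short file.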
🔦 Context
Transamerica sent us two data sets with a total of 10M records and wanted us to demonstrate how their data science team (who work in Jupyter notebooks on Domino Lab) can interact with Tamr to build a pipeline that ingests, exports, and analyzes data with 10M records. I was able to stream data sets of up to 5M records; however, I had to use curl to stream the 10M-record data set.
💻 Examples
I was able to stream 5M records, but when I used `dt.records()` to stream a 10M-record data set, the connection broke after 8 minutes with the error `('Connection broken: IncompleteRead(138 bytes read, 374 more expected)', IncompleteRead(138 bytes read, 374 more expected))`. I was able to use curl to get the data set. It would be nice to be able to stream large data sets in Python.
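Once the export is on disk (via curl, as in this issue, or any other streaming download), it can still be processed record by record in Python without loading all 10M records into memory. This is a minimal sketch that assumes the export is newline-delimited JSON with one record per line; `iter_records` is a hypothetical helper, not part of the Tamr Python Client, and it sidesteps rather than fixes the `Connection broken` error raised by `dt.records()`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_records(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-blank line of a newline-delimited JSON file.

    Hypothetical helper (not part of the Tamr Python Client). Only the
    current line is held in memory, so a large export can be processed
    lazily.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line, e.g. one cut off by an interrupted download.
                raise ValueError(f"{path}:{line_number}: invalid JSON record") from err
```

For example, `sum(1 for _ in iter_records(Path("records.json")))` counts the records while holding only one line in memory at a time. Decoupling the download from processing also means slow per-record work no longer keeps the HTTP connection open.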
🙋 feature request