edasque / DynamoDBtoCSV

Dump DynamoDB data into a CSV file
Apache License 2.0

Output goes to the console instead of the CSV file for large tables #29

Open tharun06 opened 7 years ago

tharun06 commented 7 years ago

For small tables it worked fine, but when I tried it on a table with around 50,000 rows, the output went to the console and the CSV file was empty.

edasque commented 7 years ago

I no longer have access to a DynamoDB instance. When I did, the table was millions of rows long, so I am surprised. If I do get access to a large database again, I'll look into this.

shubho-acc commented 6 years ago

I am facing the same issue. I am unable to fetch 65,000 records from a similar table.

edasque commented 6 years ago

Are you though? Do you see the output on the console?

shubho-acc commented 6 years ago

No, I don't see any error or log. I had to wait for a couple of minutes, but when I looked at the output file it was blank. I tested on an 8 GB server.

ghost commented 6 years ago

Why not migrate to using a DataFrame and export to .csv from there? (Most libraries have a 'low memory' flag to avoid loading everything into memory.)

PS: I'm a Python guy, so here is my little contribution.
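A minimal sketch of that DataFrame approach, assuming boto3 credentials are configured; the table and file names are placeholders. Note that building the DataFrame still holds every item in memory before writing, so for very large tables this only trades the header problem for a memory one.

```python
# Sketch: scan a DynamoDB table page by page, then export via pandas.
# Table and file names are placeholders, not the tool's actual config.
import boto3
import pandas as pd

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # hypothetical table name

items = []
response = table.scan()
items.extend(response["Items"])
# DynamoDB returns at most 1 MB per scan page; follow LastEvaluatedKey.
while "LastEvaluatedKey" in response:
    response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])

# pandas fills missing attributes with NaN, so a ragged "schema" still works.
df = pd.DataFrame(items)
df.to_csv("export.csv", index=False)
```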

tcchau commented 6 years ago

@shubho-acc @tharun06 This is related to changes I made to handle tables where the "schema" is not fixed. Essentially, we have to read all the records before we can figure out what the headers should be. We could create a new mode where, if you know the schema is fixed, the output is streamed as the records are received rather than stored in memory, which is what is causing the problem. Reply to this thread if you're still interested in seeing a fix for this.
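A minimal sketch of that streaming mode, assuming the schema really is fixed and boto3 credentials are configured; table and file names are placeholders. Headers come from the first item, and each scan page is written to disk as it arrives, so memory use stays bounded regardless of table size.

```python
# Sketch of the proposed fixed-schema streaming mode: take the column
# list from the first item and write each scan page as it arrives,
# instead of buffering the whole table just to infer headers.
import csv
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # hypothetical table name

response = table.scan()
fieldnames = sorted(response["Items"][0].keys())  # schema assumed fixed

with open("export.csv", "w", newline="") as f:
    # extrasaction="ignore" drops any unexpected extra attributes;
    # missing attributes are written as empty strings by default.
    writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(response["Items"])
    # Follow LastEvaluatedKey through the rest of the table, writing
    # each page to disk as we go.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        writer.writerows(response["Items"])
```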

ghost commented 6 years ago

@tcchau Why not add a simple flag for a fixed schema that would just read the schema from the first row and generate the CSV based on that?

tcchau commented 6 years ago

@MarcoPorracin Yes, that's exactly what I mean. The new mode of operation would be triggered by the flag.
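A sketch of how such a flag might be wired up. The actual tool is Node.js; this Python snippet only illustrates the control flow, and `--fixed-schema`, `export_buffered`, and `export_streaming` are hypothetical names, not flags or functions the tool actually has.

```python
# Hypothetical CLI wiring for the proposed flag: the default keeps the
# current two-pass behavior, while --fixed-schema selects streaming.
import argparse

def export_buffered(table_name):
    """Current behavior: scan everything, infer headers, then write."""
    ...

def export_streaming(table_name):
    """Proposed mode: headers from the first row, pages written as received."""
    ...

parser = argparse.ArgumentParser(description="Export a DynamoDB table to CSV")
parser.add_argument("--table", required=True)
parser.add_argument(
    "--fixed-schema",
    action="store_true",
    help="assume every item has the same attributes; stream rows to disk "
         "instead of buffering the whole table to infer headers",
)
args = parser.parse_args()

if args.fixed_schema:
    export_streaming(args.table)
else:
    export_buffered(args.table)
```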