amazon-archives / data-pipeline-samples

This repository hosts sample pipelines
MIT No Attribution

DynamoDBImportCSV : CSV file format #62

Open rtrompier opened 7 years ago

rtrompier commented 7 years ago

Hi guys,

Could you please tell me the correct CSV format for the DynamoDBImportCSV script? Is it comma-separated only? Are headers mandatory?

Thanks for your answer ;) Cheers

Namitha08 commented 7 years ago

You might have figured it out by now :) But for people who need to know why their operations are failing, here is the explanation, assuming you are using CopyActivity and S3DataNode. CopyActivity has specific limitations to its CSV support. When you use an S3DataNode as input for CopyActivity, you can only use a Unix/Linux variant of the CSV data file format for the Amazon S3 input and output fields. The Unix/Linux variant requires the following:

- The separator must be the "," (comma) character.
- The records are not quoted.
- The default escape character is ASCII value 92 (backslash).
- The end of record identifier is ASCII value 10 (or "\n").

Windows-based systems typically use a different end-of-record character sequence: a carriage return and line feed together (ASCII value 13 and ASCII value 10). You must accommodate this difference using an additional mechanism, such as a pre-copy script to modify the input data, to ensure that CopyActivity can properly detect the end of a record; otherwise, the CopyActivity fails repeatedly.
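For anyone preparing files by hand, here is a rough Python sketch of the two steps described above: converting Windows line endings to "\n", and writing rows as comma-separated, unquoted, backslash-escaped records. The function names `normalize_line_endings` and `write_unix_csv` are made up for illustration and are not part of Data Pipeline. It also assumes that escaping an embedded comma with a backslash is what CopyActivity expects, which I have not verified against the service.

```python
import csv
import io


def normalize_line_endings(data: bytes) -> bytes:
    """Replace Windows CRLF (ASCII 13 + 10) record endings with LF (ASCII 10)."""
    return data.replace(b"\r\n", b"\n")


def write_unix_csv(rows) -> str:
    """Serialize rows as comma-separated, unquoted, LF-terminated records.

    Assumption: embedded commas are escaped with a backslash (ASCII 92),
    matching the default escape character listed above.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=",",
        quoting=csv.QUOTE_NONE,  # records are not quoted
        escapechar="\\",         # backslash escapes special characters
        lineterminator="\n",     # end of record is ASCII 10
    )
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data, only to show the resulting layout.
    windows_bytes = b"1,Alice\r\n2,Bob\r\n"
    print(normalize_line_endings(windows_bytes))
    print(write_unix_csv([["1", "Doe, John"], ["2", "Bob"]]))
```

You could run something like this locally before uploading the file to S3, or adapt it into the pre-copy script mentioned above.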