Closed shry15harsh closed 6 years ago
@shry15harsh You can't specify a "folder" since S3 doesn't really have that. However, if you need to reference multiple files in a given package, then you need to create your own manifest file and submit it. There's a doc about this. http://docs.awssolutionsbuilder.com/data-lake/user-guide/working-with-packages/
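To make the manifest idea concrete, here is a minimal sketch of building one in Python. The actual manifest schema is defined in the linked docs; the field names used here (`dataFiles`, `bucket`, `key`) are assumptions for illustration only, as are the bucket and key names.

```python
import json

# Hypothetical manifest builder. The real schema is documented in the
# data-lake user guide linked above; "dataFiles"/"bucket"/"key" below
# are assumed field names, not the confirmed format.
def build_manifest(bucket, keys):
    """Collect multiple S3 objects into a single package manifest."""
    return json.dumps(
        {"dataFiles": [{"bucket": bucket, "key": k} for k in keys]},
        indent=2,
    )

# Example: reference every part file of a multi-file dataset in one package.
manifest = build_manifest(
    "my-bucket",
    [
        "parquet_folder/part-00000.parquet",
        "parquet_folder/part-00001.parquet",
    ],
)
print(manifest)
```

The point is simply that one manifest can enumerate many S3 objects, so a multi-file dataset becomes a single package entry.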
As for the Parquet format, the data lake solution doesn't care about the file format you use.
Thanks. But a Parquet "file" is really a folder containing multiple part files, because Parquet is a splittable format. How do I store this kind of file in a Data Lake package? I want to use this data in my big data processing frameworks, so I need a splittable data format.
Version 2.0 has been published, and you can now link existing folder content.
When importing data, you must specify a single include path. For example, if you have bucket-name/folder-name/file-name.ext and want to include every object in that bucket, specify just bucket-name as the include path.
Does adding bucket-name/parquet_folder/ as the include path help in your case?
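If it helps to see how an include path would pick up all the part files of a Parquet dataset, here is a small sketch. It assumes (not confirmed by the docs above) that the include path behaves as a simple key prefix; the object names are made up for illustration.

```python
# Sketch of include-path matching, assuming the path acts as a key prefix.
def matches_include_path(include_path, object_path):
    """Return True if object_path falls under include_path."""
    prefix = include_path.rstrip("/") + "/"
    return object_path == include_path or object_path.startswith(prefix)

# Hypothetical objects in the bucket.
objects = [
    "bucket-name/parquet_folder/part-00000.parquet",
    "bucket-name/parquet_folder/part-00001.parquet",
    "bucket-name/other/file.csv",
]

# Using the folder as the include path selects every Parquet part file.
included = [o for o in objects if matches_include_path("bucket-name/parquet_folder", o)]
print(included)
```

Under this assumption, all part files of the splittable dataset are linked as one package, without uploading anything.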
There is no way to upload a folder, or to link an existing S3 folder, as Data Lake package content. My data is in Parquet format. How do I handle this kind of partitioned format if I want to use the Data Lake solution?