aws-solutions / aws-data-lake-solution

A deployable reference implementation intended to address pain points around conceptualizing data lake architectures that automatically configures the core AWS services necessary to easily tag, search, share, and govern specific subsets of data across a business or with other external businesses.
https://aws.amazon.com/solutions/implementations/data-lake-solution/
Apache License 2.0
401 stars 160 forks

"Delete a package" documentation gives conflicting info #17

Closed emma-ehrhardt closed 6 years ago

emma-ehrhardt commented 6 years ago

http://docs.awssolutionsbuilder.com/data-lake/user-guide/working-with-packages/#delete-a-package for version 2.0 has conflicting information.

The text says: "Deleting a package will remove it from the data lake, it will not delete any files from Amazon S3." The screenshot shows: "Deleting this package will remove this entry from the data lake and delete the dataset files from Amazon S3."

I believe the text reflects earlier Data Lake Solution behavior, and should be corrected ("it will not delete" > "and will delete"), as deleting a package does indeed now remove the files from S3 as well.

hvital commented 6 years ago

Thanks for the heads-up, Emma. We're going to update the text.

Be aware that the screenshot shows the expected behavior. Deleting a package will remove its entry from the data lake and delete the dataset files from Amazon S3. This only applies to datasets you upload to the data lake solution. Include paths (the ones you import via manifest files) are just references to the real location, so the solution does not delete the original files.
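To illustrate that distinction, here is a minimal sketch in Python. It is not the solution's actual code: the `DatasetEntry` record, its `kind` field, and the `keys_to_delete` helper are hypothetical names made up for this example. It only shows the rule described above, where uploaded datasets are removed from S3 and include paths are left alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetEntry:
    # Hypothetical record shape, not the solution's real schema.
    name: str
    s3_key: str
    kind: str  # assumed values: "uploaded" or "include_path"


def keys_to_delete(entries: List[DatasetEntry]) -> List[str]:
    """Return the S3 keys a package delete would remove.

    Only files uploaded to the data lake are selected. Include paths
    (imported via a manifest file) are references to files stored
    elsewhere, so they are skipped.
    """
    return [e.s3_key for e in entries if e.kind == "uploaded"]


entries = [
    DatasetEntry("sales.csv", "pkg-1/sales.csv", "uploaded"),
    DatasetEntry("archive.csv", "other-bucket/archive.csv", "include_path"),
]
print(keys_to_delete(entries))
```

In this sketch the package entry itself would be removed from the data lake in both cases; only the handling of the underlying S3 objects differs.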

emma-ehrhardt commented 6 years ago

Is this the correct place to post about issues in the docs? Or are those pages in the codebase where I can directly do a pull request?

hvital commented 6 years ago

Yes, here is good! The online help text is not in the GitHub repo.

shsenior commented 6 years ago

Resolved in v2.1.0 update.