This PR introduces the new S3 connection feature to the DataProfiler repository. It enables DataProfiler to read data directly from remote S3 paths (s3_uri), enhancing its flexibility and data source compatibility.
Changes Made:
Added S3Helper class to facilitate S3 connectivity for DataProfiler.
The class accommodates various scenarios:
Accepting input parameters for AWS access key, secret key, session token, and region name.
Utilizing environment variables for AWS credentials.
Added a new unit test module, test_s3_helper.py, to verify the new S3 connection feature. Also extended the existing test_data.py and test_data_utils.py unit tests.
Details:
create_s3_connection: A new function that creates an S3 connection for DataProfiler. It supports more than one way of supplying AWS credentials and obtaining IAM permissions, as described below.
Input Parameters: The function accepts input parameters for AWS access key, secret key, session token, and region name. This allows for explicit credential provisioning.
Environment Variables: In cases where input parameters are not provided, the function falls back to using environment variables for AWS credentials. This provides an alternative for setting credentials.
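The fallback order described above (explicit parameters first, then environment variables) can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name resolve_s3_client_kwargs and its env parameter are hypothetical, and the environment variable names are the standard ones that boto3 recognizes, which this description does not list explicitly.

```python
import os
from typing import Dict, Mapping, Optional

# Standard AWS environment variable names (an assumption here; the PR
# description does not state which variables the helper reads).
_ENV_NAMES = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_session_token": "AWS_SESSION_TOKEN",
    "region_name": "AWS_DEFAULT_REGION",
}


def resolve_s3_client_kwargs(
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    region_name: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: pick each setting from the explicit argument,
    falling back to the matching environment variable when it is missing."""
    env = os.environ if env is None else env
    explicit = {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_session_token": aws_session_token,
        "region_name": region_name,
    }
    resolved: Dict[str, str] = {}
    for name, env_name in _ENV_NAMES.items():
        # An explicit argument wins; otherwise use the environment value.
        value = explicit[name] or env.get(env_name)
        if value:
            resolved[name] = value
    return resolved
```

If the helper is built on boto3 (an assumption, since the description does not name the client library), the resulting dictionary could be passed along as `boto3.client("s3", **kwargs)`, which accepts these four keyword arguments. Settings that resolve to nothing are left out so that the client library's own default lookup still applies.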
Unit Test Added (TestS3Helper):
test_s3_connection: A new unit test has been added to ensure the functionality and correctness of the create_s3_connection function. This test covers various scenarios, including different input parameter combinations.
Data Class Testing: The Data class was tested successfully by loading various data files from S3 URIs with the library installed in editable mode. This demonstrates the practical usability of the S3 connection feature.
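For reviewers less familiar with S3 URIs, the sketch below shows how a URI of the form s3://bucket/key splits into the bucket name and object key that an S3 read needs. It uses only the Python standard library; the function name split_s3_uri and the example bucket and key are hypothetical, and this is not necessarily how the Data class parses its input.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split 's3://bucket/path/to/object' into
    (bucket, key). Raises ValueError for anything that is not an S3 URI."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3_uri!r}")
    # The URL path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Example with a made-up bucket and key.
bucket, key = split_s3_uri("s3://example-bucket/data/sample.csv")
```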
This PR enhances the S3 connectivity of DataProfiler, making it more versatile in handling different AWS credential scenarios. The functionality was validated by the unit tests (test_s3_helper.py, test_data.py and test_data_utils.py) and by running a number of Data load operations on several file types (CSV, Parquet, TXT and JSON) with the library installed in editable mode.
Please review and test the changes, and let us know your feedback.