capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
https://capitalone.github.io/DataProfiler
Apache License 2.0
1.42k stars, 158 forks

Update requirements so S3 is an optional package to reduce package bloat. #1086

Open JGSweets opened 7 months ago

JGSweets commented 7 months ago

Is your feature request related to a problem? Please describe. Currently, boto3 is installed as a default dependency of DataProfiler. I suggest making it an optional dependency so that it is installed only when S3 support is needed.

The same approach might be beneficial for other large dependencies, such as those needed for Parquet, because of their package size.

Describe the outcome you'd like:

pip install dataprofiler  # doesn't install boto3 + req packages for it

pip install 'dataprofiler[s3]'  # installs boto3 + req packages for it
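
One way the two install commands above could be supported is through setuptools "extras". The following is only a minimal sketch, not the project's actual setup.py: the core requirement names and the helper build_setup_kwargs are hypothetical, and the only detail taken from this issue is that boto3 moves into an "s3" extra.

```python
# Hypothetical sketch of how a setup.py could split out an "s3" extra.
# The core package list below is illustrative, not DataProfiler's real one.

# Installed by a plain `pip install dataprofiler`.
INSTALL_REQUIRES = [
    "numpy",
    "pandas",
]

# Installed only when requested, e.g. `pip install 'dataprofiler[s3]'`.
EXTRAS_REQUIRE = {
    "s3": ["boto3"],
}


def build_setup_kwargs():
    """Return keyword arguments that could be passed to setuptools.setup()."""
    return {
        "name": "dataprofiler",
        "install_requires": INSTALL_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }


if __name__ == "__main__":
    # In a real setup.py this would be: setuptools.setup(**build_setup_kwargs())
    print(build_setup_kwargs())
```

With this layout, pip resolves boto3 and its own dependencies only when the "s3" extra is named on the command line.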

Additional context: This can reduce the size of Docker images or AWS Lambda jobs that require DataProfiler.
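
If boto3 became optional, any code path that touches S3 would need to fail with a clear message when the package is absent. A minimal sketch of such a guard follows; require_optional is a hypothetical helper written for illustration, not an existing DataProfiler function.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or explain how to install it.

    `module_name` is the module to import (for example "boto3") and
    `extra` is the pip extra that would provide it (for example "s3").
    Both names are assumptions made for this sketch.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'dataprofiler[{extra}]'"
        ) from exc


def open_s3_client():
    """Create an S3 client only when it is actually needed."""
    boto3 = require_optional("boto3", "s3")
    return boto3.client("s3")
```

Because the import happens inside the function, users who never read from S3 would not need boto3 installed at all.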

taylorfturner commented 7 months ago

Thanks @JGSweets!