capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
https://capitalone.github.io/DataProfiler
Apache License 2.0

Can't get the full package to work #1144

Closed DylanVig closed 2 months ago

DylanVig commented 2 months ago

I am trying to install the full package rather than the slimmer one. When I run pip install 'DataProfiler[ml]', pip reports that all of the requirements are already satisfied, but when I run a script that I wrote, I get this as part of the output:

/Users/dylanvig/venv3/lib/python3.12/site-packages/dataprofiler/profilers/profile_builder.py:757: RuntimeWarning:

!!! WARNING Partial Profiler Failure !!!

Profiling Type: data_labeler
Exception: TypeError
Message: Metric.add_weight() got multiple values for argument 'shape'

For labeler errors, try installing the extra ml requirements via:

$ pip install dataprofiler[ml] --user

profiler_utils.warn_on_profile("data_labeler", e)
INFO:DataProfiler.profilers.profile_builder: Finding the Null values in the columns...

Do you have any idea what is going on? Thanks!

taylorfturner commented 2 months ago

Hey @DylanVig! This should be resolved in 0.11.0 with the max version tag for tensorflow. Did you run pip install dataprofiler[ml] --user?

In addition, keras will be upgraded as part of #1138.

DylanVig commented 2 months ago

Hey Taylor! I ran pip install 'dataprofiler[ml]', which should've gotten the job done if I'm not mistaken. I believe the error comes from pip automatically installing versions of tensorflow (2.16.1) and keras (3.3.3) that are not suitable for the full DataProfiler package. I am also using Python 3.12.0. Which Python version is most suitable for running this package, and which versions of tensorflow and keras? Thanks!

taylorfturner commented 2 months ago

What version of dataprofiler are you running? It should not be installing tensorflow 2.16.1 (ref) if you are on 0.11.0.

DylanVig commented 2 months ago

I got it working. I was using a python version that was incompatible with dataprofiler, but after I switched to python 3.9, I got it working with suitable versions of tensorflow and keras. Thanks!

taylorfturner commented 2 months ago

Nice! What version were you on prior?

DylanVig commented 2 months ago

I was using Python 3.12.0, and whenever I installed the package, it would install tensorflow 2.16.1 and keras 3.3.3. I don't think the suitable versions of tensorflow and keras were compatible with that version of Python, so I just had to switch to a slightly older version.
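For anyone hitting the same warning, here is a minimal sketch for checking which interpreter and package versions are actually active in the environment. It uses only the standard library's importlib.metadata. The helper name installed_versions is made up for illustration, and the distribution names passed to it are assumed to match what pip installed.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in the active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Interpreter version first, since that turned out to be the culprit here.
    print("python", ".".join(str(part) for part in sys.version_info[:3]))
    # Assumed distribution names; adjust if pip lists them differently.
    for name, version in installed_versions(
        ["dataprofiler", "tensorflow", "keras"]
    ).items():
        print(name, version or "not installed")
```

Running this inside the same virtual environment as the failing script should show whether the tensorflow and keras versions match the ones mentioned above.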

taylorfturner commented 2 months ago

Ah, got it! Yeah, we are hoping to add Python 3.11 to our GHA checks and tox runs in the next release of the library.

Glad it's working for you, @DylanVig!