databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
https://databrickslabs.github.io/dbldatagen

Setuptools need to include required packages for working with the library locally (outside of the databricks environment) #156

Closed: malisezer closed this issue 1 year ago

malisezer commented 1 year ago

Expected Behavior

When the package is installed, I expect it to install the necessary dependencies.

Current Behavior

It currently does not. The package assumes it will run on Databricks, which is the usual case. However, when I develop code locally, the missing dependencies become a problem during testing.

Steps to Reproduce (for bugs)

pip install dbldatagen

Then try to run some code locally, for example the tests.

It does not install the required packages:

numpy = "1.22.0"
pyspark = "3.1.3"
pyarrow = "1.0.1"
pandas = "1.1.3"
pyparsing = ">=2.4.7,<3.0.9"
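To check which of the packages listed above are absent from a local environment, something like the following minimal sketch may help. It uses only the standard library's `importlib.metadata`. The names `REQUIRED` and `find_missing` are hypothetical and not part of `dbldatagen`, and the sketch only tests for presence, not for the pinned versions.

```python
from importlib import metadata

# Package names taken from the list in this issue; version pins are not checked.
REQUIRED = ["numpy", "pyspark", "pyarrow", "pandas", "pyparsing"]


def find_missing(packages):
    """Return the names in `packages` that have no installed distribution."""
    missing = []
    for name in packages:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

Running this right after `pip install dbldatagen` in a fresh virtual environment should show which of the listed packages still need to be installed by hand.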

Context

Your Environment

local mac computer

ronanstokes-db commented 1 year ago

We use both conda and pipenv environments during development and testing of the library, so both are supported.

The makefile has make targets to set up the environments:

make create-dev-env

will create a conda-based environment.

Once you activate the environment, you should then run `make install-dev-dependencies` to install the dev_requires.

Alternatively, use `make buildenv` to set up a pipenv-based environment.

ronanstokes-db commented 1 year ago

README updated to reflect the comments above.