gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

Refine software installation - more granular dependencies #79

nj1973 closed 6 months ago

nj1973 commented 7 months ago

Some of the packages we require pull in a large number of dependencies, the Snowflake client being one example. If we know we are an Oracle/GCP shop, it would be nice not to have to install packages related to Snowflake, Synapse or Hadoop.

I wondered if we could do this with sections in the pyproject.toml file. For example, the basic installation would install only the core dependencies, and we would then additionally install the optional-dependencies for the distributions we are interested in, e.g.:

[project.optional-dependencies]
oracle = [
    "cx-Oracle==7.3.0",
]
gcp = [
    "google-cloud-bigquery==3.4.2",
    "google-cloud-kms==2.14.1",
]
hadoop = [
    "hdfs==2.6.0",
    "impyla==0.17.0",
    "thrift-sasl==0.4.3",
]
... etc ...
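
If the dependencies are split this way, code that touches an optional backend would probably want to fail with a clear message when its extra has not been installed. The following is only a minimal sketch of that idea, not how GOE does it: the helper names (`module_available`, `missing_modules`, `require_extra`), the `EXTRA_MODULES` mapping and the distribution name `goe` are all hypothetical. The import names are the ones the listed packages normally expose.

```python
import importlib.util

# Hypothetical mapping from extra name to the import names it should provide.
# Assumption: these mirror the optional-dependencies groups sketched above.
EXTRA_MODULES = {
    "oracle": ["cx_Oracle"],
    "gcp": ["google.cloud.bigquery", "google.cloud.kms"],
    "hadoop": ["hdfs", "impala", "thrift_sasl"],
}


def module_available(name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package of a
        # dotted name is missing, and ValueError for a broken module spec.
        return False


def missing_modules(extra):
    """List the modules of an extra that are not importable."""
    return [m for m in EXTRA_MODULES[extra] if not module_available(m)]


def require_extra(extra, package="goe"):
    """Raise ImportError naming the extra to install if any module is missing.

    The default package name is an assumption used only for the message.
    """
    missing = missing_modules(extra)
    if missing:
        raise ImportError(
            f"Missing modules {missing}; "
            f"install with: pip install '{package}[{extra}]'"
        )
```

A Hadoop-specific module could then call `require_extra("hadoop")` before importing `hdfs`, while an Oracle/GCP-only installation (for example `pip install ".[oracle,gcp]"` from a source checkout) would never need those packages.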

nj1973 commented 7 months ago

We need to do this pretty soon because installing the hdfs package requires requests-kerberos which, on Debian 11, meant I had to install:

sudo apt-get -y install krb5-config libkrb5-dev gcc python3-dev

This is unnecessary seeing as we are limited to a BigQuery target at the moment. Resolving this issue will simplify installation greatly.