gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

Redefine software installation #31

Closed: nj1973 closed this issue 8 months ago

nj1973 commented 10 months ago

We need to spend some time understanding how a customer gets from a cloned GitHub repo to a working installation.

We have already added a Makefile recipe to create a package containing many required artefacts.

make package

The tarball contains:

What we do not include is:

  1. An Oracle client, which we no longer bundle. If we switch to python-oracledb then customers probably won't need a client; for now I think it has to be a documented prerequisite.
  2. A Python virtual environment with required packages

It is item 2 above that this issue concerns: we need to work out how to ensure the customer has the correct Python version and packages.
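Whatever packaging route we pick, an installer could run a pre-flight check like the one below before doing anything else. This is a minimal sketch: `MIN_PYTHON` and the pinned package in `REQUIRED` are hypothetical values for illustration, not GOE's actual requirements. It only uses the standard library (`sys`, `importlib.metadata`).

```python
"""Pre-flight check: is this interpreter and its packages suitable?

Hypothetical sketch: MIN_PYTHON and REQUIRED are illustrative values,
not GOE's real requirements.
"""
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # assumption, for illustration only
REQUIRED = {"google-cloud-bigquery": "3.4.2"}  # hypothetical pin


def check_environment(min_python=MIN_PYTHON, required=REQUIRED):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Compare (major, minor) of the running interpreter with the minimum.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {'.'.join(map(str, min_python))}+ required, "
            f"found {sys.version.split()[0]}"
        )
    # Check each pinned distribution is installed at exactly that version.
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}=={wanted} is not installed")
            continue
        if found != wanted:
            problems.append(f"{name}=={wanted} required, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(f"ERROR: {issue}", file=sys.stderr)
    sys.exit(1 if issues else 0)
```

Exact-version matching mirrors the `==` pins used in pyproject.toml; a looser check would need a version-specifier parser such as the `packaging` library.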

nj1973 commented 9 months ago

Notes from team chat

Goals for this issue are:

Notes:

nj1973 commented 9 months ago

Good reading:

nj1973 commented 9 months ago

More notes.

Good package structure?

bin/         # Replaces scripts
src/goe/     # Replaces gluentlib/gluentlib
tests/
docs/
pyproject.toml
LICENCE.txt
README.txt

gluent.py should not be in bin; it should go somewhere inside src/goe, away from the entry-point scripts.

I think transport and spark-listener should go into a tools subdirectory.

nj1973 commented 9 months ago

Re-opening because the PR was part 1 of 3 changes:

  1. Switch setup.py to pyproject.toml
  2. Restructure the repo in a more standard way (not yet)
  3. Bundle a Python executable of some kind with the final package (not yet)

1 down, 2 to go

nj1973 commented 9 months ago

Memo

I tried to install the goe wheel on Debian 5.10 with Python 3.9, and it failed with this error:

Complete output (22 lines):
  /bin/sh: 1: krb5-config: not found

And also:

  gssapi/raw/misc.c:51:10: fatal error: Python.h: No such file or directory

Solutions:

sudo apt-get -y install libkrb5-dev gcc
sudo apt-get -y install python-dev python3-dev
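Both failures happen while pip compiles gssapi from source, so they could be caught up front. A minimal sketch of such a check, assuming a Debian-style system (the apt package names come from the solutions above); it looks for `krb5-config` and `gcc` on the PATH and for `Python.h` in the interpreter's include directory reported by `sysconfig`.

```python
import shutil
import sysconfig
from pathlib import Path


def missing_build_prereqs():
    """List build prerequisites that appear to be missing.

    Sketch only: covers the two failures reported above (krb5-config
    from libkrb5-dev, Python.h from python3-dev) plus a C compiler.
    The apt hints assume a Debian-based system.
    """
    missing = []
    if shutil.which("krb5-config") is None:
        missing.append("krb5-config (sudo apt-get -y install libkrb5-dev)")
    if shutil.which("gcc") is None:
        missing.append("gcc (sudo apt-get -y install gcc)")
    # Directory holding the CPython C headers for this interpreter.
    include_dir = Path(sysconfig.get_paths()["include"])
    if not (include_dir / "Python.h").is_file():
        missing.append("Python.h (sudo apt-get -y install python3-dev)")
    return missing


if __name__ == "__main__":
    for item in missing_build_prereqs():
        print(f"Missing: {item}")
```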
nj1973 commented 9 months ago

Parts 1 and 2 of the plan are complete:

  1. Switch setup.py to pyproject.toml
  2. Restructure the repo in a more standard way
  3. Bundle a Python executable of some kind with the final package (not yet)

Part 3 needs some thought. Currently PyInstaller is my primary plan.
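For reference, PyInstaller can be driven from Python via `PyInstaller.__main__.run()`, which takes the same arguments as the command line. A rough sketch, where the entry script `bin/offload` and the executable name are hypothetical placeholders; PyInstaller is imported lazily so the argument builder can be used without it installed.

```python
from pathlib import Path


def pyinstaller_args(entry_script, name, dist_dir="dist"):
    """Build a PyInstaller argument list for one entry-point script.

    Hypothetical: script path and executable name are placeholders.
    --onefile bundles the interpreter and dependencies into one file.
    """
    return [
        str(entry_script),
        "--onefile",
        "--name", name,
        "--distpath", str(dist_dir),
    ]


def build(entry_script, name, dist_dir="dist"):
    # Lazy import: only needed when actually building.
    import PyInstaller.__main__
    PyInstaller.__main__.run(pyinstaller_args(entry_script, name, dist_dir))


if __name__ == "__main__":
    build(Path("bin") / "offload", name="offload")  # hypothetical script
```

One thing to bear in mind: PyInstaller is not a cross-compiler, so the bundle has to be built on the same OS family (and a compatible glibc) as the customer's target host.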

nj1973 commented 9 months ago

Capturing some thoughts I had recently:

e.g.:

[project.optional-dependencies]
oracle = [
    "cx-Oracle==7.3.0",
]
gcp = [
    "google-cloud-bigquery==3.4.2",
    "google-cloud-kms==2.14.1",
]
hadoop = [
    "hdfs==2.6.0",
    "impyla==0.17.0",
    "thrift-sasl==0.4.3",
]
... etc ...
nj1973 commented 8 months ago

Another option for packaging Python:

"Hatch dropped a new release yesterday that may make things so much easier. It's got built in support for bundling python distros and installing them." https://github.com/pypa/hatch/releases/tag/hatch-v1.8.0

nj1973 commented 8 months ago

I've spun the remaining tasks from this issue into two new ones and am closing this one.