MontrealCorpusTools / PolyglotDB

Language data store and linguistic query API
MIT License

Update packaging and pgdb script for improved compatibility #188

Open lxy2304 opened 2 weeks ago

lxy2304 commented 2 weeks ago

The current packaging and installation process for PolyglotDB has several issues on different operating systems (Mac, Windows, Linux). Additionally, the pgdb script uses outdated practices that could be improved for better environment compatibility.

Proposed Changes:

Modernize setup.py:
    Replace scripts=['bin/pgdb'] with entry_points for better cross-platform support.
    Ensure dependencies are clearly specified.

Update pgdb Script:
    Modify the script to store configuration and data within the conda environment if available.
    Fall back to user-specific directories only if no environment is detected.
    Improve configuration handling to avoid issues with hardcoded paths.

Edited Jun 23*

setup.py

  1. pgdb executable script on Windows: Issue: The script is not generated correctly. Proposed Fix: Update setup.py to use entry_points = {'console_scripts': []} instead of scripts = [] for better cross-platform compatibility. This requires rewriting bin/pgdb as a function and relocating it to the polyglotdb directory.

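  2. Specify scipy version: Requirement: conch-sounds needs scipy < 1.13. scipy 1.13 removed 'gaussian' from the scipy.signal namespace, which causes ImportError: cannot import name 'gaussian' from 'scipy.signal'. Proposed Fix: Pin scipy below 1.13 (e.g. scipy < 1.13 or scipy ~= 1.12.0; note that ~= 1.12 on its own would still allow 1.13).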

  3. python setup.py install / setup.py install: Issue: Installing a package from source this way is outdated. Proposed Fix: Consider moving to the more modern practice of a pyproject.toml and 'pip install .'.

  4. setuptools.command.test: Issue: The test command provided by setuptools is deprecated. Proposed Fix: Remove the test command from setup.py.
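
A rough sketch of how items 1 and 2 might fit together. It assumes the refactored script lives in a hypothetical module polyglotdb/pgdb.py that exposes main(); the module path, the function names, and the single 'install' subcommand shown are illustrative assumptions, not the current layout.

```python
import argparse


def build_parser():
    """Build the command-line parser (only 'install' is sketched here)."""
    parser = argparse.ArgumentParser(prog="pgdb")
    sub = parser.add_subparsers(dest="command", required=True)
    install = sub.add_parser("install", help="install the local databases")
    # Optional destination data directory, as accepted by 'pgdb install'.
    install.add_argument("directory", nargs="?", default=None)
    return parser


def main(argv=None):
    """Entry point that a console_scripts entry can call.

    argv=None makes argparse read sys.argv[1:], so the same function
    works from the generated 'pgdb' executable and from tests.
    """
    args = build_parser().parse_args(argv)
    # Placeholder: the real install logic from bin/pgdb would go here.
    print(f"command={args.command} directory={args.directory}")
    return 0


# Keyword arguments that setup.py would pass to setuptools.setup().
# 'polyglotdb.pgdb:main' is the hypothetical module:function path.
SETUP_KWARGS = {
    "entry_points": {"console_scripts": ["pgdb = polyglotdb.pgdb:main"]},
    # Excludes scipy 1.13, where scipy.signal.gaussian no longer exists.
    "install_requires": ["scipy<1.13"],
}
# In setup.py: setup(name="polyglotdb", ..., **SETUP_KWARGS)
```

With an entry point like this, pip generates the platform-appropriate launcher (including a pgdb.exe wrapper on Windows) instead of copying the script file as-is.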

bin/pgdb

  1. Refactor for compatibility with entry_points in setup.py: Issue: The script needs to be compatible with entry_points. Proposed Fix: Refactor the script into a function.

  2. Apple Silicon and InfluxDB installation: Issue: Homebrew uses a different folder for InfluxDB on Apple Silicon. Proposed Fix: Add a conditional check for Apple Silicon and adjust the folder accordingly. However, two caveats:

    • Consider alternatives to Homebrew, since it is not installed by default on Macs (e.g. downloading InfluxDB directly over HTTPS).

    • Homebrew installs the InfluxDB executable in a system-wide location, whereas it should be installed in the environment’s directory when using Conda/venv. That turns into the same issue as mentioned below.

  3. File storage locations in the pgdb script: Issue: The current file storage locations cause problems during installation and need better resource management. Proposed Fix: Modify the script to identify the user's environment (e.g. by checking 'CONDA_PREFIX' in os.environ). For example:

    • The current version uses ~/.pgdb for default data storage if 'pgdb install' is run without a destination data directory. The script should automatically detect the environment (conda or local) and adjust the default data directory accordingly.
    • The config.ini location is currently static (~/.pgdb) even when a destination data directory is provided by the user, which prevents multiple environments from having separate configurations. (If updated, also change CONFIG_DIR in config.py in /polyglotdb to point to the new location.)
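
A minimal sketch of the detection logic from items 2 and 3, under a few assumptions: the function names and the '.pgdb' subfolder inside the environment are hypothetical, and the Homebrew check only returns Homebrew's default prefix (the exact InfluxDB subfolder is not filled in here).

```python
import os
import platform
import sys
from pathlib import Path


def default_pgdb_dir(environ=None, prefix=None, base_prefix=None):
    """Choose a default directory for pgdb config and data.

    Order: active conda environment, then a venv, then the
    user-level ~/.pgdb that the current script always uses.
    The arguments exist only so the logic can be tested.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix:
        return Path(conda_prefix) / ".pgdb"  # hypothetical subfolder name
    if prefix != base_prefix:  # true inside a venv
        return Path(prefix) / ".pgdb"
    return Path.home() / ".pgdb"


def homebrew_prefix(system=None, machine=None):
    """Return Homebrew's default prefix on macOS, or None elsewhere."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if system != "Darwin":
        return None
    # Apple Silicon installs default to /opt/homebrew, Intel to /usr/local.
    return Path("/opt/homebrew") if machine == "arm64" else Path("/usr/local")
```

If something like this is adopted, CONFIG_DIR in config.py could call the same helper so that the script and the library agree on where config.ini lives. Note that platform.machine() reports x86_64 for a Python running under Rosetta, so asking Homebrew itself (brew --prefix) may be more reliable where Homebrew is available.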

msonderegger commented 2 weeks ago

Thanks! Could you make this into multiple issues, for better modularity as these problems are addressed? So, at least two issues (one for setup.py things, one for bin/pgdb things), or 7 issues, whatever makes more sense to you.

On the pip install idea: I think this is actually already enabled: https://pypi.org/project/polyglotdb/

I remember Michael Haaf implemented this. I have no idea how it works, but could you see if the current setup there implements your idea, and if it's just the documentation that needs to be updated?

lxy2304 commented 2 weeks ago

Sure! I will open two separate issues, one for setup.py and one for bin/pgdb. As for pip install, I believe the PyPI install is different from what I proposed here, which is about installing from source with 'pip install .'.