bsmith89 / StrainFacts

Factorize metagenotypes to infer strains and their abundances
MIT License

bioconda recipe or pypi release? #9

Closed nick-youngblut closed 1 year ago

nick-youngblut commented 1 year ago

The recommended install method relies on conda, but strainfacts is installed via a pip install from the repo:

name: sfacts-dev
channels:
  - defaults
dependencies:
  - xarray
  - numpy
  - tqdm
  - pandas
  - scipy
  - scikit-learn
  - scikit-bio
  - seaborn
  - jupyter
  - jupyterlab
  - netcdf4
  - pip
  - pip:
    - pyro-ppl
    - -e ..

Integrating strainfacts into a large bioinformatics workflow makes strainfacts a dependency of that workflow, much like scikit-learn is a dependency of strainfacts. However, unlike scikit-learn, strainfacts cannot have its versions (and dependency conflicts) managed by conda, since strainfacts is not available via bioconda (or even pypi). It would be very helpful to create a bioconda recipe for strainfacts, or at least publish a release of strainfacts on pypi so that someone else (e.g., me) can create the recipe.

Also, for the sake of reproducibility, it would help to have explicit versions for each dependency in the conda env yaml shown above. For instance, which version(s) of scikit-learn have been tested with strainfacts? It appears that there are no CI tests; let me know if you want help setting up a github action for continuous testing of at least the install of strainfacts.
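As a rough sketch of what explicit versions could look like in the env yaml above: conda accepts `name=version` pins and quoted range constraints, and pip entries accept `==`. The version numbers here are placeholders chosen only to show the syntax; they are not versions known to have been tested with strainfacts.

```yaml
# Illustrative only: every version number below is a placeholder,
# NOT a version verified against strainfacts.
name: sfacts-dev
channels:
  - defaults
dependencies:
  - numpy=1.23                   # placeholder: pin to a release series
  - "scikit-learn>=1.0,<1.3"     # placeholder: range, quoted so YAML keeps it as one string
  - pip
  - pip:
    - pyro-ppl==1.8.4            # placeholder: exact pip pin
```

Exact pins give the most reproducible environment; ranges are easier to satisfy when strainfacts is one dependency among many in a larger workflow.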

bsmith89 commented 1 year ago

Thanks for your very reasonable feature request. I'm glad to hear that you are integrating StrainFacts into your workflow. I'd be happy to put out a pypi release, although it may take me some time to figure out the process and settle on a minimal set of dependencies.

Regarding integration testing, you're correct that there's no automated CI workflow, although the tutorial commands have been written into the example Makefile, and constitute a simple workout of the core functionality. I would absolutely welcome a pull request setting up GH actions for installing SF and running those.

I'll follow up when there's a pypi release.

nick-youngblut commented 1 year ago

Great! You can set up automated publishing to pypi every time you create a new github tag. An example github action yaml: https://github.com/leylabmpi/resmico/blob/master/.github/workflows/python-publish.yaml
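For readers who don't want to follow the link, a minimal sketch of such a workflow might look like the following. It is not a copy of the linked file. The tag pattern, the workflow file path, and the secret name `PYPI_API_TOKEN` are assumptions, and it presumes the repo can be built with `python -m build`.

```yaml
# Hypothetical file: .github/workflows/python-publish.yaml
name: Publish to PyPI

on:
  push:
    tags:
      - "v*"   # assumed tag convention, e.g. v0.1.0

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Build sdist and wheel
        run: |
          python -m pip install --upgrade build
          python -m build
      - name: Upload to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          # assumed secret name; the token must be added in the repo settings
          password: ${{ secrets.PYPI_API_TOKEN }}
```

With something like this in place, pushing a tag that matches the pattern triggers the build and upload, so a release needs no manual steps beyond tagging.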

bsmith89 commented 1 year ago

Good evening @nick-youngblut

I just pushed a new release to PyPI. Testing seems to confirm that everything works alright. (sk-bio installation from PyPI is still a problem, but conda works fine.)

Because I'm still not fully understanding the CI workflow you implemented, it's not clear to me whether the testing environment (and dev container, for that matter) are smart enough to install from the github tip commit, or if they use the conda YAML spec directly. I therefore wasn't sure exactly how to install sfacts in that spec: github URL vs. PyPI vs. not at all.

Happy to take suggestions if you have any.

Thanks for your contributions!

nick-youngblut commented 1 year ago

> I just pushed a new release to PyPI

That's great! Congrats 🎉

> smart enough to install from the github tip commit, or if they use the conda YAML spec directly

Yes, they use the code that is in the commit. The point is to test the committed code. So, if you change the conda_env.yaml file, for instance, that will change what is installed in the CI environment via mamba:

    - uses: conda-incubator/setup-miniconda@v2
      with:
        miniconda-version: 'latest'
        auto-update-conda: true
        mamba-version: "*"
        python-version: ${{ matrix.python-version }}
        channels: conda-forge,bioconda
        environment-file: conda_env.yaml
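To make the "github URL vs. PyPI vs. not at all" question concrete, here is one possible arrangement, sketched under assumptions: leave sfacts out of conda_env.yaml and install the checked-out commit in a separate step, so CI always tests the committed code. The job name, the smoke-test step, and the import name `sfacts` are assumptions for illustration, not taken from the actual workflow.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}   # login shell, so the conda env from setup-miniconda is active
    steps:
      # Check out the commit under test
      - uses: actions/checkout@v4
      # Create the environment from the spec in that same commit
      - uses: conda-incubator/setup-miniconda@v2
        with:
          channels: conda-forge,bioconda
          environment-file: conda_env.yaml
      # Install the package from the checkout rather than from PyPI or a github URL
      - name: Install the committed code
        run: python -m pip install .
      # Assumed import name; swap in the tutorial Makefile commands for a fuller test
      - name: Smoke test
        run: python -c "import sfacts"
```

Installing from the checkout keeps the PyPI release out of the test loop, so a broken commit is caught before it is tagged and published.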