anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/

[ENH] pip check #332

Open AlbertDeFusco opened 3 years ago

AlbertDeFusco commented 3 years ago

There is growing support for running pip check as a test in conda-forge recipes. Currently this will fail due to the handling of yaml packages.
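For context, pip check verifies that the packages installed in an environment have compatible dependencies, and it exits with a non-zero status when they do not. A minimal sketch of running it from Python follows; the helper name run_pip_check is hypothetical and is not part of anaconda-project.

```python
import subprocess
import sys


def run_pip_check(python=sys.executable):
    """Run `pip check` with the given interpreter (hypothetical helper).

    Returns (ok, report): ok is True when pip exits with status 0,
    and report is the combined text that pip printed.
    """
    result = subprocess.run(
        [python, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    report = (result.stdout + result.stderr).strip()
    return result.returncode == 0, report
```

Passing the path of an environment's own python as the argument would check that environment rather than the one running the script.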

mforbes commented 2 years ago

How exactly is pip check used? I am not sure what the failure mode is, or how it might be made to work. I am looking for an anaconda-project check command (see below).

There are a bunch of pip-related issues that will probably require some refactoring to address. Do you want me to submit these as individual tickets or collect them in one? Some issues I have noticed:

  1. pip should be included as a dependency if pip packages are used. (Conda warns about this when using the file as an environment.yml file, and this may break in the future.)
  2. anaconda-project update does not update pip packages, even if one updates the version in anaconda-project.yml file.
  3. Issue #214 is even worse with pip packages. In lieu of a better solution, requiring anaconda-project prepare --refresh is probably okay if anaconda-project had a check command that would check that the installed packages are compatible with the requirements, and warned the user if this was not the case. (Please let me know if I have missed something and this command does exist in some form.)
  4. Maybe we could somehow use something like poetry to manage the pip dependencies? Probably a bad idea....
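The check command asked for in point 3 does not exist in this thread; the following is only a sketch of the core comparison it might perform, under the assumption that requirements are exact version pins. The name find_mismatches and the pins mapping are hypothetical, and real specifiers (ranges, extras, conda packages) would need proper parsing.

```python
from importlib import metadata


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (wanted, found)} for every pin that is not satisfied.

    `pins` maps a distribution name to the exact version required
    (an assumption made to keep the sketch short).
    `found` is None when the distribution is not installed.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

A command built on this could print a warning for each entry and suggest running anaconda-project prepare --refresh.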

mforbes commented 2 years ago

Coming back to point 4: is there any reason that one can't have . as a pip dependency? It does not work right now:

anaconda-project.yaml: invalid pip package specifier: .

but it seems like it might be a reasonable idea to have anaconda-project create a conda environment with things like cupy, pyfftw, etc. that need binaries installed, then use poetry to manage all of the pure Python dependencies. I have not had a chance to look into this in detail yet, but is there any reason this would not work? Is this explicitly disabled, or just something that does not work because of parsing issues? (One can try to trick this with VCS dependencies, but this is a hack.)

I hope to have more time to dive in and start trying to resolve some of these issues in the new year, but would like to collect information in the meantime.

AlbertDeFusco commented 2 years ago

Would the requirements.txt scenario in PR #275 be close to what you're interested in? Perhaps have the anaconda-project.yml file define the python version to create the env and all other. Is pyproject.toml the correct filename for poetry dependencies? Since poetry has its own lock definition it seems reasonable to support it in anaconda-project like we do pip.

I'm not sure what . means as a pip dependency. Is that a special hook to use poetry?

mforbes commented 2 years ago

The . dependency is just a reference to the current repository, i.e., the same target as in pip install --use-feature=in-tree-build . (where the final . is the path argument).

Conda can use this for example (but anaconda-project fails):

Example configuration files

```yaml
# anaconda-project.yaml
name: test_env
channels:
  - defaults
dependencies:
  - python=3.9
  - conda-forge::pyfftw
  - pip
  - pip:
    - .[fftw]
```

```toml
# pyproject.toml
[tool.poetry]
name = "test_env"
version = "0.0.1"
description = "Test for anaconda-project + poetry."
authors = ["Michael McNeil Forbes "]
license = "MIT"

[tool.poetry.dependencies]
# Server dependencies
python = "^3.8|^3.9"

# Optional performance dependences: these need underlying libraries
pyFFTW = {version = "^0.12.0", optional = true}

# Tests
pytest = {version = "^6.2.5", optional = true}

[tool.poetry.extras]
fftw = [
    "pyfftw",
]
tests = [
    "pytest",
]

[tool.poetry.dev-dependencies]
memory-profiler = "^0.58.0"
line-profiler = "^3.2.6"
black = "^21.6b0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```

```bash
mkdir test_env; touch test_env/__init__.py  # A stub for the actual "package"
conda env create -f anaconda-project.yaml
conda activate test_env
python -c "import pyfftw;print(pyfftw.__version__)"
```

`anaconda-project` fails:

```bash
$ anaconda-project prepare
anaconda-project.yaml: invalid pip package specifier: .[fftw]
Unable to load the project.
```

There is nothing special for poetry here. This will look in pyproject.toml and install the current package as if it were hosted somewhere else and you specified the actual name. Poetry does use pyproject.toml for its dependencies. I am learning that there are some subtleties, e.g. poetry allows a logical or between dependencies, which fails with pip, so one must be a little more careful than I would like.

I don't think PR #275 helps - I have no problem using anaconda-project.yaml as long as it can also be used by conda.

My main goal is to allow something like Poetry, which is very persnickety about pure python dependencies (leading to more robust packages), to manage these. From a user perspective, it would be great if Anaconda-Project could do all this, but I think that would probably make it much more complex and difficult to maintain. Thus the idea is to be able to use the work going into projects like Poetry to deal with subtle details of pure python dependencies, but enable additional managing of conda stuff where non-python libraries could be used.

I still need to dive into the code to see how it currently works with pip, but it would probably be best if there were some way to specify which tool will be used. (Poetry has some great features, but I still feel like there is a lot of uncertainty about whether it is really the "one way" forward, so from a maintenance perspective, I would probably not want to build much explicit poetry support into anaconda-project.) However, there is currently one issue: Anaconda Project is often used for development, but the pip install approach described in the details above will not install the development tools (like memory-profiler in the example above). For this, we might need to support something like poetry install rather than just pip install .

AlbertDeFusco commented 2 years ago

I see. It seems reasonable to support . as a package. That would work for projects that have a setup.py or pyproject.toml along with anaconda-project.yml, right?
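As a rough illustration of that rule, the sketch below accepts a specifier such as . or .[fftw] only when the directory it points to contains a setup.py or pyproject.toml. The function name parse_local_spec and the regular expression are assumptions made for illustration; this is not how anaconda-project currently parses pip specifiers.

```python
import re
from pathlib import Path

# A relative path starting with "." or "..", optionally followed by [extras].
_LOCAL_SPEC = re.compile(
    r"^(?P<path>\.{1,2}(?:/[^\[\]]*)?)(?:\[(?P<extras>[^\[\]]*)\])?$"
)


def parse_local_spec(spec, project_dir):
    """Return (target_dir, extras) for a local specifier, else None.

    Hypothetical helper: a specifier is accepted only if the directory
    it names holds a pyproject.toml or a setup.py.
    """
    match = _LOCAL_SPEC.match(spec.strip())
    if match is None:
        return None
    target = Path(project_dir) / match.group("path")
    markers = ("pyproject.toml", "setup.py")
    if not any((target / name).is_file() for name in markers):
        return None
    raw_extras = match.group("extras") or ""
    extras = [e.strip() for e in raw_extras.split(",") if e.strip()]
    return target, extras
```

Ordinary specifiers such as a package name with a version constraint do not match the pattern and return None, so they could fall through to the existing validation.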

You'll find a file called pip_api.py that can be a model for poetry. Maybe we need a flag to enable it or autodetect if the poetry conda package has been added and if the pyproject.toml file is present.
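The autodetection idea could look roughly like the sketch below: enable poetry handling only when poetry appears among the conda packages and a pyproject.toml sits in the project directory. Both function names are hypothetical, and the spec parsing is deliberately simplified (it only strips a channel prefix and a version constraint).

```python
import re
from pathlib import Path


def conda_package_name(spec):
    """Extract the bare package name from a conda spec string.

    Simplified: drops a "channel::" prefix and anything after the
    first whitespace or comparison character.
    """
    name = spec.split("::")[-1].strip()
    return re.split(r"[\s=<>!~]", name, maxsplit=1)[0].lower()


def should_use_poetry(conda_specs, project_dir):
    """Hypothetical autodetection for poetry support.

    Non-string entries (such as the nested pip section) are skipped.
    """
    has_poetry = any(
        conda_package_name(spec) == "poetry"
        for spec in conda_specs
        if isinstance(spec, str)
    )
    return has_poetry and (Path(project_dir) / "pyproject.toml").is_file()
```

An explicit flag in anaconda-project.yml could override this detection if guessing turns out to be too fragile.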

AlbertDeFusco commented 2 years ago

I've been catching up with conda-lock today and I see that a good portion of what you need is there. It appears as though conda-lock would attempt to install any packages listed in pyproject.toml with conda, but I may be reading that part wrong.

https://github.com/conda-incubator/conda-lock#pyprojecttoml