[BUG] tests fail with conda auto_stack

AlbertDeFusco commented 2 years ago

from @mforbes

BTW: I could not run all of the tests. Things stalled at "anaconda_project/internal/test/test_conda_api.py::test_resolve_dependencies_with_actual_conda_other_platforms", with this step taking close to an hour (Mac OS X 10.14.6).

I also failed anaconda_project/test/test_prepare.py::test_default_to_system_environ because my default ~/.condarc used auto_stack: 1 and I start with a base environment active, which caused some path changing issues. This may relate to #336 where better control over global conda configuration may be needed for reproducibility.

Can we easily run tests with a repeatable configuration? In the meantime, I will add an additional exception.
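One possible shape for a repeatable test configuration is sketched below. It builds an environment mapping that pins auto_stack off and points conda at a project-local config file. The helper name isolated_conda_environ is hypothetical, and the sketch assumes that conda honors the CONDARC variable and the CONDA_AUTO_STACK override, following its usual CONDA_<SETTING> convention. As far as I understand, ~/.condarc is still read, so this overrides individual settings rather than ignoring the user configuration entirely.

```python
def isolated_conda_environ(environ, condarc_path):
    """Return a copy of `environ` with conda settings pinned for tests.

    Hypothetical helper, not part of anaconda-project.
    Assumes conda reads CONDARC and CONDA_AUTO_STACK from the environment.
    """
    env = dict(environ)  # never mutate the caller's mapping
    # Point conda at a known config file that lives with the test suite.
    env["CONDARC"] = condarc_path
    # Force stacking off regardless of what ~/.condarc says.
    env["CONDA_AUTO_STACK"] = "0"
    return env


if __name__ == "__main__":
    import os

    test_env = isolated_conda_environ(os.environ, "tests/test.condarc")
    print(test_env["CONDA_AUTO_STACK"])
```

In a pytest suite the same two variables could be set with monkeypatch.setenv in an autouse fixture, so that every test sees the same configuration.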

AlbertDeFusco commented 2 years ago

I'll give this a try on my Mac as well. My typical setup is as follows. I'm going to update the CONTRIBUTING.md file.

1. Install Miniconda or Anaconda.
2. Clone this repository and cd to the cloned repository directory.
3. Create a dev environment with conda env create -f environment.yml. This creates anaconda-project-dev, which includes all necessary dependencies, testing, and linting tools.
4. Activate the env and install anaconda-project as editable:

> conda activate anaconda-project-dev
> pip install --no-deps -e .

Now I can run tests using pytest. The setup.cfg file configures the correct arguments:
```
> pytest -x # -x stops at the first test failure
```

mforbes commented 2 years ago

Related to these things, maybe we should provide an anaconda-project.yaml file instead of the environment.yml file :-)

AlbertDeFusco commented 2 years ago

:) I've been thinking about that, too. Maybe I'll study compiler bootstrapping to see if we could do something similar. I've seen other projects use Makefiles for things that anaconda-project does well.

AlbertDeFusco commented 2 years ago

@mforbes , so I'm looking at this some more and this test seems to remove the entry to the base env PATH (not the condabin) and so fails.

The question is what behavior do you feel is best?

Should anaconda-project (during prepare and run) retain the path to the base (stacked) env bin directory when stacking is enabled, or not?
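To make the two candidate behaviors concrete, here is a minimal sketch that works on a list of PATH entries. The function project_path and the example directories are hypothetical and are not taken from prepare.py. It only illustrates "drop the base bin directory" versus "keep it stacked behind the project env", leaving condabin alone in both cases.

```python
def project_path(path_entries, env_bin, base_bin, keep_base):
    """Return PATH entries with the project env first.

    Hypothetical illustration: `base_bin` is removed unless `keep_base`
    is true. Other entries (such as condabin) are never touched.
    """
    result = [env_bin]
    for entry in path_entries:
        if entry == env_bin:
            continue  # avoid listing the project env twice
        if entry == base_bin and not keep_base:
            continue  # non-stacked behavior: hide the base env
        result.append(entry)
    return result


if __name__ == "__main__":
    old = ["/opt/conda/bin", "/opt/conda/condabin", "/usr/bin"]
    print(project_path(old, "/proj/envs/default/bin", "/opt/conda/bin", keep_base=False))
    print(project_path(old, "/proj/envs/default/bin", "/opt/conda/bin", keep_base=True))
```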

mforbes commented 2 years ago

I think that anaconda-project should, by default, behave like conda and respect the stacking - so the base environment would be kept on the path. However, I think it is also very important that anaconda-project have a way of ignoring any system or user preferences to ensure reproducibility (similar to issue #336).

I am not sure of the best way to do this, but options might include:

A simple flag in anaconda-project.toml that lets the user have AP ignore any system or user configuration. Perhaps this could be something like condarc= which, if present, disables these. This flag could also point to a local file, e.g. condarc=env1.condarc, where local overrides could be specified (but see [^1]). We might also allow inline specification of the .condarc contents as a multi-line string:
```
condarc="""
channel_priority: strict
channels:
 - defaults
auto_stack: 1
...
"""
```
Support in anaconda-project.toml for specific overrides like auto_stack and override-channels. This might fall prey to [^1], but it means that we must maintain an ever-growing (or ever-changing) set of options in anaconda-project. If there is no way of somehow just passing a complete config to conda, however, then this would make it clear to users which features/overrides anaconda-project supports.
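As a rough sketch of how the first option might be interpreted, the function below classifies a condarc value into the cases described: key absent, present but empty, a file reference, or inline contents. The condarc key and the name resolve_condarc_option are both hypothetical, and nothing like this exists in anaconda-project today. Telling inline text from a file name by looking for a newline is a simplifying assumption.

```python
def resolve_condarc_option(project_config):
    """Classify the hypothetical `condarc` key of a parsed project file.

    Returns a (mode, payload) tuple:
      ("inherit", None)   key absent: use system and user configuration
      ("isolated", None)  key present but empty: ignore that configuration
      ("inline", text)    multi-line string holding .condarc contents
      ("file", path)      single-line string naming a local .condarc file
    """
    if "condarc" not in project_config:
        return ("inherit", None)
    value = project_config["condarc"]
    if value is None or value.strip() == "":
        return ("isolated", None)
    if "\n" in value:
        # Assumption: a value containing newlines is inline config text.
        return ("inline", value)
    return ("file", value)


if __name__ == "__main__":
    print(resolve_condarc_option({"condarc": "env1.condarc"}))
```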

[^1]: It seems that conda's current design is to allow administrators to lock the configuration, and there are complaints that users can override this (https://github.com/conda/conda/issues/10821). Some of the mechanisms I describe here would allow the same thing, so it might be necessary to only enable full customization in conjunction with a bootstrapping phase where a user-privileged installation of miniconda is used rather than a system-wide version. (The conda docs are conflicting about the precedence of the various config files, and it is not clear what the outcome will be, but if the admin-installed config files are supposed to win, then we will have a problem.)

jbednar commented 2 years ago

My vote is for an anaconda-project.yml to be fully independent of any configuration or settings the user may have configured for conda more generally, apart from a very small number of exceptions that are about configuring the server and tokens that might be necessary for conda to operate with a local mirror, behind a firewall, etc. Apart from those exceptions, I believe the file should include any options that are needed, or else it won't be reproducible. Given that this is a breaking change, it seems like something to do when renaming to conda project.

AlbertDeFusco commented 2 years ago

I like where this is going, and I think that in the context of the proposed conda project, having very clearly defined boundaries between the base environment (or any env where [ana]conda-project is installed) and the project's environments will be beneficial. I've studied the Conda config stacking problem before and I seem to remember that allowing users to override system settings is considered correct behavior at this time, but perhaps it could get revisited.

I do believe there is room for anaconda-project to have control over channel_priority for itself that can operate independently of the condarc config. I would say this key can be set in the anaconda-project.yml file globally and for any env_spec (if set in an env_spec, it would override the global config if present).
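The override rule described here could look roughly like the following, operating on an already-parsed project file. This is only a sketch: the channel_priority key is the proposal above, not an existing anaconda-project option, and effective_channel_priority is a hypothetical name.

```python
def effective_channel_priority(project, env_spec_name):
    """Resolve channel_priority for one env_spec.

    Hypothetical lookup order: the env_spec value wins, then the
    project-wide value, otherwise None (meaning defer to conda's config).
    """
    env_specs = project.get("env_specs") or {}
    spec = env_specs.get(env_spec_name) or {}
    if "channel_priority" in spec:
        return spec["channel_priority"]
    return project.get("channel_priority")


if __name__ == "__main__":
    project = {
        "channel_priority": "strict",
        "env_specs": {
            "default": {"packages": ["python"]},
            "legacy": {"channel_priority": "flexible"},
        },
    }
    print(effective_channel_priority(project, "default"))
    print(effective_channel_priority(project, "legacy"))
```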

When it comes to auto_stack, there is some overlap with my research on how anaconda-project manages PATH. Anaconda Project was written at a time before conda run and conda activate, and so it must manipulate the path itself. You can see this at the following link. Since anaconda-project does not use conda run or conda activate, some env vars defined by installed packages will not get applied. This happens with gdal and proj, as pointed out in #349. Perhaps by adopting conda run we can avoid the PATH manipulation (and maybe even avoid having separate Unix and Windows command types unless absolutely necessary).

During prepare (and run), the base Conda env prefix path is forcibly removed: https://github.com/Anaconda-Platform/anaconda-project/blob/3cd280d462197b29f831f352f2c2ee65a16330d4/anaconda_project/prepare.py#L696-L698
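To make the conda run idea mentioned above a bit more concrete, a project command could be wrapped as an argument list like the one below instead of editing PATH by hand. This is a sketch under assumptions: conda_run_argv is a hypothetical helper, and it relies on conda run accepting --prefix and --no-capture-output, which recent conda releases provide.

```python
import subprocess


def conda_run_argv(env_prefix, command, conda_exe="conda"):
    """Build an argv that runs `command` inside the env at `env_prefix`.

    Hypothetical helper. conda run performs the activation, so variables
    set by package activation scripts are applied to the command.
    """
    return [conda_exe, "run", "--prefix", env_prefix, "--no-capture-output", *command]


def run_in_env(env_prefix, command):
    """Run the command and return its exit code (requires conda on PATH)."""
    return subprocess.call(conda_run_argv(env_prefix, command))


if __name__ == "__main__":
    print(conda_run_argv("envs/default", ["python", "-c", "print('hi')"]))
```

Because the same argv works on every platform, this is also where the separate Unix and Windows command types might become unnecessary.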

If we were to add an auto_stack configuration option I'm concerned that now this project would be explicitly calling out a dependency on something outside of its control (i.e., a package installed in the base env) and would not be reproducible. I know I've used system tools in my run commands (like grep) without adding it as a dependency and maybe that's a bad habit.

I'm not sure that it is necessary to allow re-configuration of the global channels since PR #352 now includes --override-channels for dependency solving, env creation, and adding packages, which completely ignores the global channels definition and relies entirely on the anaconda-project.yml file (if channels is not present in the project file it assumes channels: [defaults]). Users can still configure their default_channels at the condarc level and so far it feels appropriate that this is handled outside anaconda-project since the main reason to change default_channels is to configure Conda to install from private Conda servers (local mirror).
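The channel handling described here can be pictured as follows. This is an illustrative sketch, not the code from PR #352: channel_args is a hypothetical name, while --override-channels and --channel are existing conda options.

```python
def channel_args(project):
    """Build conda channel arguments from a parsed project file.

    Hypothetical sketch: the globally configured channels list is ignored
    via --override-channels, and a project without `channels` falls back
    to ["defaults"].
    """
    channels = project.get("channels") or ["defaults"]
    args = ["--override-channels"]
    for channel in channels:
        args += ["--channel", channel]
    return args


if __name__ == "__main__":
    print(channel_args({"channels": ["conda-forge", "defaults"]}))
    print(channel_args({}))
```

The name defaults is still expanded by conda using default_channels from the condarc, which is why a local mirror can be configured outside the project file.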

To come back to the original question: even though some tests fail with auto_stack: 1, do your projects work correctly when it is enabled? Can you explain a bit more about your use case with this setting to motivate changing the current behavior?

mforbes commented 2 years ago

I have not noticed any issues when using anaconda-project. My basic use case is to have a bunch of tools like Mercurial, Conda, Poetry, etc. installed in my base environment, and then to activate other environments on top for Python isolation. I still like to be able to use my version control when I am doing work without having to install Mercurial in the project environment, for example.

Update: I did not check this very carefully – anaconda-project does not respect the auto_stack=1 setting in my ~/.condarc file, thus, packages like Mercurial which I have installed in my base environment are not accessible when I use anaconda-project run. I did not notice because I use this as follows:

# anaconda-project.yaml
...
commands:
  shell:
    unix: bash --init-file .init-file.bash
    env_spec: phys-521-2021

# .init-file.bash
export PS1="\h:\W \u\$ "
source $(conda info --base)/etc/profile.d/conda.sh

# Assume that this is set by running anaconda-project run shell
CONDA_ENV="${CONDA_PREFIX}"
conda deactivate
conda activate base
conda activate "${CONDA_ENV}"
alias ap="anaconda-project"
alias apr="anaconda-project run"

I was doing this so that the prompt would properly show that phys-521-2021 was active, but a side effect of actually using conda is that it respects my ~/.condarc :-). Ultimately, it might be nice to have an anaconda-project shell command (mirroring poetry shell) that does everything like this, but for now, this workaround is pretty reasonable, at least for my purposes. Let me know and I can open a feature request to flesh out a shell command if we think that would be useful.

[BUG] tests fail with conda auto_stack #357