anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/
Other
216 stars 88 forks source link

Issue 337: Added pkg_key to ensure 'packages' == 'dependencies' #359

Open mforbes opened 2 years ago

mforbes commented 2 years ago

The general strategy here is to add an attribute pkg_key to YamlFile, which can then be set as an instance attribute when a file is loaded, specifying that the file contains 'packages' or 'dependencies'. Which one is determined by _get_pkg_key() which looks for one of these in the top level and in env_specs. Currently _get_pkg_key() always returns something: it could in principle check for inconsistencies. This code is in place with assert but these are ignored so as not to muckup tests.

The testing strategy is to introduce a pkg_key fixture passed to tests that had 'packages'. Some of these tests create files from scratch. To test these, I added a decorator @pytest._change_default_pkg_key which will run the test in a context where the default value of YamlFile.pkg_key is changed (then restored). This might skip some cases where a file is inconsistent with the default but should still work (but I added a minimal amount of these, so it is not so likely).

The current implementation breaks python 2.7 compatibility (I use f-strings in many places to keep the new code separate from older code) but this is easily changed with .replace(). (Should we add nox for testing against different versions of python? I suppose that the CI will get this, but it would be nice to be able to test it locally while developing).

The tests which run the redis service do not work yet, and I have not been able to run the slow tests.

AlbertDeFusco commented 2 years ago

On my Mac I've rebased your branch against #352 where I have fixed the tests, especially the slow ones.

So far only 11 tests fail 👍 🎉 . I'll push a branch you can try for yourself and maybe make a few fixes for these tests.

AlbertDeFusco commented 2 years ago

Here's my branch: https://github.com/AlbertDeFusco/anaconda-project/tree/adefuco/issue_337

AlbertDeFusco commented 2 years ago

Another note. There is an easier way to change the pkg_key and you can do it from the parametrized fixture or put the monkeypatch in its own fixture.

in confest.py

import pytest

@pytest.fixture(params=["packages", "dependencies"])
def pkg_key(request, monkeypatch):
    """Ensure equivalence between `dependencies` and `packages`"""
    monkeypatch.setattr('anaconda_project.yaml_file.YamlFile.pkg_key', request.param)
    return request.param

For this case there would likely not be any difference between yield and return. I suppose either one works the same way since there is no other code after the yield.

mforbes commented 2 years ago

Is there a reason to return rather than yield? I took the documentation to be recommending to always yield. (I don't see why there would be a functional difference.)

However, I think that this should be done in a separate fixture default_pkg_key for example, otherwise we will never test against using values of pkg_key that are contrary to the default. The only potential issue with a fixture is that linters might complain that the argument is not used in some tests?

AlbertDeFusco commented 2 years ago

Having them separate and using yield in pkg_key works for me. I would recommend using the oneline monkeypatch directly in the tests that need it rather than as a fixture or decorator. For example, this test would not change the default

def test_add_command_specifying_notebook(monkeypatch, capsys, pkg_key):
    def check_specifying_notebook(dirname):
        args = Args('notebook', 'test', 'file.ipynb', directory=dirname)
        res = main(args)
        assert res == 0

        project = Project(dirname)

        command = project.project_file.get_value(['commands', 'test'])
        assert command['notebook'] == 'file.ipynb'
        assert command['env_spec'] == 'default'
        assert len(command.keys()) == 2

    with_directory_contents_completing_project_file({DEFAULT_PROJECT_FILENAME: f'{pkg_key}:\n - notebook\n'},
                                                    check_specifying_notebook)

And a test that does the monkeypatch would have this extra line

def test_add_command_generates_env_spec_suggestion(pkg_key):
    monkeypatch.setattr('anaconda_project.yaml_file.YamlFile.pkg_key', pkg_key)

    def check_add_command(dirname):
        project = project_no_dedicated_env(dirname)
        assert project.problems == []
...
mforbes commented 2 years ago

I think it might be better to keep both fixtures so it is clear which tests are testing different default values and which are testing different explicit values. I will try to take a stab at this today. There are some cases where I need input:

What should happen if a user uses both packages and dependences in the same file? Is this allowed, or should this be treated as an error? (I think the current code will try to preserve this, but need to test this more clearly.)

Another example istest_project.py::test_no_auto_fix_env_spec_with_notebook_bokeh_injection which uses get_value() and set_value(). Everything has to be matched carefully to work right now, but I could imagine that internally we might want to treat these as equal.

My preference would be to migrate everything towards dependencies for conda compatibility, and not make any special cases except backwards compatibility with files that have packages everywhere, but I have not seen a discussion about whether we should try to support mixed version.

AlbertDeFusco commented 2 years ago

I'd like to have dependencies as default. A user may have Yaml aliased dependencies to packages, so a stern warning might be useful.

mforbes commented 2 years ago

Okay, then I suggest reworking things to simply convert packages to dependencies on input/read, and then always output dependences. That will keep things simpler.