AllenInstitute / visual_behavior_analysis

Python package for analyzing behavioral data for Brain Observatory: Visual Behavior

removes tox and runs ci tests with pytest directly #708

Closed: djkapner closed this 3 years ago

djkapner commented 3 years ago

The purpose of this PR is to make the CircleCI builds more transparent by removing tox, so that the CircleCI tests can be reproduced and validated locally using just pytest.

@dougo this does not solve any test failures, nor the dependency conflicts between this repo and AllenSDK, but it does remove tox so that the connection between local and cloud tests is clearer. To make them even more parallel, I would suggest replacing pytest.mark.skipif statements with registered pytest marks, which you can then turn on/off on the command line and in the CircleCI config file; then the tests don't mysteriously start running in certain cases. A sketch of what that could look like is below.
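As a rough sketch of the registered-marks idea (the mark name onprem and the two-file layout are placeholders for illustration, not anything currently in this repo):

# conftest.py -- register the custom mark so pytest recognizes it
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "onprem: tests that only run where on-prem resources are available"
    )

# in a test module -- tag environment-dependent tests instead of hard-coding skipif logic
import pytest

@pytest.mark.onprem
def test_that_needs_special_environment():
    ...

Locally you would run python -m pytest -m "not onprem" to deselect those tests, and the CircleCI config can pass the same (or a different) -m expression, so which tests run where is explicit in both places.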

In requirements.txt, AllenSDK has been pinned to 2.7.0. This pinning keeps the environment stable for users regardless of whether a new AllenSDK has been released. There are 3 CircleCI builds:

python 3.6, pinned AllenSDK

This build is currently failing on CircleCI (3 tests failing), which can be reproduced locally by:

conda create -n test_py36 python=3.6
conda activate test_py36
pip install -r requirements.txt
pip install -r requirements_dev.txt
pip install .
python -m pytest

python 3.7, pinned AllenSDK

Same as above, except with a different Python version. CircleCI has 7 tests failing; local runs show 12 failing. (I believe this is because of the tests that are skipped on CircleCI here: https://github.com/AllenInstitute/visual_behavior_analysis/blob/master/tests/validation/test_regression.py; see the sketch after the commands below.)

conda create -n test_py37 python=3.7
conda activate test_py37
pip install -r requirements.txt
pip install -r requirements_dev.txt
pip install .
python -m pytest
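
I don't know the exact condition used in test_regression.py, but a skipif of roughly this shape (CircleCI sets a CIRCLECI environment variable on its runners) would produce exactly this kind of local-vs-CI discrepancy:

import os
import pytest

# Hypothetical sketch, not the actual condition in test_regression.py:
# skipped on CircleCI, but runs (and can fail) on a local machine.
@pytest.mark.skipif(os.environ.get("CIRCLECI") == "true",
                    reason="regression data not available on CircleCI")
def test_example_regression():
    ...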

python 3.7, git AllenSDK

There was some confusion about why these tests give different failures. The reason is that AllenSDK and this repo have conflicting requirements on some packages. If you install this repo via requirements.txt (with allensdk pinned to 2.7.0 here), then uninstall allensdk and reinstall it via git+https, a number of the package versions specified by this repo get downgraded, as shown below.
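For reference, the uninstall/reinstall step described above is roughly the following (the AllenSDK git URL is my assumption about which repo is meant):

pip uninstall -y allensdk
pip install git+https://github.com/AllenInstitute/AllenSDK.git

Running that reinstall produces output like: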

Successfully built allensdk
Installing collected packages: numpy, h5py, xarray, mock, tables, allensdk
  Attempting uninstall: numpy
    Found existing installation: numpy 1.20.1
    Uninstalling numpy-1.20.1:
      Successfully uninstalled numpy-1.20.1
  Attempting uninstall: h5py
    Found existing installation: h5py 3.1.0
    Uninstalling h5py-3.1.0:
      Successfully uninstalled h5py-3.1.0
  Attempting uninstall: xarray
    Found existing installation: xarray 0.16.2
    Uninstalling xarray-0.16.2:
      Successfully uninstalled xarray-0.16.2
  Attempting uninstall: tables
    Found existing installation: tables 3.6.1
    Uninstalling tables-3.6.1:
      Successfully uninstalled tables-3.6.1
Successfully installed allensdk-2.7.0 h5py-2.10.0 mock-4.0.3 numpy-1.18.5 tables-3.5.1 xarray-0.15.1

I suspect some of these downgrades can explain why some tests fail in test-37-pinned-allensdk and don't in test_allen_sdk_head.

djkapner commented 3 years ago

I renamed some of the CircleCI builds to be more explicit. This is causing some phantom builds to appear in the list of checks. We can fix this manually if you choose to adopt these changes, or they will disappear in a week if left alone.

dougollerenshaw commented 3 years ago

Thanks @djkapner! Are these the two 'phantom builds'? [screenshot of the checks list]

dougollerenshaw commented 3 years ago

Also, there are quite a few test failures that weren't occurring with tox. Is that because they are not being skipped on CircleCI now?

djkapner commented 3 years ago

Thanks @djkapner! Are these the two 'phantom builds'? [screenshot of the checks list]

Yes, they'll just say "expected" until we take care of them.

djkapner commented 3 years ago

Also, there are quite a few test failures that weren't occurring with tox. Is that because they are not being skipped on CircleCI now?

This is probably because your CI tests have not yet tested master + AllenSDK 2.7.0, which is essentially what this PR is. I'll make a tox version that does that, and then you can compare the failures tox vs. non-tox.

Actually, I was wrong. There was something wrapped up in the tox setup that was skipping tests. Attempting to fix.

djkapner commented 3 years ago

OK, comparing this build (no tox) with #709 (with tox):

test-36-pinned-allensdk (no tox): ===== 194 passed, 9 skipped, 6 deselected, 36 warnings in 65.67s (0:01:05) =====
test-36 (with tox): =========== 194 passed, 15 skipped, 30 warnings in 63.99s (0:01:03) ============

etc. The Python 3.7 build still has some failures, whereas tox did not. There is some sort of pandas/numpy problem. Tox was building on Python 3.7.3, which is not what AllenSDK was building on. I'd suggest taking a look at what tests are still failing and seeing whether it is a test problem or a code problem, and whether it is easy to fix.
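
A quick way to compare the two environments when chasing this kind of pandas/numpy mismatch (these are standard pip and pytest options, nothing specific to this repo):

python --version
pip freeze | grep -iE "numpy|pandas|allensdk"
python -m pytest --lf -x   # rerun only the tests that failed last time, stop at the first failure

Diffing the pip freeze output between the passing and failing environments should show quickly whether this is an environment problem or a genuine code/test problem.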