Closed eboileau closed 2 years ago
The tests pass locally using both installations.
With /prj/rpbp-dev/working-envs/install-conda: 2 passed in 1014.86s (0:16:54), and with install-dev-std: 2 passed in 912.87s (0:15:12).
So there is something different happening when the tests run on GitHub Actions...
I tried running pytest locally with the conda environment and got the same error as the github action.
I guess @pytest.mark.depends is from https://pypi.org/project/pytest-depends/?
If so it would need to be in the environment file: https://github.com/dieterich-lab/rp-bp/pull/118 (github isn't running the CI on the PR until you approve it)
I installed pytest-depends locally, but now when I run pytest I get a different error, I think due to test_pipeline_part2 being run first.
I am now trying again locally, having added @pytest.mark.depends(on=['test_pipeline_part1']) to test_pipeline_part2 to ensure it runs after test_pipeline_part1...
I don't understand why pytest test_rpbp.py completed successfully locally for me.
The problem doesn't seem resolved. What shall I do with this PR https://github.com/dieterich-lab/rp-bp/pull/118?
Meanwhile, I will fix references to pybio-utils in the code.
It is strange - maybe somehow we have a different version of some dependency?
Feel free to ignore the PR for now, I'll push changes there & hopefully at some point the tests will pass :-)
Could you maybe paste the output from conda info && conda list here to see if I have something different?
Here is my output:
active environment : rpbp
active env location : /home/liam/miniconda3/envs/rpbp
shell level : 1
user config file : /home/liam/.condarc
populated config files : /home/liam/.condarc
conda version : 4.13.0
conda-build version : not installed
python version : 3.9.5.final.0
virtual packages : __linux=5.15.0=0
__glibc=2.35=0
__unix=0=0
__archspec=1=x86_64
base environment : /home/liam/miniconda3 (writable)
conda av data dir : /home/liam/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/liam/miniconda3/pkgs
/home/liam/.conda/pkgs
envs directories : /home/liam/miniconda3/envs
/home/liam/.conda/envs
platform : linux-64
user-agent : conda/4.13.0 requests/2.28.1 CPython/3.9.5 Linux/5.15.0-43-generic ubuntu/22.04.1 glibc/2.35
UID:GID : 1001:1001
netrc file : None
offline mode : False
# packages in environment at /home/liam/miniconda3/envs/rpbp:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
appdirs 1.4.4 pypi_0 pypi
attrs 22.1.0 pypi_0 pypi
biopython 1.73 pypi_0 pypi
biothings-client 0.2.6 pypi_0 pypi
bowtie2 2.3.0 py36_1 bioconda
bzip2 1.0.8 h7b6447c_0
ca-certificates 2022.07.19 h06a4308_0
certifi 2021.5.30 py36h06a4308_0
charset-normalizer 2.0.12 pypi_0 pypi
colorama 0.4.5 pypi_0 pypi
coverage 6.2 pypi_0 pypi
curl 7.61.0 h84994c4_0
cycler 0.11.0 pypi_0 pypi
cython 0.29.32 pypi_0 pypi
dask 2021.3.0 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
fastparquet 0.4.1 pypi_0 pypi
flexbar 3.5.0 hf53871c_5 bioconda
future-fstrings 1.2.0 pypi_0 pypi
idna 3.3 pypi_0 pypi
importlib-metadata 4.8.3 pypi_0 pypi
importlib-resources 5.4.0 pypi_0 pypi
iniconfig 1.1.1 pypi_0 pypi
joblib 0.13.2 pypi_0 pypi
kiwisolver 1.3.1 pypi_0 pypi
libcurl 7.61.0 h1ad7b7a_0
libedit 3.1.20210910 h7f8727e_0
libffi 3.2.1 hf484d3e_1007
libgcc 7.2.0 h69d50b8_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libssh2 1.8.0 h9cfc8f7_4
libstdcxx-ng 11.2.0 h1234567_1
libzlib 1.2.11 h166bdaf_1014 conda-forge
llvmlite 0.36.0 pypi_0 pypi
matplotlib 3.3.4 pypi_0 pypi
matplotlib-venn 0.11.7 pypi_0 pypi
more-itertools 8.14.0 pypi_0 pypi
mygene 3.2.2 pypi_0 pypi
ncurses 6.3 h5eee18b_3
networkx 2.5.1 pypi_0 pypi
numba 0.53.1 pypi_0 pypi
numpy 1.16.6 pypi_0 pypi
openpyxl 3.0.10 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
packaging 21.3 pypi_0 pypi
pandas 0.24.0 pypi_0 pypi
patsy 0.5.2 pypi_0 pypi
pbio 1.0.0 pypi_0 pypi
perl 5.26.2 h14c3975_0
perl-threaded 5.32.1 hdfd78af_1 bioconda
pillow 8.4.0 pypi_0 pypi
pip 21.2.2 py36h06a4308_0
pluggy 1.0.0 pypi_0 pypi
py 1.11.0 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
pysam 0.15.2 pypi_0 pypi
pystan 2.19.0.0 pypi_0 pypi
pytest 7.0.1 pypi_0 pypi
pytest-cov 3.0.0 pypi_0 pypi
pytest-depends 1.0.1 pypi_0 pypi
python 3.6.6 h6e4f718_2
python-dateutil 2.8.2 pypi_0 pypi
pytz 2022.1 pypi_0 pypi
pyyaml 5.4 pypi_0 pypi
readline 7.0 h7b6447c_5
requests 2.27.1 pypi_0 pypi
rpbp 2.0.0 pypi_0 pypi
samtools 1.7 1 bioconda
scikit-learn 0.24.2 pypi_0 pypi
scipy 1.2.1 pypi_0 pypi
seaborn 0.11.2 pypi_0 pypi
seqan-library 2.4.0 0 conda-forge
setuptools 59.6.0 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sklearn 0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
star 2.7.10a h9ee0642_0 bioconda
statsmodels 0.9.0 pypi_0 pypi
tbb 2020.3 hfd86e86_0
threadpoolctl 3.1.0 pypi_0 pypi
thrift 0.16.0 pypi_0 pypi
tk 8.6.11 h1ccaba5_0
tomli 1.2.3 pypi_0 pypi
tqdm 4.64.0 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
urllib3 1.26.11 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7f8727e_1
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h166bdaf_1014 conda-forge
I updated the pbiotools references, but now the conda environment creation hangs forever... it seems related to pip dependencies, in particular git+https://github.com/dieterich-lab/pbiotools.git@dev-ssciwr#egg=pbiotools... something got mixed up...
Output of conda info && conda list. First off... looks like I have a different conda version...
active environment : /prj/rpbp-dev/working-envs/install-conda
active env location : /prj/rpbp-dev/working-envs/install-conda
shell level : 1
user config file : /home/eboileau/.condarc
populated config files : /home/eboileau/.condarc
conda version : 4.9.2
conda-build version : not installed
python version : 3.8.5.final.0
virtual packages : __glibc=2.28=0
__unix=0=0
__archspec=1=x86_64
base environment : /home/eboileau/.miniconda3 (writable)
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/eboileau/.miniconda3/pkgs
/home/eboileau/.conda/pkgs
envs directories : /home/eboileau/.miniconda3/envs
/home/eboileau/.conda/envs
platform : linux-64
user-agent : conda/4.9.2 requests/2.24.0 CPython/3.8.5 Linux/4.19.0-21-amd64 debian/10 glibc/2.28
UID:GID : 10018:10001
netrc file : None
offline mode : False
# packages in environment at /prj/rpbp-dev/working-envs/install-conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
appdirs 1.4.4 pypi_0 pypi
attrs 22.1.0 pypi_0 pypi
biopython 1.73 pypi_0 pypi
biothings-client 0.2.6 pypi_0 pypi
bowtie2 2.3.0 py36_1 bioconda
bzip2 1.0.8 h7b6447c_0
ca-certificates 2022.07.19 h06a4308_0
certifi 2021.5.30 py36h06a4308_0
cfgv 3.3.1 pypi_0 pypi
charset-normalizer 2.0.12 pypi_0 pypi
coverage 6.2 pypi_0 pypi
curl 7.61.0 h84994c4_0
cycler 0.11.0 pypi_0 pypi
cython 0.29.30 pypi_0 pypi
dask 2021.3.0 pypi_0 pypi
distlib 0.3.5 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
fastparquet 0.4.1 pypi_0 pypi
filelock 3.4.1 pypi_0 pypi
flexbar 3.5.0 hf53871c_5 bioconda
identify 2.4.4 pypi_0 pypi
idna 3.3 pypi_0 pypi
importlib-metadata 4.8.3 pypi_0 pypi
importlib-resources 5.2.3 pypi_0 pypi
iniconfig 1.1.1 pypi_0 pypi
joblib 0.13.2 pypi_0 pypi
kiwisolver 1.3.1 pypi_0 pypi
libcurl 7.61.0 h1ad7b7a_0
libedit 3.1.20210910 h7f8727e_0
libffi 3.2.1 hf484d3e_1007
libgcc 7.2.0 h69d50b8_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libssh2 1.8.0 h9cfc8f7_4
libstdcxx-ng 11.2.0 h1234567_1
libzlib 1.2.11 h166bdaf_1014 conda-forge
llvmlite 0.36.0 pypi_0 pypi
matplotlib 3.3.4 pypi_0 pypi
matplotlib-venn 0.11.7 pypi_0 pypi
more-itertools 8.13.0 pypi_0 pypi
mygene 3.2.2 pypi_0 pypi
ncurses 6.3 h5eee18b_3
nodeenv 1.6.0 pypi_0 pypi
numba 0.53.1 pypi_0 pypi
numpy 1.16.6 pypi_0 pypi
openpyxl 3.0.10 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
packaging 21.3 pypi_0 pypi
pandas 0.24.0 pypi_0 pypi
patsy 0.5.2 pypi_0 pypi
pbio 1.0.0 pypi_0 pypi
perl 5.26.2 h14c3975_0
perl-threaded 5.32.1 hdfd78af_1 bioconda
pillow 8.4.0 pypi_0 pypi
pip 21.2.2 py36h06a4308_0
platformdirs 2.4.0 pypi_0 pypi
pluggy 1.0.0 pypi_0 pypi
pre-commit 2.17.0 pypi_0 pypi
py 1.11.0 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
pysam 0.15.2 pypi_0 pypi
pystan 2.19.0.0 pypi_0 pypi
pytest 7.0.1 pypi_0 pypi
pytest-cov 3.0.0 pypi_0 pypi
python 3.6.6 h6e4f718_2
python-dateutil 2.8.2 pypi_0 pypi
pytz 2022.1 pypi_0 pypi
pyyaml 5.4 pypi_0 pypi
readline 7.0 h7b6447c_5
requests 2.27.1 pypi_0 pypi
rpbp 2.0.0 pypi_0 pypi
samtools 1.7 1 bioconda
scikit-learn 0.24.2 pypi_0 pypi
scipy 1.2.1 pypi_0 pypi
seaborn 0.11.2 pypi_0 pypi
seqan-library 2.4.0 0 conda-forge
setuptools 59.6.0 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sklearn 0.0 pypi_0 pypi
sqlite 3.33.0 h62c20be_0
star 2.7.10a h9ee0642_0 bioconda
statsmodels 0.9.0 pypi_0 pypi
tbb 2020.3 hfd86e86_0
threadpoolctl 3.1.0 pypi_0 pypi
thrift 0.16.0 pypi_0 pypi
tk 8.6.11 h1ccaba5_0
toml 0.10.2 pypi_0 pypi
tomli 1.2.3 pypi_0 pypi
tqdm 4.64.0 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
urllib3 1.26.11 pypi_0 pypi
virtualenv 20.16.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7f8727e_1
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h166bdaf_1014 conda-forge
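The two conda list outputs above can also be compared mechanically rather than by eye. Below is a minimal sketch, assuming each listing has been saved as plain text. The helper names parse_conda_list and diff_envs are hypothetical (not part of conda or rp-bp), and the sample rows are a small subset copied from the two listings.

```python
def parse_conda_list(text):
    """Map package name -> version from the text of `conda list`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def diff_envs(a, b):
    """Return (only in a, only in b, {name: (version in a, version in b)})."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {
        name: (a[name], b[name])
        for name in sorted(set(a) & set(b))
        if a[name] != b[name]
    }
    return only_a, only_b, changed


# A few rows taken from each listing (not the full environments).
laptop = """
# Name Version Build Channel
cython 0.29.32 pypi_0 pypi
more-itertools 8.14.0 pypi_0 pypi
pytest 7.0.1 pypi_0 pypi
pytest-depends 1.0.1 pypi_0 pypi
"""
cluster = """
# Name Version Build Channel
cython 0.29.30 pypi_0 pypi
more-itertools 8.13.0 pypi_0 pypi
pre-commit 2.17.0 pypi_0 pypi
pytest 7.0.1 pypi_0 pypi
"""

only_laptop, only_cluster, changed = diff_envs(
    parse_conda_list(laptop), parse_conda_list(cluster)
)
print("only in first:", only_laptop)
print("only in second:", only_cluster)
print("version differs:", changed)
```

This only compares names and versions; build strings and channels are ignored, so two packages with the same version but a different build would not be reported.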
I think I broke the conda environment step in the rp-bp GHA by adding missing deps to setup.cfg in pbiotools:
-python_requires = >=3.6,<3.7.0a0
+ pysam
+ pybedtools
+ Bio
+ pyensembl
+ pystan <3
+
+python_requires = >=3.6,<3.10.0a0
Some/most of these are already installed in the conda env, and maybe it can't find a consistent set of versions from pypi.
A short term fix could be to pip install pbiotools with --no-deps
Currently on GHA we are mixing conda and pip dependencies in a less than ideal way; it would be better to have all dependencies come from a single source, i.e. conda.
In this branch I install everything from conda (also using latest versions): https://github.com/lkeegan/rp-bp/commit/c50d13bc7c5084e339b3126790d4896f39aeef66
Yes, I just noticed python version differences pbiotools vs. rpbp.
Indeed, I agree mixing conda and pip is not ideal... conda install should ideally rely on conda packages, and eventually the standard pip install should rely on pip... but for some packages I don't know if we'll manage to have exactly matching versions in conda vs. pypi (i.e. potentially different install depending on whether it is done via conda or pip...). Let me know your thoughts...
As for my conda, I think I will try to conda update -n base -c defaults conda, and hope it works... so I have the latest version to test locally.
Ok, the installation works with #118, and we're back to tests failing as before on file comparison (warning is gone, though).
I will try once more to reproduce this locally, and circumscribe the problem.
conda update is running (I'm off tomorrow).
Your non-conda install of rp-bp also passes the tests, right?
If so, I'm wondering if your conda install of rp-bp is using some system dependency instead of the conda one (e.g. you didn't have pytest-depends in your conda environment so I guess it was picking up a system installed version of it - maybe something similar is happening for another rp-bp dependency?)
It would possibly explain how your conda & non-conda installs are consistent, while my conda install is consistent with the GHA ones but they differ from yours.
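One way to check this hypothesis is to ask Python where each module would actually be imported from, and whether that location is inside the active environment. This is a rough sketch using only the standard library; module_origin and is_inside_prefix are hypothetical helper names, and the module names in the loop are just examples to adapt.

```python
import importlib.util
import sys
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


def is_inside_prefix(path, prefix):
    """True if `path` lies under the directory `prefix`."""
    try:
        Path(path).resolve().relative_to(Path(prefix).resolve())
        return True
    except ValueError:
        return False


# Run this with the environment's own interpreter; sys.prefix is then
# the environment root (e.g. the conda env directory).
for name in ("pytest", "pytest_depends", "pystan"):
    origin = module_origin(name)
    if origin is None:
        print(f"{name}: not importable")
    else:
        where = "inside" if is_inside_prefix(origin, sys.prefix) else "OUTSIDE"
        print(f"{name}: {origin} ({where} {sys.prefix})")
```

Anything reported as outside the prefix (for example under ~/.local) would be a candidate for the kind of leakage described above.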
As I'm running the install/tests on our cluster, there are in fact few system-wide packages...
It's difficult to see the actual content of the files on GHA, but it really seems that Rp-Bp cannot reproduce the reference data, so some changes must have occurred, if not in the code, in some dependency that would affect the calculations and/or writing of the files, so I need to reproduce this locally to investigate. I'll first re-install everything freshly and test again, and will also try on my laptop.
That makes sense - if helpful I can run the failing tests locally and send you the generated output files?
I also just noticed that even when installing into a fresh conda environment, it looks like previously pickled pystan models are re-used instead of being re-compiled, as they are cached somewhere outside of conda:
WARNING : A model already exists at: /home/liam/.local/share/rpbp/rpbp_models/nonperiodic/no-periodicity.pkl. Skipping.
Maybe this is also the case on your cluster and your pystan models were previously compiled with a different compiler / pystan version?
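To see whether stale models are lying around, one could list the pickled files in the cache together with their modification times and compare those against the date the environment was created. A minimal sketch follows; the cache location is taken from the warning above and is an assumption for any other machine, and list_cached_models is a hypothetical helper, not an rp-bp function.

```python
import time
from pathlib import Path


def list_cached_models(cache_dir):
    """Return (path, modification time) for every .pkl file under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted((p, p.stat().st_mtime) for p in root.rglob("*.pkl"))


# Assumed location, based on the path printed in the warning.
cache_dir = "~/.local/share/rpbp/rpbp_models"
for path, mtime in list_cached_models(cache_dir):
    stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
    print(f"{stamp}  {path}")
```

Models older than the current environment were necessarily compiled with whatever compiler and pystan build were present at the time.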
With your recent PR #121, output is easier (and faster) to see on GHA! I will merge next.
Yes, the models are not recompiled by default, unless we force it, but this option has not been thoroughly tested.
The pystan version has not changed so far, but you're right, it might have to do with the C compiler version...? I could try to regenerate my models.
So, copying and pasting the 100 displayed values from the failing GHA test output and comparing them to my local test output, they agree exactly:
c-elegans-rep-1-unique.metagene-periodicity-bayes-factors.csv.gz
Indeed, this points to the direction of the Stan models, and this would make sense since I generated the reference data with these, i.e. results are deterministic up to the parameters and models. So as I initially thought, we could not expect results (part 2) to match exactly in a general situation, and I was fooled by the fact that I did reproduce them exactly (overlooking the model compilation)! But as I said, when I'm back next week, I'll first try to reproduce the error locally (re-install rpbp and recompile models), to be sure, and then adjust accordingly. Thanks for your help.
Sounds good!
After multiple failed attempts, I remembered that we have one test node with Debian 11 on the cluster, so I installed the conda environment there, recompiled the models, and ran the example. I could finally reproduce exactly the GHA results (and yours @lkeegan ), which I was not able to reproduce on Debian 10, so it seems likely that the kernel/architecture and/or C compiler version affected the model output.
For selected outputs, in particular low confidence profiles e.g. no or little p-sites (profile_sum or profile_peak), the mean and variance can differ drastically, so it does not make sense to compare these even within some tolerance. However, we could possibly select the periodic lengths and offsets, and compare values.
As for metagene-profiles and orf-profiles, these should be directly comparable.
As for the ORF predictions, model outputs (mean, var) differ, but also the actual set of ORFs. The reason is that differences in Bayes factor mean and variance (get_bf_filter) can lead to slightly different sets of ORFs, which are then potentially affected by get_longest_features_by_end. In general, discrepancies in model outputs increase as confidence decreases, with large differences for variance.
One possible solution would be to completely ignore the model outputs, check consistency of the results only on the intersection, and, for completeness, report (without raising errors) the ORFs that are found in only one of the two results (i.e. reference vs. current).
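The intersection idea could look roughly like the following. This is only a sketch under assumptions: the ORF ids, the column names (orf_type, orf_len, bayes_factor_mean, bayes_factor_var) and the helper compare_on_intersection are all hypothetical placeholders, and plain dictionaries stand in for the actual prediction tables.

```python
def compare_on_intersection(reference, current, ignore=()):
    """Compare two {orf_id: {column: value}} mappings.

    Columns listed in `ignore` (the model outputs) are skipped.
    Returns (ids only in reference, ids only in current,
    shared ids whose remaining columns differ).
    """
    ref_ids, cur_ids = set(reference), set(current)
    only_reference = sorted(ref_ids - cur_ids)
    only_current = sorted(cur_ids - ref_ids)
    mismatched = []
    for orf_id in sorted(ref_ids & cur_ids):
        ref = {k: v for k, v in reference[orf_id].items() if k not in ignore}
        cur = {k: v for k, v in current[orf_id].items() if k not in ignore}
        if ref != cur:
            mismatched.append(orf_id)
    return only_reference, only_current, mismatched


# Hypothetical records: same ORF with different model outputs, plus one
# ORF unique to each result.
reference = {
    "orf_a": {"orf_type": "canonical", "orf_len": 300,
              "bayes_factor_mean": 12.1, "bayes_factor_var": 3.0},
    "orf_b": {"orf_type": "five_prime", "orf_len": 90,
              "bayes_factor_mean": 5.4, "bayes_factor_var": 9.7},
}
current = {
    "orf_a": {"orf_type": "canonical", "orf_len": 300,
              "bayes_factor_mean": 11.8, "bayes_factor_var": 4.2},
    "orf_c": {"orf_type": "three_prime", "orf_len": 120,
              "bayes_factor_mean": 5.1, "bayes_factor_var": 8.9},
}

model_columns = ("bayes_factor_mean", "bayes_factor_var")
only_ref, only_cur, mismatched = compare_on_intersection(
    reference, current, ignore=model_columns
)
# Report, but do not fail on, ORFs present in only one result.
print("only in reference:", only_ref)
print("only in current:", only_cur)
print("shared but different:", mismatched)
```

In a test, only a non-empty mismatched list would be treated as a failure; the two "only in" lists would just be logged.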
Another solution is to actually compare only the means with some reasonable tolerance e.g.
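import pandas as pd
# Drop the variance columns and compare the rest with a relative tolerance.
# Note: rtol requires pandas >= 1.1 (older versions use check_less_precise).
reference = predicted_orfs_reference[[c for c in predicted_orfs_reference.columns if 'var' not in c]]
current = predicted_orfs_current[[c for c in predicted_orfs_current.columns if 'var' not in c]]
pd.testing.assert_frame_equal(reference, current, check_exact=False, rtol=.5)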
In any case, the discrepancies in some model outputs are worse than I initially thought, and we haven't even tested on macOS, etc. But we need to move on, so I'll work out something.
For the record, I'm using the conda install only, with
git clone https://github.com/dieterich-lab/rp-bp.git
cd rp-bp
git checkout dev-ssciwr
conda env create --prefix debian11-conda -f environment.yml
conda activate debian11-conda
pip install dask sklearn appdirs tqdm mygene openpyxl fastparquet more_itertools matplotlib matplotlib_venn seaborn
pip install git+https://github.com/dieterich-lab/pbiotools.git@dev-ssciwr --no-deps --verbose
pip install . --verbose
Description After this commit https://github.com/dieterich-lab/rp-bp/commit/8e9799a350c403244e61b5c1800f180caf3e1ed6, tests failed, while they succeeded locally. However, we're currently using both the new conda install and the old standard install (with module environment). Although they should be equivalent, it seems that results for the second part of the pipeline do not match with the reference dataset. I am currently testing locally, but this potentially means that regression tests could fail depending on the installation?!
Expected behavior CI workflow run is successful.
To Reproduce
python -m pytest . --cov=rpbp --cov-report=xml -s -v
Output This happens in test_pipeline_part2 for
In addition, we have
Environment platform linux -- Python 3.6.6, pytest-7.0.1, pluggy-1.0.0