dieterich-lab / rp-bp

Rp-Bp is a Bayesian approach to predict, at base-pair resolution, ribosome occupancy and translation.
MIT License

Tests fail on GitHub Actions but succeed locally #117

Closed eboileau closed 1 year ago

eboileau commented 1 year ago

Description After this commit https://github.com/dieterich-lab/rp-bp/commit/8e9799a350c403244e61b5c1800f180caf3e1ed6, tests failed, while they succeeded locally. However, we're currently using both the new conda install and the old standard install (with module environment). Although they should be equivalent, it seems that results for the second part of the pipeline do not match with the reference dataset. I am currently testing locally, but this potentially means that regression tests could fail depending on the installation?!

Expected behavior CI workflow run is successful.

To Reproduce python -m pytest . --cov=rpbp --cov-report=xml -s -v

Output This happens in test_pipeline_part2 for

file = '/tmp/pytest-of-runner/pytest-0/data0/c-elegans-chrI-example/metagene-profiles/c-elegans-rep-1-unique.metagene-periodicity-bayes-factors.csv.gz'
ref_file = '/tmp/pytest-of-runner/pytest-0/data0/c-elegans-chrI-example/reference/metagene-profiles/c-elegans-rep-1-unique.metagene-periodicity-bayes-factors.csv.gz'

In addition, we have

=============================== warnings summary ===============================
tests/regression/conftest.py:207
  /home/runner/work/rp-bp/rp-bp/tests/regression/conftest.py:207: PytestUnknownMarkWarning: Unknown pytest.mark.depends - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.depends(on=['getf_genome'])

Environment platform linux -- Python 3.6.6, pytest-7.0.1, pluggy-1.0.0

eboileau commented 1 year ago

The tests pass locally using both installations

With /prj/rpbp-dev/working-envs/install-conda: 2 passed in 1014.86s (0:16:54), and with install-dev-std: 2 passed in 912.87s (0:15:12).

so there is something different happening when the tests run on GitHub Actions...

lkeegan commented 1 year ago

I tried running pytest locally with the conda environment and got the same error as the github action.

I guess @pytest.mark.depends is from https://pypi.org/project/pytest-depends/?

If so it would need to be in the environment file: https://github.com/dieterich-lab/rp-bp/pull/118 (github isn't running the CI on the PR until you approve it)

I installed pytest-depends locally, but now when I run pytest I get a different error, I think due to test_pipeline_part2 being run first.

Am now trying again locally having added @pytest.mark.depends(on=['test_pipeline_part1']) to test_pipeline_part2 to ensure it runs after test_pipeline_part1...

eboileau commented 1 year ago

I don't understand why pytest test_rpbp.py completed successfully locally for me. The problem doesn't seem resolved, what shall I do with this PR https://github.com/dieterich-lab/rp-bp/pull/118?

Meanwhile, I will fix references to pybio-utils in the code.

lkeegan commented 1 year ago

It is strange - maybe somehow we have a different version of some dependency?

Feel free to ignore the PR for now, I'll push changes there & hopefully at some point the tests will pass :-)

Could you maybe paste the output from conda info && conda list here to see if I have something different?

Here is my output:

     active environment : rpbp
    active env location : /home/liam/miniconda3/envs/rpbp
            shell level : 1
       user config file : /home/liam/.condarc
 populated config files : /home/liam/.condarc
          conda version : 4.13.0
    conda-build version : not installed
         python version : 3.9.5.final.0
       virtual packages : __linux=5.15.0=0
                          __glibc=2.35=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/liam/miniconda3  (writable)
      conda av data dir : /home/liam/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/liam/miniconda3/pkgs
                          /home/liam/.conda/pkgs
       envs directories : /home/liam/miniconda3/envs
                          /home/liam/.conda/envs
               platform : linux-64
             user-agent : conda/4.13.0 requests/2.28.1 CPython/3.9.5 Linux/5.15.0-43-generic ubuntu/22.04.1 glibc/2.35
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False

# packages in environment at /home/liam/miniconda3/envs/rpbp:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
appdirs                   1.4.4                    pypi_0    pypi
attrs                     22.1.0                   pypi_0    pypi
biopython                 1.73                     pypi_0    pypi
biothings-client          0.2.6                    pypi_0    pypi
bowtie2                   2.3.0                    py36_1    bioconda
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.07.19           h06a4308_0  
certifi                   2021.5.30        py36h06a4308_0  
charset-normalizer        2.0.12                   pypi_0    pypi
colorama                  0.4.5                    pypi_0    pypi
coverage                  6.2                      pypi_0    pypi
curl                      7.61.0               h84994c4_0  
cycler                    0.11.0                   pypi_0    pypi
cython                    0.29.32                  pypi_0    pypi
dask                      2021.3.0                 pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
et-xmlfile                1.1.0                    pypi_0    pypi
fastparquet               0.4.1                    pypi_0    pypi
flexbar                   3.5.0                hf53871c_5    bioconda
future-fstrings           1.2.0                    pypi_0    pypi
idna                      3.3                      pypi_0    pypi
importlib-metadata        4.8.3                    pypi_0    pypi
importlib-resources       5.4.0                    pypi_0    pypi
iniconfig                 1.1.1                    pypi_0    pypi
joblib                    0.13.2                   pypi_0    pypi
kiwisolver                1.3.1                    pypi_0    pypi
libcurl                   7.61.0               h1ad7b7a_0  
libedit                   3.1.20210910         h7f8727e_0  
libffi                    3.2.1             hf484d3e_1007  
libgcc                    7.2.0                h69d50b8_2  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libssh2                   1.8.0                h9cfc8f7_4  
libstdcxx-ng              11.2.0               h1234567_1  
libzlib                   1.2.11            h166bdaf_1014    conda-forge
llvmlite                  0.36.0                   pypi_0    pypi
matplotlib                3.3.4                    pypi_0    pypi
matplotlib-venn           0.11.7                   pypi_0    pypi
more-itertools            8.14.0                   pypi_0    pypi
mygene                    3.2.2                    pypi_0    pypi
ncurses                   6.3                  h5eee18b_3  
networkx                  2.5.1                    pypi_0    pypi
numba                     0.53.1                   pypi_0    pypi
numpy                     1.16.6                   pypi_0    pypi
openpyxl                  3.0.10                   pypi_0    pypi
openssl                   1.0.2u               h7b6447c_0  
packaging                 21.3                     pypi_0    pypi
pandas                    0.24.0                   pypi_0    pypi
patsy                     0.5.2                    pypi_0    pypi
pbio                      1.0.0                    pypi_0    pypi
perl                      5.26.2               h14c3975_0  
perl-threaded             5.32.1               hdfd78af_1    bioconda
pillow                    8.4.0                    pypi_0    pypi
pip                       21.2.2           py36h06a4308_0  
pluggy                    1.0.0                    pypi_0    pypi
py                        1.11.0                   pypi_0    pypi
pyparsing                 3.0.7                    pypi_0    pypi
pysam                     0.15.2                   pypi_0    pypi
pystan                    2.19.0.0                 pypi_0    pypi
pytest                    7.0.1                    pypi_0    pypi
pytest-cov                3.0.0                    pypi_0    pypi
pytest-depends            1.0.1                    pypi_0    pypi
python                    3.6.6                h6e4f718_2  
python-dateutil           2.8.2                    pypi_0    pypi
pytz                      2022.1                   pypi_0    pypi
pyyaml                    5.4                      pypi_0    pypi
readline                  7.0                  h7b6447c_5  
requests                  2.27.1                   pypi_0    pypi
rpbp                      2.0.0                    pypi_0    pypi
samtools                  1.7                           1    bioconda
scikit-learn              0.24.2                   pypi_0    pypi
scipy                     1.2.1                    pypi_0    pypi
seaborn                   0.11.2                   pypi_0    pypi
seqan-library             2.4.0                         0    conda-forge
setuptools                59.6.0                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.33.0               h62c20be_0  
star                      2.7.10a              h9ee0642_0    bioconda
statsmodels               0.9.0                    pypi_0    pypi
tbb                       2020.3               hfd86e86_0  
threadpoolctl             3.1.0                    pypi_0    pypi
thrift                    0.16.0                   pypi_0    pypi
tk                        8.6.11               h1ccaba5_0  
tomli                     1.2.3                    pypi_0    pypi
tqdm                      4.64.0                   pypi_0    pypi
typing-extensions         4.1.1                    pypi_0    pypi
urllib3                   1.26.11                  pypi_0    pypi
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.5                h7f8727e_1  
zipp                      3.6.0                    pypi_0    pypi
zlib                      1.2.11            h166bdaf_1014    conda-forge

eboileau commented 1 year ago

I updated the pbiotools references, but now the conda environment creation hangs forever... it seems related to pip dependencies, in particular git+https://github.com/dieterich-lab/pbiotools.git@dev-ssciwr#egg=pbiotools... something got mixed up...

Output of conda info && conda list. First off... looks like I have a different conda version...


     active environment : /prj/rpbp-dev/working-envs/install-conda
    active env location : /prj/rpbp-dev/working-envs/install-conda
            shell level : 1                                      
       user config file : /home/eboileau/.condarc                
 populated config files : /home/eboileau/.condarc                
          conda version : 4.9.2                                  
    conda-build version : not installed                          
         python version : 3.8.5.final.0                              
       virtual packages : __glibc=2.28=0                         
                          __unix=0=0                             
                          __archspec=1=x86_64                    
       base environment : /home/eboileau/.miniconda3  (writable) 
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/eboileau/.miniconda3/pkgs
                          /home/eboileau/.conda/pkgs
       envs directories : /home/eboileau/.miniconda3/envs
                          /home/eboileau/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.24.0 CPython/3.8.5 Linux/4.19.0-21-amd64 debian/10 glibc/2.28
                UID:GID : 10018:10001
             netrc file : None
           offline mode : False

# packages in environment at /prj/rpbp-dev/working-envs/install-conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             5.1                       1_gnu
appdirs                   1.4.4                    pypi_0    pypi
attrs                     22.1.0                   pypi_0    pypi
biopython                 1.73                     pypi_0    pypi
biothings-client          0.2.6                    pypi_0    pypi
bowtie2                   2.3.0                    py36_1    bioconda
bzip2                     1.0.8                h7b6447c_0
ca-certificates           2022.07.19           h06a4308_0
certifi                   2021.5.30        py36h06a4308_0
cfgv                      3.3.1                    pypi_0    pypi
charset-normalizer        2.0.12                   pypi_0    pypi
coverage                  6.2                      pypi_0    pypi
curl                      7.61.0               h84994c4_0
cycler                    0.11.0                   pypi_0    pypi
cython                    0.29.30                  pypi_0    pypi
dask                      2021.3.0                 pypi_0    pypi
distlib                   0.3.5                    pypi_0    pypi
et-xmlfile                1.1.0                    pypi_0    pypi
fastparquet               0.4.1                    pypi_0    pypi
filelock                  3.4.1                    pypi_0    pypi
flexbar                   3.5.0                hf53871c_5    bioconda
identify                  2.4.4                    pypi_0    pypi
idna                      3.3                      pypi_0    pypi
importlib-metadata        4.8.3                    pypi_0    pypi
importlib-resources       5.2.3                    pypi_0    pypi
iniconfig                 1.1.1                    pypi_0    pypi
joblib                    0.13.2                   pypi_0    pypi
kiwisolver                1.3.1                    pypi_0    pypi
libcurl                   7.61.0               h1ad7b7a_0
libedit                   3.1.20210910         h7f8727e_0
libffi                    3.2.1             hf484d3e_1007
libgcc                    7.2.0                h69d50b8_2
libgcc-ng                 11.2.0               h1234567_1
libgomp                   11.2.0               h1234567_1
libssh2                   1.8.0                h9cfc8f7_4
libstdcxx-ng              11.2.0               h1234567_1
libzlib                   1.2.11            h166bdaf_1014    conda-forge
llvmlite                  0.36.0                   pypi_0    pypi
matplotlib                3.3.4                    pypi_0    pypi
matplotlib-venn           0.11.7                   pypi_0    pypi
more-itertools            8.13.0                   pypi_0    pypi
mygene                    3.2.2                    pypi_0    pypi
ncurses                   6.3                  h5eee18b_3
nodeenv                   1.6.0                    pypi_0    pypi
numba                     0.53.1                   pypi_0    pypi
numpy                     1.16.6                   pypi_0    pypi
openpyxl                  3.0.10                   pypi_0    pypi
openssl                   1.0.2u               h7b6447c_0
packaging                 21.3                     pypi_0    pypi
pandas                    0.24.0                   pypi_0    pypi
patsy                     0.5.2                    pypi_0    pypi
pbio                      1.0.0                    pypi_0    pypi
perl                      5.26.2               h14c3975_0
perl-threaded             5.32.1               hdfd78af_1    bioconda
pillow                    8.4.0                    pypi_0    pypi
pip                       21.2.2           py36h06a4308_0
platformdirs              2.4.0                    pypi_0    pypi
pluggy                    1.0.0                    pypi_0    pypi
pre-commit                2.17.0                   pypi_0    pypi
py                        1.11.0                   pypi_0    pypi
pyparsing                 3.0.7                    pypi_0    pypi
pysam                     0.15.2                   pypi_0    pypi
pystan                    2.19.0.0                 pypi_0    pypi
pytest                    7.0.1                    pypi_0    pypi
pytest-cov                3.0.0                    pypi_0    pypi
python                    3.6.6                h6e4f718_2
python-dateutil           2.8.2                    pypi_0    pypi
pytz                      2022.1                   pypi_0    pypi
pyyaml                    5.4                      pypi_0    pypi
readline                  7.0                  h7b6447c_5
requests                  2.27.1                   pypi_0    pypi
rpbp                      2.0.0                    pypi_0    pypi
samtools                  1.7                           1    bioconda
scikit-learn              0.24.2                   pypi_0    pypi
scipy                     1.2.1                    pypi_0    pypi
seaborn                   0.11.2                   pypi_0    pypi
seqan-library             2.4.0                         0    conda-forge
setuptools                59.6.0                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.33.0               h62c20be_0
star                      2.7.10a              h9ee0642_0    bioconda
statsmodels               0.9.0                    pypi_0    pypi
tbb                       2020.3               hfd86e86_0
threadpoolctl             3.1.0                    pypi_0    pypi
thrift                    0.16.0                   pypi_0    pypi
tk                        8.6.11               h1ccaba5_0
toml                      0.10.2                   pypi_0    pypi
tomli                     1.2.3                    pypi_0    pypi
tqdm                      4.64.0                   pypi_0    pypi
typing-extensions         4.1.1                    pypi_0    pypi
urllib3                   1.26.11                  pypi_0    pypi
virtualenv                20.16.2                  pypi_0    pypi
wheel                     0.37.1             pyhd3eb1b0_0
xz                        5.2.5                h7f8727e_1
zipp                      3.6.0                    pypi_0    pypi
zlib                      1.2.11            h166bdaf_1014    conda-forge
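
The two listings above can also be compared mechanically rather than by eye. A minimal Python sketch, assuming each `conda list` output has been saved as text; `parse_conda_list` and `diff_environments` are hypothetical helper names, and the rows in the example are a short excerpt copied from the listings above:

```python
def parse_conda_list(text):
    """Map package name -> version from the text of `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def diff_environments(left, right):
    """Return {name: (left_version, right_version)} for every mismatch."""
    diff = {}
    for name in sorted(set(left) | set(right)):
        if left.get(name) != right.get(name):
            diff[name] = (left.get(name), right.get(name))
    return diff


# Excerpt only; in practice the full outputs would be read from files.
liam = parse_conda_list("""
# Name                    Version                   Build  Channel
cython                    0.29.32                  pypi_0    pypi
numpy                     1.16.6                   pypi_0    pypi
pytest-depends            1.0.1                    pypi_0    pypi
""")
eboileau = parse_conda_list("""
# Name                    Version                   Build  Channel
cython                    0.29.30                  pypi_0    pypi
numpy                     1.16.6                   pypi_0    pypi
""")

for name, (left, right) in diff_environments(liam, eboileau).items():
    print(f"{name}: {left} vs {right}")
```

On this excerpt it prints `cython: 0.29.32 vs 0.29.30` and `pytest-depends: 1.0.1 vs None`; a package missing from one environment shows up as `None`.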

lkeegan commented 1 year ago

I think I broke the conda environment step in the rp-bp GHA by adding missing deps to setup.cfg in pbiotools:

-python_requires = >=3.6,<3.7.0a0
+    pysam
+    pybedtools
+    Bio
+    pyensembl
+    pystan <3
+
+python_requires = >=3.6,<3.10.0a0

Some/most of these are already installed in the conda env, and maybe it can't find a consistent set of versions from pypi.

A short term fix could be to pip install pbiotools with --no-deps

Currently on gha we are mixing conda and pip dependencies in a not ideal way - better would be to have all dependencies from a single source, i.e. conda.

In this branch I install everything from conda (also using latest versions): https://github.com/lkeegan/rp-bp/commit/c50d13bc7c5084e339b3126790d4896f39aeef66

eboileau commented 1 year ago

Yes, I just noticed python version differences pbiotools vs. rpbp.

Indeed, I agree mixing conda and pip is not ideal... conda install should ideally rely on conda packages, and eventually the standard pip install should rely on pip... but for some packages I don't know if we'll manage to have exactly matching versions in conda vs. pypi (i.e. potentially different install depending on whether it is done via conda or pip...). Let me know your thoughts...

As for my conda, I think I will try to conda update -n base -c defaults conda, and hope it works... so I have the latest version to test locally.

eboileau commented 1 year ago

Ok, the installation works with #118, and we're back to tests failing as before on file comparison (warning is gone, though).

I will try once more to reproduce this locally, and circumscribe the problem. conda update is running (I'm off tomorrow).

lkeegan commented 1 year ago

Your non-conda install of rp-bp also passes the tests, right?

If so, I'm wondering if your conda install of rp-bp is using some system dependency instead of the conda one (e.g. you didn't have pytest-depends in your conda environment so I guess it was picking up a system installed version of it - maybe something similar is happening for another rp-bp dependency?)

It would possibly explain how your conda & non-conda installs are consistent, while my conda install is consistent with the GHA ones but they differ from yours.
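
One way to check this hypothesis would be to ask Python where it would load each package from. A minimal sketch using only the standard library; `module_origin` and `is_inside_environment` are hypothetical helper names, and `pytest_depends` is assumed to be the import name of the pytest-depends plugin:

```python
import importlib.util
import sys
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_inside_environment(name, prefix=sys.prefix):
    """True if the module file lives under the given environment prefix."""
    origin = module_origin(name)
    if origin is None or origin in ("built-in", "frozen"):
        return False
    return Path(prefix).resolve() in Path(origin).resolve().parents


# Run inside the activated environment under test.
for name in ("pytest", "pytest_depends", "pystan"):
    print(name, module_origin(name), is_inside_environment(name))
```

A module reported as found but not inside the environment would be coming from somewhere else on `sys.path`, e.g. a user or system site-packages directory.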

eboileau commented 1 year ago

As I'm running the install/tests on our cluster, there are in fact few system-wide packages...

It's difficult to see the actual content of the files on GHA, but it really seems that Rp-Bp cannot reproduce the reference data, so some changes must have occurred, if not in the code, in some dependency that would affect the calculations and/or writing of the files, so I need to reproduce this locally to investigate. I'll first re-install everything freshly and test again, and will also try on my laptop.
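
To make the file contents visible in the CI log, the comparison could report which cells differ instead of only failing. A rough standard-library sketch, not the code in tests/regression; the helper names and the column names (`length`, `bf_mean`) are hypothetical:

```python
import csv
import gzip
import math


def read_csv_gz(path):
    """Read a gzipped CSV file into a list of row dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


def describe_differences(ref_rows, cur_rows, rel_tol=1e-9, limit=10):
    """Return readable messages for up to `limit` mismatches."""
    messages = []
    if len(ref_rows) != len(cur_rows):
        messages.append(
            f"row count: reference={len(ref_rows)} current={len(cur_rows)}"
        )
    for index, (ref, cur) in enumerate(zip(ref_rows, cur_rows)):
        for column in ref:
            a, b = ref[column], cur.get(column)
            try:
                # Numeric cells are compared within a relative tolerance.
                same = math.isclose(float(a), float(b), rel_tol=rel_tol)
            except (TypeError, ValueError):
                same = a == b  # non-numeric cells are compared as text
            if not same:
                messages.append(
                    f"row {index}, {column}: reference={a} current={b}"
                )
            if len(messages) >= limit:
                return messages
    return messages


# In-memory rows standing in for read_csv_gz(ref_file) / read_csv_gz(file).
reference = [{"length": "28", "bf_mean": "12.5"}, {"length": "29", "bf_mean": "0.1"}]
current = [{"length": "28", "bf_mean": "12.5"}, {"length": "29", "bf_mean": "0.4"}]
for message in describe_differences(reference, current):
    print(message)
```

With the example rows this prints `row 1, bf_mean: reference=0.1 current=0.4`. Note that NaN cells never compare equal under `math.isclose`, so they would be reported as differences.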

lkeegan commented 1 year ago

That makes sense - if helpful I can run the failing tests locally and send you the generated output files?

I also just noticed that even when installing into a fresh conda environment, it looks like previously pickled pystan models are re-used instead of being re-compiled, as they are cached somewhere outside of conda:

  WARNING  : A model already exists at: /home/liam/.local/share/rpbp/rpbp_models/nonperiodic/no-periodicity.pkl. Skipping.

Maybe this is also the case on your cluster and your pystan models were previously compiled with a different compiler / pystan version?
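
A quick way to see whether stale models are lying around is to list the pickles in the cache together with their modification times. A minimal sketch, assuming the cache location shown in the warning above (it may differ on other machines); `list_cached_models` is a hypothetical helper, not part of rpbp:

```python
from datetime import datetime
from pathlib import Path


def list_cached_models(cache_dir):
    """Return (path, modification time) for every pickled model found."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return []
    models = []
    for path in sorted(cache_dir.rglob("*.pkl")):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        models.append((path, modified))
    return models


# Directory taken from the warning message; adjust as needed.
for path, modified in list_cached_models("~/.local/share/rpbp/rpbp_models"):
    print(f"{modified:%Y-%m-%d %H:%M}  {path}")
```

Models with a timestamp older than the current environment were compiled by an earlier install, possibly with a different compiler.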

eboileau commented 1 year ago

With your recent PR #121, output is easier (and faster) to see on GHA! I will merge next.

Yes, the models are not recompiled by default, unless we force it, but this option has not been thoroughly tested. The pystan version has not changed so far, but you're right, it might have to do with the C compiler version...? I could try to regenerate my models.

lkeegan commented 1 year ago

So copy&pasting the 100 displayed values from the GHA failing test output & comparing to my local test output they agree exactly:

c-elegans-rep-1-unique.metagene-periodicity-bayes-factors.csv.gz

image

eboileau commented 1 year ago

Indeed, this points in the direction of the Stan models, and this would make sense since I generated the reference data with these, i.e. results are deterministic up to the parameters and models. So as I initially thought, we could not expect results (part 2) to match exactly in a general situation, and I was fooled by the fact that I did reproduce them exactly (overlooking the model compilation)! But as I said, when I'm back next week, I'll first try to reproduce the error locally (re-install rpbp and recompile models), to be sure, and then adjust accordingly. Thanks for your help.

lkeegan commented 1 year ago

Sounds good!

eboileau commented 1 year ago

After multiple failed attempts, I remembered that we have one test node with Debian 11 on the cluster, so I installed the conda environment there, recompiled the models, and ran the example. I could finally reproduce exactly the GHA results (and yours @lkeegan ), which I was not able to reproduce on Debian 10, so it seems likely that the kernel/architecture and/or C compiler version affected the model output.
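
Since the platform seems to matter, it may help to record it next to any reference data that gets generated. A small standard-library sketch; `environment_fingerprint` is a hypothetical helper, and note that `platform.python_compiler()` reports the compiler used to build Python, which is not necessarily the one used to compile the Stan models:

```python
import json
import platform


def environment_fingerprint():
    """Collect platform details that may influence compiled model output."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "system": platform.system(),
        "release": platform.release(),  # kernel release on Linux
        "machine": platform.machine(),  # CPU architecture
        "libc": f"{libc_name} {libc_version}".strip(),
        "python": platform.python_version(),
        "python_compiler": platform.python_compiler(),
    }


print(json.dumps(environment_fingerprint(), indent=2))
```

Comparing this output between the Debian 10 and Debian 11 nodes (and the GHA runner) would show at a glance which of these fields differ.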

For selected outputs, in particular low-confidence profiles, e.g. with few or no p-sites (profile_sum or profile_peak), the mean and variance can differ drastically, so it does not make sense to compare these even within some tolerance. However, we could possibly select the periodic lengths and offsets, and compare values.

As for metagene-profiles and orf-profiles, these should be directly comparable.

As for the ORF predictions, model outputs (mean, var) differ, but also the actual set of ORFs. The reason is that differences in Bayes factor mean and variance (get_bf_filter) can lead to slightly different sets of ORFs, which are then potentially affected by get_longest_features_by_end. In general, discrepancies in model outputs increase as confidence decreases, with large differences for variance.

One possible solution would be to completely ignore the model outputs, and check consistency of the results only on the intersection, and report (without raising errors) the ORFs that are found in one or the other, but not both results (i.e. reference vs. current), for completeness.
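
The intersection idea could look roughly like this. It is only a sketch: `compare_orf_sets` is a hypothetical helper and the ORF identifiers are made up for illustration:

```python
def compare_orf_sets(reference_ids, current_ids):
    """Split ORF identifiers into shared, reference-only and current-only."""
    reference_ids, current_ids = set(reference_ids), set(current_ids)
    return {
        "shared": sorted(reference_ids & current_ids),
        "reference_only": sorted(reference_ids - current_ids),
        "current_only": sorted(current_ids - reference_ids),
    }


# Made-up identifiers standing in for the predicted ORFs.
result = compare_orf_sets(
    reference_ids=["orf_a", "orf_b", "orf_c"],
    current_ids=["orf_b", "orf_c", "orf_d"],
)

# Report, but do not fail, on ORFs found in only one of the two results.
for key in ("reference_only", "current_only"):
    if result[key]:
        print(f"{key}: {', '.join(result[key])}")

# Consistency checks would then run on result["shared"] only.
```

For the example identifiers this prints `reference_only: orf_a` and `current_only: orf_d`, and leaves `orf_b` and `orf_c` for the actual comparison.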

Another solution is to actually compare only the means with some reasonable tolerance e.g.

import pandas as pd
# Drop the variance columns, which differ too much to be compared.
reference = predicted_orfs_reference[[c for c in predicted_orfs_reference.columns if 'var' not in c]]
current = predicted_orfs_current[[c for c in predicted_orfs_current.columns if 'var' not in c]]
# Note: the rtol argument requires pandas >= 1.1.
pd.testing.assert_frame_equal(reference, current, check_exact=False, rtol=.5)

In all cases, discrepancies in some model outputs are worse than I initially thought, and we haven't even tested on macOS, etc. But we need to move on, so I'll work out something.

eboileau commented 1 year ago

For the record, I'm using the conda install only with

git clone https://github.com/dieterich-lab/rp-bp.git
cd rp-bp
git checkout dev-ssciwr
conda env create --prefix debian11-conda -f environment.yml
conda activate debian11-conda
pip install dask sklearn appdirs tqdm mygene openpyxl fastparquet more_itertools matplotlib matplotlib_venn seaborn
pip install git+https://github.com/dieterich-lab/pbiotools.git@dev-ssciwr --no-deps --verbose
pip install . --verbose