dieterich-lab / rp-bp

Rp-Bp is a Bayesian approach to predict, at base-pair resolution, ribosome occupancy and translation.
MIT License

Next release #147

Closed · eboileau closed this 1 year ago

eboileau commented 1 year ago

I would mostly be ready for the next release, but before that:

After that, what remains to be done? i.e. what are the next steps now?

I think we also need to change requires-python = ">=3.6,<3.11" to requires-python = ">=3.7,<3.11". Do we also need to update environment.yml and the dependencies in pyproject.toml, etc.?

I'd also like to get/update some badges, at least for version or release, python, docs, build (or CI?), codecov, and pre-commit.

lkeegan commented 1 year ago

next steps:

eboileau commented 1 year ago

Thanks, I just merged https://github.com/dieterich-lab/pbiotools/pull/17. I will now merge to master and push. I know this is not consistent with versioning, but I think this can just be "merged" into v3, without a new release.

eboileau commented 1 year ago

@lkeegan, what about https://github.com/dieterich-lab/pbiotools/pull/13? I can just change the versions before pushing to master, but I don't want things to break... or should we just ignore it for now?

lkeegan commented 1 year ago

You can ignore it or merge it (probably easiest to push your changes first) - this PR isn't important and won't break anything; it just updates the pre-commit hook versions.

eboileau commented 1 year ago

No, actually, it doesn't make sense not to release a new version, as the PyPI release needs to change accordingly. In particular, if we drop Python 3.6, then I guess we have to go for v4.

eboileau commented 1 year ago

OK, pbiotools is now at the v4 release.

I will first try to sort out the docs, in particular #146, then add a PyPI CI to rp-bp (copied from pbiotools), update the pbiotools v4 requirement, etc.

eboileau commented 1 year ago

@lkeegan, for the PyPI CI (pypi.yml), could you please set up a template? I thought I'd use the pbiotools one, but wouldn't we need to run the conda install rather than the pip install, or am I wrong?

lkeegan commented 1 year ago

@eboileau you should be able to use the pbiotools one - this is for PyPI, so pip install with PyPI dependencies is what we want - it is completely independent of conda/bioconda (although I guess you'll also need to install STAR etc. in the CI if you want to run the tests).

eboileau commented 1 year ago

Would it work to create a conda env to install bowtie2, fastqc, flexbar, samtools and star, then pip install .[test] --verbose, then run the tests, etc.? The Stan models should then compile when running the tests.
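
Roughly, something like this (just a sketch; the channel flags and the test runner would still need to be confirmed):

conda create -n rpbp-test -c conda-forge -c bioconda bowtie2 fastqc flexbar samtools star
conda activate rpbp-test
pip install .[test] --verbose
python -m pytest   # assuming pytest; the Stan models should compile during this first run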

lkeegan commented 1 year ago

I would suggest keeping it simple, either:

In either case, python -m build creates the wheel in an isolated environment, so both approaches should generate the same wheel. And I think skipping the tests would be OK, since this duplicates the tests done in the other CI job.

eboileau commented 1 year ago

Yes, I also think skipping the tests is fine, so this would look like:

name: PyPI

on:
  push:
    tags:
      - "*.*.*"
jobs:
  pypi:
    name: Upload to PyPI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install --upgrade build
      - run: python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.pypi_token }}
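
For reference, with the *.*.* tag trigger above, the upload job runs when a matching tag is pushed, e.g.:

git tag 3.0.0
git push origin 3.0.0
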
lkeegan commented 1 year ago

yes, looks good to me

eboileau commented 1 year ago

Thanks. I will then add this, update the requirements to pbiotools>=4.0.0, etc., merge with master, and make a new release. You said you already have a bioconda rp-bp recipe ready, right?

lkeegan commented 1 year ago

Sounds good! I have a WIP bioconda recipe here that I'll update once there is a new rp-bp GitHub release: https://github.com/lkeegan/bioconda-recipes/commit/1f19d50f04d28b35203ff2298886526e2901b897

eboileau commented 1 year ago

Hi @lkeegan, the new release is there!

lkeegan commented 1 year ago

@eboileau great, I've updated the recipe & made a PR

If you download & extract the LinuxArtifacts file here you should be able to both install the conda package & try using the docker image:

https://github.com/bioconda/bioconda-recipes/pull/39210#issuecomment-1422208495

Let me know if they work for you and I'll ask for the PR to be merged

eboileau commented 1 year ago

Great, I'll test the package and the container usage, and let you know as soon as possible.

lkeegan commented 1 year ago

FYI, after following the docker instructions there, I could then run the docker container with:

docker run -it quay.io/biocontainers/rp-bp:3.0.0--py310h9f5acd7_0 /bin/bash

Both for conda and docker, running compile-rpbp-models returned straight away for me, so it looks like the pre-compiled models are being shipped, but I didn't try actually running anything.

eboileau commented 1 year ago

Thanks. I will try running the pipeline. I also want to try passing the config and writing results to disk, e.g. with --volume, and to test the apps (not sure whether we need something special here?). I also want to test Singularity.
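
Something along these lines (just a sketch; the host path, config name, and exact invocation still need to be checked, the image tag is the one above):

docker run -it --volume $PWD/rpbp-example:/data quay.io/biocontainers/rp-bp:3.0.0--py310h9f5acd7_0 run-all-rpbp-instances /data/config.yaml
# and similarly with Singularity
singularity exec --bind $PWD/rpbp-example:/data docker://quay.io/biocontainers/rp-bp:3.0.0--py310h9f5acd7_0 run-all-rpbp-instances /data/config.yaml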

eboileau commented 1 year ago

I have not been able to test the full pipeline. There are two immediate problems:


  1. run-all-rpbp-instances fails at the first sampling iteration
INFO     rpbp.orf_profile_construction.estimate_metagene_profile_bayes_factors 2023-02-08 11:44:42,328 : Estimating Bayes factors for lengths: 16,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
  0%|          | 0/19 [00:00<?, ?it/s]
 42%|████▏     | 8/19 [00:03<00:05,  2.19it/s]
100%|██████████| 19/19 [00:03<00:00,  5.12it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
  File "/usr/local/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/rpbp/orf_profile_construction/estimate_metagene_profile_bayes_factors.py", line 148, in estimate_profile_bayes_factors
    (bft_periodic, bft_nonperiodic) = estimate_marginal_likelihoods(
  File "/usr/local/lib/python3.10/site-packages/rpbp/orf_profile_construction/estimate_metagene_profile_bayes_factors.py", line 64, in estimate_marginal_likelihoods
    bft_periodic = [
  File "/usr/local/lib/python3.10/site-packages/rpbp/orf_profile_construction/estimate_metagene_profile_bayes_factors.py", line 65, in <listcomp>
    pm.sample(
  File "/usr/local/lib/python3.10/site-packages/cmdstanpy/model.py", line 1201, in sample
    raise RuntimeError(msg)
RuntimeError: Error during sampling:

Command and output files:
RunSet: chains=2, chain_ids=[1, 2], num_processes=2
 cmd (chain 1):
        ['/usr/local/lib/python3.10/site-packages/rpbp/models/periodic/start-high-low-low', 'id=1', 'random', 'seed=8675309', 'data', 'file=/tmp/tmpfqz442ci/oiorukwp.json', 'output', 'file=/tmp/tmpfqz442ci/start-high-low-lowyr_l4ofx/start-high-low-low-20230208114446_1.csv', 'method=sample', 'num_samples=100', 'num_warmup=100', 'algorithm=hmc', 'adapt', 'engaged=1'] 
 retcodes=[127, 127]
 per-chain output files (showing chain 1 only):
 csv_file:
        /tmp/tmpfqz442ci/start-high-low-lowyr_l4ofx/start-high-low-low-20230208114446_1.csv
 console_msgs (if any):
        /tmp/tmpfqz442ci/start-high-low-lowyr_l4ofx/start-high-low-low-20230208114446_0-stdout.txt
Consider re-running with show_console=True if the above output is unclear!
"""

...

This is due to

/usr/local/lib/python3.10/site-packages/rpbp/models/periodic/start-high-low-low: error while loading shared libraries: libtbb.so.12: cannot open shared object file: No such file or directory

This is happening with Docker, Singularity, and also with the conda install.


  2. Rp-Bp uses symlinks, but these are broken with --bind (Singularity) or --volume (Docker).

This is not a critical issue, but depending on 1 above, we might have to do something anyway...

With singularity shell, $HOME is mounted by default. If I run the pipeline under $HOME, this is fine, e.g.

$ ls -la ~/data/c-elegans-chrI-example/without-rrna-mapping/
lrwxrwxrwx  1 eboileau dieterich    108 Feb  8 12:09 c-elegans-rep-1.bam -> /home/eboileau/data/c-elegans-chrI-example/without-rrna-mapping/c-elegans-rep-1Aligned.sortedByCoord.out.bam

but using --bind (or --volume with Docker) results in broken symlinks. In theory, this should still be fine even when running the pipeline in multiple steps, e.g. prepare-rpbp-genome, then run-all-rpbp-instances, etc., as long as the argument to --bind remains the same (the symlinks become "live" again), but this is a bit of a hack, and it does not allow actually using intermediate files outside the container if needed (although we generally only care about the final output, which is not symlinked).

Do you know of any workaround in Singularity/Docker? Otherwise, we need to modify the code (symlinks should in fact just be relative).
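
To illustrate with the listing above: the link is currently created with an absolute target, which is dangling once the directory is mounted at a different path inside the container, whereas a relative target would survive any --bind/--volume remapping:

cd ~/data/c-elegans-chrI-example/without-rrna-mapping/
# current behaviour (absolute target, breaks inside the container):
ln -s /home/eboileau/data/c-elegans-chrI-example/without-rrna-mapping/c-elegans-rep-1Aligned.sortedByCoord.out.bam c-elegans-rep-1.bam
# relative target (keeps working wherever the directory is mounted):
ln -sf c-elegans-rep-1Aligned.sortedByCoord.out.bam c-elegans-rep-1.bam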

eboileau commented 1 year ago

This seems related. If this is the case, pre-compiling wouldn't work. I will test deleting the executables and re-running the pipeline.
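
Roughly what I plan to try (sketch only; the paths are taken from the traceback above, and there are likely more model executables than the ones named here):

MODELS=/usr/local/lib/python3.10/site-packages/rpbp/models
rm -f "$MODELS"/periodic/start-high-low-low "$MODELS"/nonperiodic/no-periodicity "$MODELS"/translated/periodic-gaussian-mixture
compile-rpbp-models   # or simply re-run the pipeline and let the models compile on first use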

lkeegan commented 1 year ago

1

The pre-compiled models link to libtbb.so.12:

root@b1cbe6861ae8:/# ldd /usr/local/lib/python3.10/site-packages/rpbp/models/translated/periodic-gaussian-mixture
        linux-vdso.so.1 (0x00007ffd441f4000)
        libtbb.so.12 => not found
        libstdc++.so.6 => /usr/local/lib/python3.10/site-packages/rpbp/models/translated/../../../../../libstdc++.so.6 (0x00007f66c83cf000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f66c824c000)
        libgcc_s.so.1 => /usr/local/lib/python3.10/site-packages/rpbp/models/translated/../../../../../libgcc_s.so.1 (0x00007f66c8233000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f66c8072000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f66c869f000)

But the conda environment instead provides libtbb.so.2:

root@b1cbe6861ae8:/# ls /usr/local/lib/libtbb.so*
/usr/local/lib/libtbb.so    /usr/local/lib/libtbb.so.2

I'll look into this; we need to modify the recipe somehow to ensure the same tbb library version is used both for compiling and for the user installation.

2

I don't know of a good workaround for the symlinks-with-docker issue - but if you could modify the code to use relative paths to some base path instead of symlinks, that would (imo) be a big improvement in usability/portability (not just for Docker).

lkeegan commented 1 year ago

It seems the libtbb.so culprit is flexbar - @eboileau, before I dig any deeper, just confirming: is this a required dependency of rp-bp?

Not sure what the best solution is in the meantime, but maybe a pragmatic option would be to skip the pre-compiled models in the first bioconda release and add them in a later one?

eboileau commented 1 year ago

Hi @lkeegan, yes, Flexbar is definitely required. If we skip pre-compilation, the models should be compiled by default on the first run. I think we can modify the recipe as you suggest.
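
Something like this for the recipe's build script (hypothetical, I haven't checked the actual build.sh):

$PYTHON -m pip install . --no-deps -vv
# compile-rpbp-models   # skip pre-compilation for now; the models are then compiled on the user's machine on first run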


Re 2 above, I don't want to rewrite the complete I/O. The easiest solution for now is to remove the symlinks by actually renaming files where needed. I'm working on this.

eboileau commented 1 year ago

This case was not set up in the regression tests, as we don't have data for it... this is unfortunate...

In the rpbp recipe, as well as in pyproject.toml and environment.yml, we specify pbiotools >=4.0.0. I'll try to address this tomorrow and release 4.0.1. I assume rpbp would use the latest release available anyway, but it would be better to actually require pbiotools >=4.0.1. I will also make a patch release of rpbp to address this and the above.

@lkeegan can you please bear with me before updating the recipe? Thanks.

lkeegan commented 1 year ago

@eboileau yes, no problem - I'll try to make some progress on the flexbar issue in the meantime.

eboileau commented 1 year ago

I'll be ready to push and make a patch release of Rp-Bp, as soon as pbiotools 4.0.1 is available on bioconda.

P.S. I'm off for 2 weeks from 14 Feb.

lkeegan commented 1 year ago

@eboileau pbiotools 4.0.1 is now on bioconda (and I didn't do anything, the bot made a PR from your github release & a maintainer merged it)

eboileau commented 1 year ago

Thanks, indeed I saw it yesterday. I'll try to sort out #150, then release.

eboileau commented 1 year ago

Oh no... For macOS, this doesn't work anymore...

Command ['make', 'STAN_THREADS=TRUE', '/usr/local/miniconda/envs/rpbp/lib/python3.7/site-packages/rpbp/models/nonperiodic/no-periodicity']
    error during processing No such file or directory

and a lot of warnings... see https://github.com/dieterich-lab/rp-bp/actions/runs/4163364277/jobs/7203670504. I don't think we did anything that should have affected the model compilation since https://github.com/dieterich-lab/rp-bp/actions/runs/4105315063.

lkeegan commented 1 year ago

This looks like a mismatch between compiler & linker. Looking at the conda environment, there is a mix of defaults & conda-forge packages, which is likely the cause - certain packages (like the linker) are not compatible between defaults and conda-forge.

The good news is that I think this is just an issue with the conda environment on the CI; the first thing I would try is modifying this line: https://github.com/dieterich-lab/rp-bp/blob/9f6ec64c9171227c16e294b996d12dbce96ced1a/.github/workflows/ci.yml#L31

          channel-priority: strict

Then hopefully the conda-forge packages will take precedence over the defaults.

The bad news is that, looking through the logs, I noticed that Stan enables a bunch of CPU-specific optimizations in the C++ compile step, which means the pre-compiled models may crash with "illegal instruction" errors if the CPU used when building the bioconda recipe supports some instruction that the user's CPU doesn't. Two options here:

  1. Give up on shipping pre-compiled models; then it is fine that Stan enables these optimizations, since compilation happens on the same CPU that will be used to run the models.
  2. Disable these optimizations in the pre-compiled models (I found cpp_options, but it's not clear whether you can disable these optimizations that way or whether you need to modify the makefile). Also, the resulting binary may be slower than if it were compiled with optimizations enabled (but I'd be surprised if there was a significant difference).

Maybe option 1 is the safer/simpler way to go, especially as compiling the models doesn't take very long?

eboileau commented 1 year ago

@lkeegan Thanks for your quick feedback.

Yes, I was looking into this in more detail; I also think setting the channel priority might work.


As for cpp_options, I actually don't think we need it, since I understand it is mostly used for high-level parallelization via multi-threading, which we did not use in the end. The problem may remain, unless all the CPU-optimization compile steps are only due to this flag, as you point out. I think option 1 is safer, and it also circumvents the Flexbar problem above. I find it annoying not to be able to ship pre-compiled models, but in the end compilation would really just happen on the first run.

eboileau commented 1 year ago

I will update the CI config, remove the Stan cpp option from the defaults, and check whether the tests pass on GitHub Actions.

eboileau commented 1 year ago

Damn it! Looks like it's not going to be solved before I leave for vacation...

Now it's not resolving the Dash + werkzeug versions correctly... In fact, even for Ubuntu there are conflicts when setting up the environment... and eventually samtools seems to be failing...

lkeegan commented 1 year ago

As this looks like only a problem with the CI setup, maybe it would make sense for you to make a release ignoring the broken CI?

I can then do the bioconda packaging & fix the CI, and if there are any issues with the actual release they can be fixed in a patch release when you're back from vacation?

Re. shipping pre-compiled models on bioconda: after some more research, I actually think this should be OK:

eboileau commented 1 year ago

Wow!

But shall I leave channel-priority: strict? i.e. make a release as is?

lkeegan commented 1 year ago

I think it's ok - this only affects the conda environment in the CI jobs, not what you release

eboileau commented 1 year ago

v3.0.1 is there now.

lkeegan commented 1 year ago

This should fix the conda issues in the CI: https://github.com/dieterich-lab/rp-bp/pull/151

lkeegan commented 1 year ago

The test suite now runs successfully for all Linux conda packages built here, as well as for one of the docker containers (all with pre-compiled models): https://github.com/bioconda/bioconda-recipes/pull/39210

eboileau commented 1 year ago

Thank you for this, @lkeegan. I just merged #151, but I'm not sure whether I need to make a new release now, i.e. whether the bioconda package needs to be updated, or whether everything is done. Before installing, I'm trying to use the docker containers and to see the rpbp tags, but I get "Your user session has expired. Please sign in to continue.", which is strange... (Looking at https://github.com/bioconda/bioconda-recipes/pull/39210, I'm not sure in what order the tags are given, i.e. which one to use.) Thanks for a short update.

lkeegan commented 1 year ago

@eboileau

eboileau commented 1 year ago

@lkeegan

Thanks for the update.

Re the rpbp docker, that's right, I can access the containers now, and everything seems to work fine, also with singularity, except for one little thing, see #150. I'm trying to see what the problem is.