Closed: eboileau closed this issue 1 year ago
next steps:
Thanks, I just merged https://github.com/dieterich-lab/pbiotools/pull/17. I will now merge to master and push. I know this is not consistent with versioning, but I think this can just be "merged" into v3, without a new release.
@lkeegan what about https://github.com/dieterich-lab/pbiotools/pull/13? I can just change the versions before pushing to master, but I don't want things to break... or should we just ignore it for now?
you can ignore it or merge it (probably easiest to first push your changes) - this PR isn't important and won't break anything, it's just updating the pre-commit version hooks
No, actually, it doesn't make sense not to release a new version, as the PyPI release needs to change accordingly. If we remove Python 3.6 support in particular, then I guess we have to go for v4.
OK, pbiotools is now at the v4 release.
I will try to first sort the docs, in particular #146 , then add a pypi CI to rp-bp (copied from pbiotools), and update the pbiotools v4 requirement, etc.
@lkeegan, for the PyPI CI (pypi.yml), could you please set up a template? I thought I'd use the pbiotools one, but in fact we would need to run the conda install, and not the pip install, or am I wrong?
@eboileau you should be able to use the pbiotools one - this is for PyPI so pip install with pypi dependencies is what we want - this is completely independent of conda/bioconda (although I guess you'll need to also install star etc in the CI if you want to run the tests)
Would it work to create a conda env to install bowtie2, fastqc, flexbar, samtools, star, then pip install .[test] --verbose, then run tests, etc.? Then, the Stan models should compile when running the tests.
I would suggest keeping it simple: run python -m build and upload the wheel to PyPI. Either way, python -m build creates the wheel in an isolated environment, so both approaches should generate the same wheel. And I think skipping the tests would be ok, since this would duplicate the tests done in the other CI job.
Yes, I also think skipping the tests is fine, so this would look like:
```yaml
name: PyPI

on:
  push:
    tags:
      - "*.*.*"

jobs:
  pypi:
    name: Upload to PyPI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install --upgrade build
      - run: python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.pypi_token }}
```
yes, looks good to me
Thanks. I will then add this, update the requirements to pbiotools>=4.0.0, etc., merge with master, and make a new release.
You said you already have a bioconda rp-bp recipe ready, right?
Sounds good! I have a wip bioconda recipe here that I'll update once there is a new rp-bp github release: https://github.com/lkeegan/bioconda-recipes/commit/1f19d50f04d28b35203ff2298886526e2901b897
Hi @lkeegan, the new release is there!
@eboileau great, I've updated the recipe & made a PR
If you download & extract the LinuxArtifacts file here you should be able to both install the conda package & try using the docker image:
https://github.com/bioconda/bioconda-recipes/pull/39210#issuecomment-1422208495
Let me know if they work for you and I'll ask for the PR to be merged
Great, I'll test the package and the container usage, and let you know as soon as possible.
fyi after following the docker instructions there I could then run the docker container with:
docker run -it quay.io/biocontainers/rp-bp:3.0.0--py310h9f5acd7_0 /bin/bash
Both for conda and docker, running compile-rpbp-models returned straight away for me, so it looks like the pre-compiled models are being shipped, but I didn't try actually running anything.
Thanks. I will try running the pipeline. I also want to try passing the config and writing results to disk, e.g. with --volume, and test the apps (not sure whether we need something special here?). I also want to test Singularity.
I have not been able to test the full pipeline. There are two immediate problems:
1. run-all-rpbp-instances fails at the first sampling iteration:

INFO rpbp.orf_profile_construction.estimate_metagene_profile_bayes_factors 2023-02-08 11:44:42,328 : Estimating Bayes factors for lengths: 16,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35
  0%|          | 0/19 [00:00<?, ?it/s]
 42%|████▏     | 8/19 [00:03<00:05, 2.19it/s]
100%|██████████| 19/19 [00:03<00:00, 5.12it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
r = call_item()
File "/usr/local/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
return self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 620, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/joblib/parallel.py", line 288, in __call__
return [func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/joblib/parallel.py", line 288, in <listcomp>
return [func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/rpbp/orf_profile_construction/estimate_metagene_profile_bayes_factors.py", line 148, in estimate_profile_bayes_factors
(bft_periodic, bft_nonperiodic) = estimate_marginal_likelihoods(
File "/usr/local/lib/python3.10/site-packages/rpbp/orf_profile_construction/estimate_metagene_profile_bayes_factors.py", line 64, in estimate_marginal_likelihoods
bft_periodic = [
File "/usr/local/lib/python3.10/site-packages/rpbp/orf_profile_construction/estimate_metagene_profile_bayes_factors.py", line 65, in <listcomp>
pm.sample(
File "/usr/local/lib/python3.10/site-packages/cmdstanpy/model.py", line 1201, in sample
raise RuntimeError(msg)
RuntimeError: Error during sampling:
Command and output files:
RunSet: chains=2, chain_ids=[1, 2], num_processes=2
cmd (chain 1):
['/usr/local/lib/python3.10/site-packages/rpbp/models/periodic/start-high-low-low', 'id=1', 'random', 'seed=8675309', 'data', 'file=/tmp/tmpfqz442ci/oiorukwp.json', 'output', 'file=/tmp/tmpfqz442ci/start-high-low-lowyr_l4ofx/start-high-low-low-20230208114446_1.csv', 'method=sample', 'num_samples=100', 'num_warmup=100', 'algorithm=hmc', 'adapt', 'engaged=1']
retcodes=[127, 127]
per-chain output files (showing chain 1 only):
csv_file:
/tmp/tmpfqz442ci/start-high-low-lowyr_l4ofx/start-high-low-low-20230208114446_1.csv
console_msgs (if any):
/tmp/tmpfqz442ci/start-high-low-lowyr_l4ofx/start-high-low-low-20230208114446_0-stdout.txt
Consider re-running with show_console=True if the above output is unclear!
"""
...
This is due to
/usr/local/lib/python3.10/site-packages/rpbp/models/periodic/start-high-low-low: error while loading shared libraries: libtbb.so.12: cannot open shared object file: No such file or directory
This is happening with Docker, Singularity, and also with the conda install.
2. Symlinks break when mounting directories with --bind (Singularity) or --volume (Docker). This is not a critical issue, but depending on 1 above, we might have to do something anyway...
With singularity shell, $HOME is mounted by default. If I run the pipeline under $HOME, this is fine, e.g.

$ ls -la ~/data/c-elegans-chrI-example/without-rrna-mapping/
lrwxrwxrwx 1 eboileau dieterich 108 Feb 8 12:09 c-elegans-rep-1.bam -> /home/eboileau/data/c-elegans-chrI-example/without-rrna-mapping/c-elegans-rep-1Aligned.sortedByCoord.out.bam

but using --bind (or --volume with Docker) results in broken symlinks. In theory, it should even be fine to run the pipeline in multiple steps, e.g. prepare-rpbp-genome, then run-all-rpbp-instances, etc., as long as the argument to --bind remains the same (the symlinks become "live" again), but this is a bit of a hack, and does not allow actually using intermediate files outside the container if needed (although we generally only care about the final output, which is not symlinked).
Do you know of any workaround in Singularity/Docker? Otherwise, we need to modify the code (the symlinks should in fact just be relative).
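For illustration, converting an absolute symlink into an equivalent relative one takes only a few lines of Python (a hypothetical helper, not actual rp-bp code):

```python
import os

def make_symlink_relative(link_path):
    """Rewrite an absolute symlink as a relative one.

    A relative link survives bind mounts (--bind / --volume) that change
    the absolute path of the base directory, as long as the link and its
    target are moved together under the same base directory.
    """
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        return  # already relative, nothing to do
    rel_target = os.path.relpath(target, start=os.path.dirname(link_path))
    os.remove(link_path)
    os.symlink(rel_target, link_path)
```

This only helps when the link and its target live under the same mounted base directory, which is the case for the pipeline output shown above.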
This seems related. If that is the case, pre-compiling wouldn't work. I will test deleting the executables and re-running the pipeline.
The pre-compiled models link to libtbb.so.12:
root@b1cbe6861ae8:/# ldd /usr/local/lib/python3.10/site-packages/rpbp/models/translated/periodic-gaussian-mixture
linux-vdso.so.1 (0x00007ffd441f4000)
libtbb.so.12 => not found
libstdc++.so.6 => /usr/local/lib/python3.10/site-packages/rpbp/models/translated/../../../../../libstdc++.so.6 (0x00007f66c83cf000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f66c824c000)
libgcc_s.so.1 => /usr/local/lib/python3.10/site-packages/rpbp/models/translated/../../../../../libgcc_s.so.1 (0x00007f66c8233000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f66c8072000)
/lib64/ld-linux-x86-64.so.2 (0x00007f66c869f000)
But the conda environment instead provides libtbb.so.2:
root@b1cbe6861ae8:/# ls /usr/local/lib/libtbb.so*
/usr/local/lib/libtbb.so /usr/local/lib/libtbb.so.2
I'll look into this, we need to modify the recipe somehow to ensure the same tbb library version is being used for compiling and for the user installation.
I don't know of a good workaround for the symlinks-with-Docker issue - but if you could modify the code to use paths relative to some base path instead of symlinks, that would (imo) be a big improvement in usability/portability (not just for Docker).
It seems the libtbb.so culprit is flexbar - @eboileau before I dig any deeper, just confirming this is a required dependency of rp-bp?
Not sure what the best solution is in the meantime, but maybe a pragmatic option would be to skip the pre-compiled models in the first bioconda release and add this in a later bioconda release?
Hi @lkeegan yes, Flexbar is definitely required. If we skip pre-compilation, the models should be compiled by default on the first run. I think we can modify the recipe as you suggest.
Re 2 above, I don't want to rewrite the complete I/O. The easiest solution for now is to remove the symlinks by actually renaming files where needed. I'm working on this.
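A minimal sketch of that renaming approach (hypothetical helper; the real change would live in the rp-bp I/O code):

```python
import os
import shutil

def replace_symlink_with_file(link_path):
    """Replace a symlink with the file it points to.

    After this, no symlink remains to break under --bind / --volume
    mounts: the real file simply sits at the expected location.
    """
    if not os.path.islink(link_path):
        return  # already a regular file
    target = os.path.realpath(link_path)  # resolve before removing the link
    os.remove(link_path)                  # drop the link itself
    shutil.move(target, link_path)        # move the real file into place
```

Unlike relative symlinks, this leaves nothing at all to break inside a container, at the cost of moving the original file.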
This case was not covered in the regression tests, as we don't have data for it... this is unfortunate...
In the rpbp recipe, as well as in the toml and environment files, we specify pbiotools >=4.0.0. I'll try to address this tomorrow and release 4.0.1. I assume rpbp would use the latest release available anyway, but it would be better to actually require pbiotools >=4.0.1. I will also make a patch release of rpbp to address this and the above.
@lkeegan can you please bear with me before updating the recipe? Thanks.
@eboileau yes no problem, I'll try to make some progress on the flexbar issue in the meantime
I'll be ready to push and make a patch release of Rp-Bp, as soon as pbiotools 4.0.1 is available on bioconda.
P.S. I'm off for 2 weeks from 14 Feb.
@eboileau pbiotools 4.0.1 is now on bioconda (and I didn't do anything, the bot made a PR from your github release & a maintainer merged it)
Thanks, indeed I saw it yesterday. I'll try to sort out #150, then release.
Oh no... For macOS, this doesn't work anymore...
Command ['make', 'STAN_THREADS=TRUE', '/usr/local/miniconda/envs/rpbp/lib/python3.7/site-packages/rpbp/models/nonperiodic/no-periodicity']
error during processing No such file or directory
and a lot of warnings... see https://github.com/dieterich-lab/rp-bp/actions/runs/4163364277/jobs/7203670504
I don't think we did anything that should have affected the model compilation since https://github.com/dieterich-lab/rp-bp/actions/runs/4105315063
This looks like a mismatch between compiler & linker. Looking at the conda environment, there is a mix of defaults & conda-forge packages, which is likely the cause: certain packages (like the linker) are not compatible between defaults and conda-forge.
The good news is I think this is just an issue with the conda environment on the CI; the first thing I would try is adding channel-priority: strict around this line: https://github.com/dieterich-lab/rp-bp/blob/9f6ec64c9171227c16e294b996d12dbce96ced1a/.github/workflows/ci.yml#L31
Then hopefully the conda-forge packages will take precedence over the defaults.
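For reference, assuming the workflow uses the conda-incubator/setup-miniconda action, the relevant fragment might look something like this (the channel names and environment file are assumptions based on the bioconda setup described above):

```yaml
- uses: conda-incubator/setup-miniconda@v2
  with:
    environment-file: environment.yml
    channels: conda-forge,bioconda,defaults
    channel-priority: strict
```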
The bad news is that, looking through the logs, I noticed that Stan enables a bunch of cpu-specific optimizations in the c++ compile step, which means the pre-compiled models may crash with "illegal instruction" errors if the cpu used when building the bioconda recipe supports some instruction that the user's cpu doesn't. Two options here:

1. Skip shipping pre-compiled models and compile them on the user's machine instead (e.g. on first run).
2. Disable the cpu-specific optimizations, maybe via cpp_options (but it's not clear if you can disable these optimizations this way, or if you need to modify the makefile). Also, the resulting binary may be slower than if it were compiled with optimizations enabled (but I'd be surprised if there was a significant difference).

Maybe option 1 is the safer/simpler way to go, especially as compiling the models doesn't take very long?
@lkeegan Thanks for your quick feedback.
Yes, I was looking into this in more detail; I also think setting the channel priority might work.
As for cpp_options, I actually don't think we need it, since I understand it is mostly used for high-level parallelization via multi-threading, which we did not use in the end. The problem may remain, unless all the cpu-optimization compile steps are only due to this flag, as you point out.
I think option 1 is safer, and it also circumvents the above Flexbar problem. I find it annoying not to be able to ship pre-compiled models, but in the end compilation would really just happen on the first run.
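A first-run compilation could be guarded by a simple existence check next to the Stan files. A minimal sketch (hypothetical helper and layout; rp-bp's actual entry point is compile-rpbp-models):

```python
from pathlib import Path

def models_to_compile(models_dir):
    """Return the Stan model files with no compiled executable yet.

    cmdstan writes the executable for '<name>.stan' as '<name>' next
    to it; any model whose executable is missing still needs compiling.
    """
    missing = []
    for stan_file in Path(models_dir).rglob("*.stan"):
        exe = stan_file.with_suffix("")
        if not exe.exists():
            missing.append(stan_file)
    return sorted(missing)
```

On first run, only the models reported here would need to be compiled; subsequent runs would find the executables in place and return straight away.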
I will update the CI config, remove the Stan cpp option from the defaults, and check if the tests pass on GA.
Damn it! Looks like it's not going to be solved before I leave for vacation...
Now it's not resolving the Dash + werkzeug versions correctly... In fact, even for Ubuntu there are conflicts when setting up the environment... and eventually samtools seems to be failing...
As this looks like only a problem with the CI setup, maybe it would make sense for you to make a release ignoring the broken CI?
I can then do the bioconda packaging & fix the CI, and if there are any issues with the actual release they can be fixed in a patch release when you're back from vacation?
Re. shipping pre-compiled models on bioconda: after some more research, I actually think this should be ok.
Wow!
But shall I leave channel-priority: strict? i.e. make a release as is?
I think it's ok - this only affects the conda environment in the CI jobs, not what you release
v3.0.1 is there now.
This should fix the conda issues in the CI: https://github.com/dieterich-lab/rp-bp/pull/151
Test suite now runs successfully for all linux conda packages built here as well as one of the docker containers (all with pre-compiled models): https://github.com/bioconda/bioconda-recipes/pull/39210
Thank you for this @lkeegan.
I just merged #151, but I'm not sure if I need to make a new release now, i.e. if the bioconda package needs to be updated, or if all is done? Before installing, I'm trying to use the docker containers and see the rpbp/tags, but I get "Your user session has expired. Please sign in to continue.", which is strange... (looking here https://github.com/bioconda/bioconda-recipes/pull/39210, I'm not sure in what order the tags are given, i.e. which one to use).
Thanks for a short update.
@eboileau
docker pull quay.io/biocontainers/rpbp:3.0.1--py37h96cfd12_0
@lkeegan
Thanks for the update.
Re the rpbp docker image: that's right, I can access the containers now, and all seem to work fine, as does singularity, except for one little thing, see #150. I'm trying to figure out what the problem is.
I would mostly be ready for the next release, but before that:
[x] I'd like this to be sorted: the pbiotools CI (remove Python 3.6), and updating the bioconda recipe (add the missing joblib, numpy, pandas, scipy; remove pybedtools and appdirs). I think we can also remove "parse_meme_names.py" and the pyyaml dependency altogether.
[x] what do we do with https://github.com/dieterich-lab/rp-bp/security/dependabot and https://github.com/dieterich-lab/pbiotools/pull/13 ?
[x] Rp-Bp docs, see #146
After that, what remains to be done? i.e. what are the next steps now?
I think we also need to change requires-python = ">=3.6,<3.11" to requires-python = ">=3.7,<3.11". Do we need to update environment.yml and the dependencies in pyproject.toml, etc.?
I'd also like to get/update some badges, at least for version or release, python, docs, build (or CI?), codecov, and pre-commit.
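For reference, the requires-python change is just this pyproject.toml fragment (other [project] fields elided):

```toml
[project]
requires-python = ">=3.7,<3.11"
```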