Closed lkeegan closed 1 year ago
It is using STAR version: 2.7.10a, right? I'll check that.
yes:
star 2.7.10a h527b516_0 bioconda
There has been quite a few STAR versions in between, so indeed we might have issues with some parameters, etc.
The simplest thing to do is to follow the recommendation, and use --genomeSAindexNbases 10
at the index creation step. This seems related to the small genome size.
The problem, however, is that rpbp currently only allows to pass options to STAR in creating the profiles, but not when preparing the genome. I'll have to make changes in the code, and then pass the option in the test.
I added STAR options (genome) to prepare_rpbp_genome.py, and --genomeSAindexNbases 10
to confest.py (commit https://github.com/dieterich-lab/rp-bp/commit/50f8ece98abc255c5f64a8433048261d17541fcc), to accommodate for the small genome size of the example. This should now work https://github.com/dieterich-lab/rp-bp/actions/runs/2903325401
Note: External programs (STAR, Flexbar) option handling needs to be improved though... this is brittle now, e.g. if using --use-slurm
, we need to escape them when passing the arguments --star-options \"--genomeSAindexNbases 10\"
, but if we do without --use-slurm
, then this does not work. It largely depends on the shell, as it may strips the quotes, and the behaviour is expected to be different when running run_rpbp_pipeline.py, as it also pre-processes the option string... (see pbiotools pgrm_utils.py).
Looks like it may be an issue with STAR, there are various segfault fixes to STAR in the github repo after the 2.7.10 release which are not on bioconda, and downgrading to the previous 2.7.9 version seems to resolve this issue, although the macos tests still fail but at a later point: https://github.com/lkeegan/rp-bp/runs/7952752406?check_suite_focus=true
I have 2.7.10a installed, and I don't experience any issues, but I'm fine with fixing the version to 2.7.9a for now.
The error is different
estimate-metagene-profile-bayes-factors: error: argument --periodic-models: expected at least one argument
and is due to failed model compilation at step install rp-bp with --no-deps option
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_77987227f3abe322d843fe958b86c59f NOW.
/usr/local/miniconda/envs/rpbp-dev/lib/python3.7/site-packages/setuptools/dist.py:760: UserWarning: Usage of dash-separated 'long-description' will not be supported in future versions. Please use the underscore name 'long_description' instead
% (opt, underscore_opt)
/usr/local/miniconda/envs/rpbp-dev/lib/python3.7/site-packages/setuptools/dist.py:760: UserWarning: Usage of dash-separated 'maintainer-email' will not be supported in future versions. Please use the underscore name 'maintainer_email' instead
% (opt, underscore_opt)
Traceback (most recent call last):
File "/usr/local/miniconda/envs/rpbp-dev/lib/python3.7/site-packages/setuptools/_distutils/unixccompiler.py", line 174, in _compile
extra_postargs)
File "/usr/local/miniconda/envs/rpbp-dev/lib/python3.7/site-packages/setuptools/_distutils/ccompiler.py", line 917, in spawn
spawn(cmd, dry_run=self.dry_run, **kwargs)
File "/usr/local/miniconda/envs/rpbp-dev/lib/python3.7/site-packages/setuptools/_distutils/spawn.py", line 69, in spawn
"command %r failed with exit code %s" % (cmd, exitcode))
distutils.errors.DistutilsExecError: command '/usr/local/miniconda/envs/rpbp-dev/bin/x86_64-apple-darwin13.4.0-clang' failed with exit code 254
This is a macOS (version?) specific issue, apparently related to xcode commandline tools/clang compiler. There are various hints, but this one seems related to Stan.
I see - maybe it would make sense to first update rp-bp to use pystan3 and see if this issue is still there before trying to fix this?
Indeed, that's what I also think.
But I would first need to check https://pystan.readthedocs.io/en/latest/upgrading.html#upgrading
There shouldn't be major issues, at least syntactically, but clearly I would need to test it. I will try to address that as soon as possible (I have some other deadlines coming...)
Sure, no urgency from my side! A related improvement (as long as you don't need to compile new stan models after installation) could be to do the pystan compile step as part of the bioconda package build process instead of as part of the user install, which would:
The main obstacle to distributing a compiled Stan model is that it needs to be compiled for different platforms separately. So typically, the simplest way around this problem involves compiling (and saving) the models during installation, as we currently do.
Sure, but that is essentially what conda is: a bunch of software pre-compiled for different plaforms - so we could pre-compile the stan models for each platform on bioconda in the same way that e.g. flexbar is compiled for each platform. (It would get more involved if we need to also do this for each python version, that depends I guess on exactly how pystan interacts with stan). I agree what you currently do is also fine!
@lkeegan please let me know what is missing to resolve this issue? The CmdStanPy interface is now implemented, and the models are installed/compiled in a default location under the conda environment, which is, I think, the best option, since it isolates them from any other installation/environment.
@eboileau sounds good - adding CI for macos and testing different python versions (as done in this now out-of-date branch https://github.com/lkeegan/rp-bp/tree/dev_dependencies) is still missing. This was paused as the CI failed on macos but hopefully this has been fixed by the move to cmdstanpy - I can update this branch and make a PR in Jan.
@lkeegan I think this is sorted, right?
The dev_dependencies branch tries to:
Linux tests pass on CI, but macos segfaults on one of the calls to STAR: https://github.com/lkeegan/rp-bp/actions/runs/2902072541
Could this be related to this earlier warning?:
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=15072434, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 10