cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0
39 stars 14 forks source link

Meson switch #45

Closed ozagordi closed 6 years ago

ozagordi commented 6 years ago

Hi there. I propose these changes as the first (major) step towards shorah2.

In essence:

Quick way to install it

mkdir build && meson build
cd build && ninja install && cd ..
pip install -e .  # install with pip in development mode

Uninstall with

rm -r build && pip uninstall .

Tagging @DrYak as I would like to have his take as well.

SoapZA commented 6 years ago

@ozagordi in general this approach is sub-optimal, as you need to juggle setuptools and meson, which is harder than just ./configure && make install. I know people tend to hate bootstrapping the Autotools, but IMO this solution is much harder to get right (add to that, that people are generally hostile towards Meson's hard-requirement on ninja).

Here's are my suggestions:

  1. Let's keep the Autotools buildsystem for the time being, and remove the perl scripts and stuff that you've done here. I think the current Autotools buildsystem is rather simple and requires no ninja or meson. I don't see it as a major burden and it makes debugging a lot simpler.
  2. We port the pkg-config and POPCNT bits to setuptools. While setuptools is the mainstay of python packaging, I personally still think it sucks, because its fails horribly at determining external-nonPy dependencies (say, via pkg-config). My suggestion would be to turn all the executables into C Python shared modules, and the frontend then calls into those modules (which requires a bit of light refactoring). The current python-scripts-call-executables idiom is against the spirit of modern python packaging a la wheels anyways. This is probably the best approach from a UX point of view.
ozagordi commented 6 years ago

I found motivation to switch to meson in the exchange you and @armintoepfer had with Heng Li. I always found Autotools extremely intimidating, while I managed to have an idea of meson in a couple of days (never used before last week). With these refactoring I also intended to have, generally speaking, less files to manage.

  1. Would meson and bioconda be compatible? If yes, then I wouldn't worry about people's resistance or the possibility to get it wrong. Normal users would still install it with bioconda, more experienced ones wouldn't have a problem installing meson + ninja.

  2. setuptools is a pain, I agree. I finally got around it, at least partially, so it now does what I want: but only for python packaging. I have no idea if and how it can be used to compile C executables, so I don't really understand how the popcnt bit can be ported there. I also like the idea of porting the executables into shared objects, but I would postpone this to after having settled down on the interface.

SoapZA commented 6 years ago

meson is in conda-forge now (maintained by me and @dirmeier) and should be fine in bioconda (although meson still has a nasty bug on macOS due to Apple's retarded way in dealing with RPATH, but this should be fixed soon). Let me edit the branch a bit.

SoapZA commented 6 years ago

@ozagordi have a look now, I think it's cleaned up a bit now, and simpler while at it.

SoapZA commented 6 years ago

@ozagordi also please advise which python files need to be installed as "modules" into site-packages and which are just "scripts". You've refactored/renamed the python files a bit and I've lost track. Ideally, I would like to be able to use meson for doing a 'simple' python package installation (preferably using #!/usr/bin/env python3 in the shebangs, which meson wont replace, but setuptools will), and for advanced use cases setuptools can be used.

ozagordi commented 6 years ago

Makes a lot of sense, thanks.

I'm not sure I would delete the install-script, though. I was trying to avoid the global installation because everything (including the python code) ends up in /usr/local/bin/, and then I don't have a shorah executable in the command line. With setuptools I usually define a module (in this case usr/shorah/ in the project main folder) and let pip install it. In this way I have it as a module, so that stuff like

python -c "import shorah"
python -c "from shorah import shotgun"

works. At the same time I also have it as command line script that I can run like

shorah -h

since setup.py defines the entry point

entry_points={
    'console_scripts': ['shorah = shorah.cli:main']
}

If we keep the meson installation alone, then for sure cli.py and __main__.py cannot end up in /usr/local/bin. One option would be to rename cli.py to something like shorah. I don't know if I like it, but I can certainly live with it. Then, I would have to rewrite the import part as it does not work now. In the end we wouldn't have it available as a module, which is a minor issue, I agree.

How do you see it?

DrYak commented 6 years ago

Hello all !

Regarding the building system :

SoapZA commented 6 years ago

My goal here is to make the default experience as pleasant as possible (as much as that is doable with python packaging really). The reason I don't like the install script is because it violates a holy aspect of meson: never touch the source tree. What if I want to have two build trees, to test different flags?

Here's another idea: instead of shoehorning meson to fit setuptools, do the opposite. Search for __main__.py in setup.py, which will tell you the directory and all information you need to know about the built tree. You can then paste these things together and have setuptools install them. This shifts the complexity from meson into setuptools, which is fine, because setuptools is already broken by design. How does that sound? It also has the advantage of making less assumptions about directory layouts and such.

ozagordi commented 6 years ago

So, if I understand correctly,

  1. meson builds all the CPP executables and installs them into /usr/local/bin,
  2. it configures the python code and leaves it into its build directory (meaning, the install_data bit should be deleted),
  3. setup.py searches the project subfolders for __main__.py, copies all python code in the same directory into src/shorah and proceeds with setuptools standard business,
  4. we need to adapt python code to look for b2w, diri_sampler and fil in PATH rather than in the module, at least until we switch to shared objects.

Sounds like a good option to me.

SoapZA commented 6 years ago

mostly the idea, yes. You can likely add some hacks to try and find b2w without using PATH, except that in general you dont need to run ninja install.

ozagordi commented 6 years ago

Yes, setup.py might also find the executables and move them to some place like src/shorah/bin. In this way ninja alone plus some pip would be enough. No need to ninja install.

DrYak commented 6 years ago

Okay. Managed to catch up the thread.

Regarding the bioconda packaging : as long as meson is available as a conda package (thank you guys), it's still possible to make bioconda packages. (After all, it basically boils down to a shell script that is allowed to call whatever it takes to compile a package. I'll just put your commands in it)

I agree with @SoapZA about not modifying the source tree (which is always important if you want to compile multiple targets, each in different sub-directories : optimized, debug, special different arch, etc.)


Speaking about an upcoming Shorah2 :

Are you planning also to upgrade the libbam to a newer version? @sposadac suspected that the bug that lead to zero counts appearing in the first place might be caused by an older buggy version.


Speaking of what regular users are expecting, I'm toying around with docker containers as an additional alternative solution (They solve a few "reproductible/freezable deployment" cases, are automatically pushed on quay.io by bioconda, and recently with efforts such as Singularity are possible to use in a HPC-friendly manner, all of which is supported by recent versions of snakemake (and conda)) We'll be discussing about it with Prof. Beerenwinkel next week.

DrYak commented 6 years ago

Tested building with meson, but libpthread is missing, here's the fix : pthreads_missing.patch.gz

DrYak commented 6 years ago

But I'll need some help to get how the python part is supposed to get installed into an arbitrary path.

ozagordi commented 6 years ago

I can't speak for libbam, I'm definitely not the expert in this field.

I will edit setup.py to work as we discussed in the next few days.

One more thing: integration tests. Do we stick to Travis? I would let you take care of this. In the meanwhile, I would disable testing in the Travis dashboard to save resources: it will keep failing until we modify it.

DrYak commented 6 years ago

Regarding Continuous Integration : Sadly, I lack enough first hand experience, my previous employer was allergic to this kind of concepts. (Getting them to start using Git was already an achievement).

My current limited experience is from the exposure to what is used by bioconda, and the couple of other packages used in V-pipe : so mostly Travis and CircleCI.

So I cannot give a well informed enough recommendation about sticking with Travis / Switching to something else. (On CircleCI, Mac OS X instances seem to be paying, but there should be a way to negociate for opensource projects like this one).

I'll try to run tests once you finish the mentionned setup.py modifications.

armintoepfer commented 6 years ago

Sorry, but if you have nothing to say about CI, then don't say anything.

armintoepfer commented 6 years ago

Regarding Shorah 2.0, why don't you get rid of all the python noise and do it in C++, ship a single static binary, e voila, it's actually deployable.

ozagordi commented 6 years ago

Thanks @DrYak, your answer was indeed useful. That is pretty much my exposure as well, and I had already forgotten that CircleCI is not free for MacOS instances; thanks for reminding it.

@armintoepfer I really don't have the motivation to replace all the python code with C++. Above all, I would need to refresh/recreate my already limited C++ knowledge, and I'd rather do other things. In terms of deployability, I also don't see a strong case for it. As long as we can have it in bioconda, it will always be easier for the users to install it that way.

SoapZA commented 6 years ago

@ozagordi slowly getting there. can we restore the Autotools stuff (which is so much easier for bioconda, as its literally just ./configure --prefix="${PREFIX}" PYTHON="${PYTHON}" && make install)?

You can use meson for your personal development workflow, it's not like shorah has a flux of a million files every week, so I don't see a lot of extra effort.

ozagordi commented 6 years ago

So we keep two build systems in place? If you feel safe about it, fine. But I'm not able to do it; the structure has changed a bit.

SoapZA commented 6 years ago

Just restore the Autotools files - I'll fix them for the new structure

SoapZA commented 6 years ago

@ozagordi Travis passes again - please don't merge it until I give the green light, I need to add git-sha1 hooks to Automake in order to make the version querying slightly more sane and less hardcoded.

DrYak commented 6 years ago

I was going to ask if you need any help for autotooling, but apparently you were super fast.

"git describe" is way to do the git version querying in a programmatic way.

ozagordi commented 6 years ago

So, python code doesn't need to be configured with meson anymore? If that's the case, let's remove it from src/python and put it into src/shorah for good.

SoapZA commented 6 years ago

@DrYak Thanks, I haven't ever used git, but please feel free to add the gnulib git-version-gen module then.

SoapZA commented 6 years ago

@ozagordi setuptools replaces shebangs itself when installing scripts, the !#/usr/bin/env @PYTHON@ was just there for Autotools. Seeing that most of the py files are now modules and not executables anymore, you generally don't need shebangs in module files (I just left them there for good measure and testing purposes).

DrYak commented 6 years ago

Here you are. Patch also as an attachement

This makes the PACKAGE_VERSION in config.ac (and VERSION in Makefile) automatically track the .git commit (or fallback to a .tarball-version when deployed from tarball instead of git).

Now the remaining question :

ozagordi commented 6 years ago

With the possibility that I'm giving a wrong answer to the wrong question: I use this method in MinVar.

You need these hidden files:

Then setup.py needs to have

setup(
    use_scm_version=True,
    setup_requires=['setuptools_scm', 'setuptools_scm_git_archive'],
    install_requires=['setuptools_scm'],
    ...

while cli.py needs something like

from pkg_resources import (get_distribution, DistributionNotFound)

try:
    __version__ = get_distribution('minvar').version
except DistributionNotFound:
    __version__ = 'unknown'
...

parser.add_argument('-v', '--version', action='version', version=__version__)

This should be enough.

SoapZA commented 6 years ago

@ozagordi the idea is to continue using the Autotools build system in bioconda, mainly for reasons of portability, speed and simplicity. For that, we will install the version file alongside into the site-packages dir, and load it in the __version__ = 'unknown' line.

Using the meson+setuptools build system imo is too unwieldy, clunky and pulls in too many dependencies. You can use it for development though.

ozagordi commented 6 years ago

I would also like someone to give a look at the interface; command line options and the like. Does it seem reasonable to you?

With this PR we also addressed the modernisation of C/C++ issue. There are a lot of warnings now, but they all come from samtools code.

SoapZA commented 6 years ago

@DrYak can you please look into obliterating the bundled samtools and just using the latest htslib? The bundled samtools is a massive PITA and bundling is considered bad practice for bioconda anyhow.

DrYak commented 6 years ago

@SoapZA will have a look at it. Hopefully this will fix the problems that @sposadac is suspecting the old version of samtools to cause.

SoapZA commented 6 years ago

@ozagordi rebased, cleaned up and ready from my side now. The htslib-related changes can come at a later point

SoapZA commented 6 years ago

@ozagordi you created a merge commit by merging into your local branch, please always rebase respectively git reset --hard on the remote branches. Merge commits make git bisect unnecessarily hard and make reasoning about code changes non-understandable. Always enable it: git config --global pull.rebase true.

ozagordi commented 6 years ago

I wanted to rebase and merge, but after the last commits it wasn’t given anymore as an option by GitHub. Rather, it gave a list of commands to run locally (that’s what I did). Didn’t know of resetting.