cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0
41 stars 14 forks source link

Fix build system and other fixes #25

Closed SoapZA closed 7 years ago

SoapZA commented 7 years ago

Hi @ozagordi We made some fixes to the build system, which now uses Autoconf and Automake to install everything and the python modules. This is more robust than the old Makefile, which couldn't install the sources. Main features:

  1. The Automake-based Makefile is fully parallelised, which allows the whole thing to build faster. The autotools also allow you to override all variables like CC/CXX/CXXFLAGS etc for users to adjust how aggressive optimisations are to be performed.
  2. The configure script replaces the python shebang, which is better as it doesn't require as to constantly hack PATHs and PYTHONPATHs so the correct interpreter is used.
  3. The python scripts have been split up into a module part and a script part. The module part contains the actual gist (99% of the old scripts), whereas the scripts now just act as small wrappers around the modules. Furthermore, all modules have been namespaced with shorah_ to make them less likely to collide with builtin modules.
  4. The general directory structure has been cleaned up, such that less files are in the root of the git repo.
  5. The README.md has been simplified and has detailed instructions on how to install ShoRAH properly now.
  6. The build system is perfectly VPATH-compliant, meaning it supports out-of-source builds natively, which is how modern build systems work.
  7. The perl shebangs have been made my relying on PATH-based lookups using /usr/bin/env which is better for users of MacPorts/Homebrew and friends (first law of computer science: All problems in computer science can be solved by another level of indirection).
ozagordi commented 7 years ago

HI @SoapZA, I'm very much in favour of a tidier directory structure. The project started almost ten years ago now, should I start it now, I would obviously do things differently. Over the years a few inelegant hacks accumulated, so it's good that you gave a closer look at those, thanks. A few things that are now crossing my mind, in no particular order.

  1. I am a conda user and I would like to include instructions for this packaging system as well. Creating the environment is extremely easy conda create --name shorah python=2.7 Biopython, but then I don't know how PATH and PYTHONPATH are dealt with in configure and so on. I'm not familiar with autotools.

  2. At this point, I would move everything to Python 3 (again, many years have passed...). This is easy.

  3. I have considered multiple times to ditch mm.py and freqEst altogether. People tend to use these in extreme cases and they fail miserably. Further, there are better tools now for global reconstruction. At this point one would have shorah as command and two subcommands: amplicon and shotgun. In case global reconstruction stays, a third subcommand would be global, with a big warning flag. You have probably been a shorah user before being a developer: what do you think? (I have implemented the command-subcommand logic in VirMet)

EDIT:

  1. I believe we should also explore setuptools the de facto standard packaging method for python projects.

(mentioning @sposadac to keep everybody in the loop)

SoapZA commented 7 years ago

@ozagordi thanks for the ideas

  1. I have looked at recipes to turn ShoRAH into a bioconda package. Bioconda can seamlessly integrate with Autotools-based build systems, I'll do it once we've merged it.
  2. Moving everything to py3 would be a big šŸ‘ win. Continuing to use Python 2 is beating a dead horse, as the end-of-life of Python 2 is approaching. I suggest we first merge the PR, and then in a follow up fixup pull request move everything to python 3.
  3. Yes, mm.py and freqEst are somewhat obscure and only make sense in a global setting. Again, I am totally šŸ‘ for this, but think it should be done as part of a follow up PR.
  4. Concerning setuptools: for a pure python package, setuptools is literally the only option (think out-of-source building, PyPI, uninstalling, generating wheels packages, etc). One massive problem of setuptools (and pretty much all programming language-specific build systems, be that perl/ruby/rust/go/etc) is that they fail badly when breaking out of the pure language ecosystem and interacting with other languages (mostly C/C++/Fortran). Case in point: setuptools is barely good enough for building binary python modules (shared objects), let alone building standalone executables (like in the case of ShoRAH). It cannot query C dependencies well (think zlib/samtools/ncurses), it doesn't support pkg-config without awful hacks. You can see this in the build systems of pysam and numpy, which are monsters that are full of hacks in order to fix setuptools' external C dependency discovering weaknesses.

    Yes, I'm not going to deny that the Autotools are complex and deep. They involve an ancient macro processor (m4) and can produce obscure errors. Still, they are the mainstay of the free software ecosystem, and they allow for changing paths/compilers/CXXFLAGS etc, which makes customisation/installation so much easier. Furthermore, I think the Autotools are the only option really, as they can handle the C dependencies like a breeze (native support for pkg-config), yet they can also handle installing and byte-compiling python modules. Neither CMake, Scons or Waf can do that, plus the fact that everyone knows how to use ./configure --prefix=blabla is kinda a win in my opinion.

Here's my suggestion:

  1. You transfer the repo (you're a temporary owner, so that should work).
  2. I change the merge settings (now that Github supports rebase commits and not just merge commits anymore, I strongly prefer rebase commits for the linear git history) to use rebase merging.
  3. We merge the pull request.
  4. I bioconda'ise the package.

Does that sound like a plan? After the merge, @sposadac wants to merge her POPCNT performance improvements that speed up the DPM massively.

SoapZA commented 7 years ago

@ozagordi thanks for transferring, is it ok for me to merge now?