Closed SoapZA closed 7 years ago
Hi @SoapZA, I'm very much in favour of a tidier directory structure. The project started almost ten years ago now; if I were starting it today, I would obviously do things differently. Over the years a few inelegant hacks have accumulated, so it's good that you took a closer look at those, thanks. A few things are now crossing my mind, in no particular order.
I am a conda user and I would like to include instructions for this packaging system as well. Creating the environment is extremely easy (`conda create --name shorah python=2.7 Biopython`), but then I don't know how `PATH` and `PYTHONPATH` are dealt with in `configure` and so on. I'm not familiar with autotools.
At this point, I would move everything to Python 3 (again, many years have passed...). This is easy.
I have considered ditching `mm.py` and `freqEst` altogether multiple times. People tend to use these in extreme cases and they fail miserably. Further, there are better tools now for global reconstruction. At this point one would have `shorah` as the command and two subcommands: `amplicon` and `shotgun`. If global reconstruction stays, a third subcommand would be `global`, with a big warning flag. You have probably been a shorah user before being a developer: what do you think? (I have implemented the command-subcommand logic in VirMet.)
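To make the command-subcommand idea concrete, here is a minimal sketch using `argparse` from the standard library. It is not the VirMet implementation; the `--bam` option, the help strings and the warning text are placeholders I made up for illustration.

```python
import argparse
import warnings


def build_parser():
    """Build a `shorah` parser with one subcommand per analysis mode."""
    parser = argparse.ArgumentParser(prog="shorah")
    # required=True needs Python 3.7+; the chosen subcommand ends up in args.command
    subparsers = parser.add_subparsers(dest="command", required=True)

    for name, text in [
        ("amplicon", "local analysis of amplicon reads"),
        ("shotgun", "local analysis of shotgun reads"),
        ("global", "global reconstruction (use with caution)"),
    ]:
        sub = subparsers.add_parser(name, help=text)
        # Hypothetical option, only here so that each subcommand takes some input.
        sub.add_argument("--bam", required=True, help="alignment file")
    return parser


def main(argv=None):
    """Parse the command line and warn loudly if `global` was requested."""
    args = build_parser().parse_args(argv)
    if args.command == "global":
        # Placeholder wording for the "big warning flag".
        warnings.warn("global reconstruction may fail on difficult data sets")
    return args
```

With this layout, `shorah amplicon --bam reads.bam` and `shorah shotgun --bam reads.bam` share one entry point, and dropping `global` later means deleting a single entry from the list.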
EDIT:
(mentioning @sposadac to keep everybody in the loop)
@ozagordi thanks for the ideas
`mm.py` and `freqEst` are somewhat obscure and only make sense in a global setting. Again, I am totally for this, but I think it should be done as part of a follow-up PR.
Concerning setuptools: for a pure Python package, setuptools is literally the only option (think out-of-source building, PyPI, uninstalling, generating wheel packages, etc.). One massive problem with setuptools (and pretty much all programming-language-specific build systems, be that Perl/Ruby/Rust/Go/etc.) is that they fail badly when breaking out of the pure language ecosystem and interacting with other languages (mostly C/C++/Fortran). Case in point: setuptools is barely good enough for building binary Python modules (shared objects), let alone building standalone executables (as in the case of ShoRAH). It cannot query C dependencies well (think zlib/samtools/ncurses), and it doesn't support pkg-config without awful hacks. You can see this in the build systems of pysam and numpy, which are monsters full of hacks to work around setuptools' weaknesses in discovering external C dependencies.
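To illustrate the kind of pkg-config workaround I mean, here is a rough sketch of what a `setup.py` typically ends up doing by hand. It is not taken from pysam or numpy; the function names are made up, and it assumes a `pkg-config` executable is on `PATH`.

```python
import shlex
import subprocess


def parse_pkg_config_flags(flags):
    """Sort pkg-config output into keyword arguments for setuptools.Extension."""
    kwargs = {
        "include_dirs": [],
        "library_dirs": [],
        "libraries": [],
        "extra_compile_args": [],
    }
    for token in shlex.split(flags):
        if token.startswith("-I"):
            kwargs["include_dirs"].append(token[2:])
        elif token.startswith("-L"):
            kwargs["library_dirs"].append(token[2:])
        elif token.startswith("-l"):
            kwargs["libraries"].append(token[2:])
        else:
            # Crude: linker-only flags would really belong in extra_link_args.
            kwargs["extra_compile_args"].append(token)
    return kwargs


def query_pkg_config(package):
    """Run pkg-config for `package`; raises if the tool or the package is missing."""
    output = subprocess.check_output(
        ["pkg-config", "--cflags", "--libs", package]
    )
    return parse_pkg_config_flags(output.decode())
```

The result would then be splatted into the extension definition, e.g. `Extension("mod", ["mod.c"], **query_pkg_config("zlib"))`. Every project has to reinvent this, including the error handling for a missing `pkg-config`, which is exactly the problem.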
Yes, I'm not going to deny that the Autotools are complex and deep. They involve an ancient macro processor (`m4`) and can produce obscure errors. Still, they are the mainstay of the free software ecosystem, and they allow for changing paths/compilers/CXXFLAGS etc., which makes customisation and installation so much easier. Furthermore, I think the Autotools are really the only option, as they handle C dependencies with ease (native support for pkg-config), yet they can also handle installing and byte-compiling Python modules. Neither CMake, SCons nor Waf can do that, plus the fact that everyone knows how to use `./configure --prefix=blabla` is kind of a win in my opinion.
Here's my suggestion:
Does that sound like a plan? After the merge, @sposadac wants to merge her `POPCNT` performance improvements that speed up the DPM massively.
@ozagordi thanks for transferring, is it ok for me to merge now?
Hi @ozagordi, we made some fixes to the build system, which now uses Autoconf and Automake to install everything, including the Python modules. This is more robust than the old Makefile, which couldn't install the sources. Main features:
- `CC`/`CXX`/`CXXFLAGS` etc. for users to adjust how aggressive optimisations are to be performed.
- `configure` script replaces the Python shebang, which is better as it doesn't require us to constantly hack `PATH`s and `PYTHONPATH`s so the correct interpreter is used.
- Python modules are prefixed with `shorah_` to make them less likely to collide with builtin modules.
- `README.md` has been simplified and now has detailed instructions on how to install ShoRAH properly.
- `VPATH`-compliant, meaning it supports out-of-source builds natively, which is how modern build systems work.
- `PATH`-based lookups using `/usr/bin/env`, which is better for users of MacPorts/Homebrew and friends (first law of computer science: all problems in computer science can be solved by another level of indirection).
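For readers unfamiliar with the shebang substitution mentioned above, the effect is roughly the following. This is a Python illustration only, not how the `configure` script does it; the function name and the example interpreter paths are assumptions.

```python
def rewrite_shebang(source, interpreter):
    """Return `source` with its first line set to a shebang for `interpreter`."""
    lines = source.splitlines(keepends=True)
    shebang = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        # Replace the existing shebang with the interpreter chosen at configure time.
        lines[0] = shebang
    else:
        # No shebang present: add one at the top.
        lines.insert(0, shebang)
    return "".join(lines)
```

Calling `rewrite_shebang(script_text, "/usr/bin/env python")` gives the `PATH`-based lookup, while passing an absolute interpreter path pins the script to one specific Python. Either way, users no longer have to adjust `PATH` by hand to get the right interpreter.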