bedops / bedops

:microscope: BEDOPS: high-performance genomic feature operations
https://bedops.readthedocs.io/
Other
295 stars 59 forks source link

compiling bedops without in-tree copies of jansson, bzip2, zlib #124

Closed rekado closed 6 years ago

rekado commented 9 years ago

I'm trying to build bedops for GNU Guix, but the reliance on in-tree copies of patched tarballs of third-party libraries makes this more difficult than necessary.

Are the patches to these libraries publicly available so that I could patch bedops to work with system libraries?

sjneph commented 9 years ago

Hi, thanks for the feedback. BEDOPS only requires a modern compiler such as g++ 4.8.3 or better. Environmental setups with PATH and similar variables to system libraries that (mostly) duplicate 3rd party code that comes bundled with the source won't matter. It should be just make && make install. If that isn't the case, we'd like to hear the details. There are philosophical advantages to requiring folks download and configure their systems with these 3rd party requirements, and there are also several disadvantages.

rekado commented 9 years ago

I can build BEDOPS just fine in a regular environment, so it's not a bug but rather a feature request.

Functional package management (as done by GNU Guix or Nix) requires some minor patching of shebangs (for example in configure scripts), which is a little messier than usual when in-tree tarballs are unpacked during the build phase. I could patch the build process up so that it works with GNU Guix, but I'd much prefer if I could link against stock versions of the three libraries (as they already exist in the package store).

Would you accept a patch that added the option to link against out-of-tree versions of these libraries, defaulting to using the included tarballs?

If so, I would need to separate the patches you applied to the three libraries (for example, your version of zlib 1.2.7 apparently differs from upstream by a few defines), so that upstream versions could be used instead.

sjneph commented 9 years ago

I haven't used Guix as yet. If the code base doesn't work out of the box with a recent compiler, then we are certainly open to any improvements. We'd be most interested in a solution that includes both an in-tree patch and also the options for out-of-tree pointers as you suggest. Thank you for taking an interest in helping.

rekado commented 9 years ago

Do you happen to have the patches you applied to the tarballs available in plain text somewhere? I do not want to risk getting the wrong patches when diffing the tarballs with their upstream versions.

Attempting to unpack the bzip2 tarball, for example, I get this message:

$ tar xvf bzip2-1.0.6.tar.bz2
...
bzip2: (stdin): trailing garbage after EOF ignored
...

I get the same message for the other two tarballs. Are the tarballs corrupt?

sjneph commented 9 years ago

I don't have any issues with: tar -xjf

rekado commented 9 years ago

Curious.

[rekado@box:~/code/bedops/third-party] $ tar xjf jansson-2.6.tar.bz2 

bzip2: (stdin): trailing garbage after EOF ignored
[rekado@box:~/code/bedops/third-party] $ tar --version
tar (GNU tar) 1.26
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

I'll just assume the tarballs are fine and diff them against the upstream versions to extract the patches.

alexpreynolds commented 9 years ago

Possible issue with tarball, investigating: https://github.com/bedops/bedops/issues/125

alexpreynolds commented 9 years ago

This tarball is fixed in the v2p4p6 branch.

rekado commented 9 years ago

Having extracted the differences between the upstream tarballs and the included tarballs, it seems to me that the only important change is the addition of Darwin Makefiles for bzip2.

The other two libraries are patched only to avoid compiler warnings, it seems (e.g. by replacing C casts like (int)sizeof(z_stream) with C++ casts like static_cast<int>(sizeof(z_stream))). In doing so, the patch to zlib breaks compatibilty with the upstream version by introducing deflateInit2cpp. It seems to me that upstream's deflateInit2 would work just as well. Please correct me if I'm wrong (my C++ is rusty).

My plan is to perform the following changes:

We would gain more transparency (as upstream release tarballs for the given libraries could be used directly), and bedops could optionally be linked against system libraries (allowing for simpler reproducible builds for GNU Guix and Nix).

Does this sound acceptable?

alexpreynolds commented 9 years ago

Maybe we can offer this as an unsupported build procedure for people who use those platforms. No guarantees, but if you want to submit a pull request and we can review these changes to decide how to proceed, that would be great.

BEDOPS is made up of a mix of C++ and C code, compiled with C++ tools (with the current exception of convert2bed). This is pretty much my fault. The Starch components I wrote are in C and are dependent on zlib and libbzip2. The core BEDOPS tools that are written in C++ (bedops and bedmap, particularly) link to functionality in this C-based Starch library, which must be compiled with a C++ compiler to be linked in at the end.

The static_cast and other casts were added to prevent problems with C++ compilation. The deflateInit2cpp macro only gets compiled in if the customized zlib library is compiled with a C++ compiler. Going back to a stock version of zlib without casts may reintroduce these issues.

We also include the third-party dependencies with the codebase to avoid API version problems. For instance, a fairly recent version of Jansson (before the revision we include, specifically) does not offer the ability to parse Boolean JSON entities. Before then, I had to write a small hack to deal with that missing piece. Packaging the specific version with a known API that we need fixes these sorts of problems before they become problems for other people.

While I grant that it's not the typical way of doing things, I think we avoid a lot of support issues where the end user has to find a specific version of a dependency before compiling. As an end user, I hate having to track down specific versions of things. It's a waste of time and my goal is to get these useful tools into as many people's hands with as few install headaches as possible.

Sorry for the long message; just trying to roughly explain my thought process. Maybe it is the wrong way in the long run. If you'd like, please submit patches and let's take a look.

rekado commented 9 years ago

Maybe we can offer this as an unsupported build procedure for people who use those platforms. No guarantees, but if you want to submit a pull request and we can review these changes to decide how to proceed, that would be great.

This is all I was hoping for :) I'll prepare a pull request some time this week maybe.

The static_cast and other casts were added to prevent problems with C++ compilation. The deflateInit2cpp macro only gets compiled in if the customized zlib library is compiled with a C++ compiler. Going back to a stock version of zlib without casts may reintroduce these issues.

I'll try to do compile with stock versions and see if I can reproduce these issues.

While I grant that it's not the typical way of doing things, I think we avoid a lot of support issues where the end user has to find a specific version of a dependency before compiling.

I very much understand your reasons for doing this and I should add that building BEDOPS on Fedora was a snap. I could package it as is for Guix, but I consider patched tarballs of upstream libraries a wart. (In Guix one doesn't need to worry about broken dependencies when upgrading as the package store and the dependency graph is immutable; so the problem of finding specific versions for BEDOPS completely disappears from a user's point of view.)

Thank you for being open to consider patches in this area!

rekado commented 9 years ago

Here's a first pull request to patch the library sources at build time rather than using patched tarballs: https://github.com/bedops/bedops/pull/128

It does not result in any functional difference. Makefile.darwin would still have to be modified to also apply the in-tree patches before building the third-party libraries.