UCLOrengoGroup / cath-tools

Protein structure comparison tools such as SSAP and SNAP
http://cath-tools.readthedocs.io
GNU General Public License v3.0

Add an 'install' target #66

Closed anadon closed 6 years ago

anadon commented 6 years ago

This is a feature/enhancement request to facilitate easier packaging on Linux. Having the 'make install' and 'ninja install' targets available would be very useful. Rather than just building and copying executables, it allows distribution maintainers to more easily integrate these tools into their system in a robust way. Along with that, for the built libraries, adding version suffixes really helps a lot. Something like libct_biocore.a.1, libct_biocore.a.1.0, and libct_biocore.a.1.0.2.

I should have an example package up soon that I'll link to.

anadon commented 6 years ago

Something funny is going on with GSL. My guess is that the following executables are being compiled in with symbols for GSL when they shouldn't be. Probably should be stripped.

[anadon@doge cath-tools]$ namcap cath-tool-git-1-1-x86_64.pkg.tar.xz 
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-score-align')
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-ssap')
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-superpose')
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-refine-align')
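
For anyone triaging warnings like the ones above, here is a minimal sketch that groups namcap "Unused shared library" lines by binary. It assumes the line format shown in this comment; the function name and regular expression are illustrative and not part of namcap or cath-tools. As far as I understand it, this warning generally means the binary is linked against the library without using any symbols from it.

```python
import re
from collections import defaultdict

# Matches lines of the form seen in the namcap output quoted in this thread.
# The pattern is an assumption based on those four lines only.
WARNING_RE = re.compile(
    r"^(?P<pkg>\S+) W: Unused shared library '(?P<lib>[^']+)' "
    r"by file \('(?P<file>[^']+)'\)$"
)


def unused_libs_by_binary(namcap_output):
    """Return {binary path: [unused library paths]} for matching lines."""
    result = defaultdict(list)
    for line in namcap_output.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            result[match.group("file")].append(match.group("lib"))
    return dict(result)


# The warnings copied from the comment above.
SAMPLE = """\
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-score-align')
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-ssap')
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-superpose')
cath-tool-git W: Unused shared library '/usr/lib/libgslcblas.so.0' by file ('usr/bin/cath-refine-align')
"""

for binary, libs in sorted(unused_libs_by_binary(SAMPLE).items()):
    print(binary, "->", ", ".join(libs))
```
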

anadon commented 6 years ago

https://aur.archlinux.org/packages/cath-tools-git/

tonyelewis commented 6 years ago

Thanks. I aim to look at this in the future.

Thanks for your engagement with cath-tools. I'm intrigued; please may I ask what your scenario is that's motivating you to get this involved in the building/packaging of cath-tools?

anadon commented 6 years ago

I need it for getting interpro-scan working. I intend to do so in a well-documented, professional, and technically excellent manner. As part of that, if there is a problem or rough patch as I'm going through this, I am notifying developers and doing minor development work myself to allow for easier reproduction and to offer the kind of feedback I as a developer would appreciate. Packaging this is simply best practice: it helps out others, finds bugs, and improves reproducibility.

Long term, I actually want to merge all bioinformatics libraries into a single, one-stop, high-performing, verified, and effective library. Getting more project experience by helping out FLOSS projects like this helps me get good enough for that goal.

tonyelewis commented 6 years ago

@anadon - that's all very admirable. Thanks for your contributions.

tonyelewis commented 6 years ago

Do the changes in 09f4a6947a788ddb823422d11689cdf4fc99d3ef suffice for the install targets?

I've thought about versioning the shared libraries and I currently think I'm fairly strongly against it unless you can persuade me otherwise. The reason: I think versioning the libraries is a strong signal to the world that they provide a supported interface (eg with ABI compatibility within major releases). But they're really not a supported interface, they're just a convenience as part of building the executables. And unless there's a very good reason, I don't want to have to worry about supporting such an interface and I don't want to have to care about ABI compatibility.

anadon commented 6 years ago

That touches on a very old philosophical debate among maintainers.

I've applied for Google Summer of Code to combine bioinformatics libraries and utilities into Boost. I'm also at the point where I'm looking for labs to do my PhD with. And I also have a library I'm trying to develop to be a common bioinformatics library at github.com/anadon/madlib. This may be an opportunity for me to do something with this.

What I would suggest, and this is a larger long-term thing, is to split out the stable, foundational parts of cath-tools from the parts for the specific tools. If someone wants something super specific, they can use and keep up with the rapidly changing, non-generic code. If someone wants a general kind of interface to do basic things, such as manipulating results or reading files, then they can use the stable library. Don't bother versioning the rapidly changing part; the Linux kernel doesn't really version like that either. But the stable parts should be versioned, and the overhead for that should be very manageable. In fact, I'd like to add those components to MADLIB, once I get the time.

With that longer term goal in mind, what you have now should work well enough. Is this all fair and reasonable?

tonyelewis commented 6 years ago

Thanks for your comment. Closing this for now.

I admire your enthusiasm and I wish you well with your GSOC application.

TBH, a major factor in all this is that I don't expect to be able to devote much time to cath-tools in the future. I hope to keep it compiling and mostly-bug-fixed but probably little more than that.

anadon commented 6 years ago

I've gone back to repackage, and 'ninja install' isn't working from the top level directory. This likely needs to be reopened.

tonyelewis commented 6 years ago

I've managed to install into a directory (which I specified with -DCMAKE_INSTALL_PREFIX). I get:

bin
├── cath-assign-domains
├── cath-cluster
├── cath-extract-pdb
├── cath-map-clusters
├── cath-refine-align
├── cath-resolve-hits
├── cath-score-align
├── cath-ssap
├── cath-superpose
├── check-pdb
└── snap-judgement

Please can you give more info on the details of the problem you're seeing? Thanks.
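
For reference, a rough sketch of the configure, build, and install sequence described above, written as a small Python helper that only assembles the command lines and does not run them. The source directory, build directory, and prefix below are hypothetical placeholders, the helper name is illustrative, and the Ninja generator is assumed because 'ninja install' was mentioned earlier in the thread. The `-S`/`-B` options need CMake 3.13 or later.

```python
import shlex


def install_commands(source_dir, build_dir, prefix, generator="Ninja"):
    """Return the cmake command lines (as argv lists) for a prefixed install."""
    return [
        # Configure: CMAKE_INSTALL_PREFIX selects where 'install' copies files.
        [
            "cmake", "-S", source_dir, "-B", build_dir, "-G", generator,
            f"-DCMAKE_INSTALL_PREFIX={prefix}",
        ],
        # Build everything with whichever generator was configured.
        ["cmake", "--build", build_dir],
        # Run the install target through cmake, so this works for make or ninja.
        ["cmake", "--build", build_dir, "--target", "install"],
    ]


# Hypothetical paths, for illustration only.
commands = install_commands("cath-tools", "cath-tools/build", "/tmp/cath-install")
for command in commands:
    print(shlex.join(command))
```

Driving the install target via `cmake --build` avoids having to be in a particular directory when invoking `ninja install` or `make install` directly.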

anadon commented 6 years ago

I didn't see that option. Let me get this tested tomorrow. Please humor me. Over the weekend I needed to get 5.25 TB to fit in 2TB of disk. I'm a little bit burned out.