cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0
41 stars 14 forks source link

With htslib #52

Closed DrYak closed 6 years ago

DrYak commented 6 years ago

Fixing issue #47

Branch actually NOT finished :

DrYak commented 6 years ago

Well, basically, I was replicating the way samtools build :

But if you don't like that approach I can convert it to PKG_CHECK_MODULES

SoapZA commented 6 years ago

Yes change it please. The htslib people write awful build systems, and using m4 instead of simple pkg-config is awful (furthermore, only pkg-config works in Meson). PKG_CHECK_MODULES is roughly 100x easier to use and to debug than m4 crappiness. Anyhow, given that Meson 0.46 now has a full-fledged python module, the Autotools support will likely vanish before the final 2.0 release.

dirmeier commented 6 years ago

David, according to you the whole world writes awful build systems :D. You should write a tutorial: "Build systems for C++, the definite guide."

ozagordi commented 6 years ago

Hi guys. Many thanks for working on this. I'm just observing from a distance because I'm not able to give some constructive feedback here.

SoapZA commented 6 years ago

@dirmeier that is technically correct, but if you use a non Turing-complete build system with proper across-language integration designed as a first-class feature, all of the pain goes away.

ozagordi commented 6 years ago

@SoapZA This reminds me that

git gets easier once you get the basic idea that branches are homeomorphic endofunctors mapping submanifolds of a Hilbert space."

SoapZA commented 6 years ago

image

DrYak commented 6 years ago

Okay added the requested changes (and cleaned up slightly the commit log)

Also : got the data files from Susie, I'll see if the bugs (negative counts) still appear with HTSlib

DrYak commented 6 years ago

Small update: the data files that Susie gave me didn't generate any negative count.

Yippee !!!

Yippee!!!

But it's still generating zero counts (although the previous steps does list SNVs, fil - the strand bias step - finds 0 of them).

That might require some further error checking.

DrYak commented 6 years ago

Hello, small message to ask if you did manage to find time for having a look at the requested changes I pushed 2 weeks ago ? Do they suite you?

(Or good luck with the work if you're over-whelmed with something else)

SoapZA commented 6 years ago

@DrYak fixed meson, should be working now. Could you please take it for a run with the test data you have (which we should be adding to Travis + something like cram tests) please?

Also, I removed the C99isms like the VLAs. VLAs are one of those bad ideas of the C landscape that should be obliterated with fire.

ozagordi commented 6 years ago

I'm trying to install, but meson builddir gives me

Native dependency htslib found: NO found '1.3.1' but need: '>=1.7'
Dependency htslib found: NO

src/cpp/meson.build:17:0: ERROR:  Dependency 'htslib' not found, tried Extra Frameworks and Pkg-Config:

Invalid version of dependency, need 'htslib' ['>=1.7'] found '1.3.1'.

I'm working in a conda environment where I manually installed htslib. No idea where 1.3.1 comes from, to be honest.

DrYak commented 6 years ago

100% agree regarding VLAs. That was on my "should be eventually done some days" list. (Together with replacing the maps from char to enum, see comment above. Probably some left-over python-ism from older pure python proof of concept that got rewritten into C++ ?)

Eventually, as part of another branch, I should move the 'gathering/stats' part of outside of the pileup_func call back function in fil.cpp (Should go after the bam_plp_reset of main()).

Will do tests as soon as I've got some free time (SIB Days 2018 are coming now), and will come back in touch with you. Thanks for the meson help.

SoapZA commented 6 years ago

@ozagordi sounds line an ancient bioconda setup. Nowadays bioconda only uses htslib 1.7 (for long CIGAR support on PacBio and Oxford Nanopore data). We can drop it to 1.4, but 1.3.1 is definitely too low (1.3 and below have undefined behaviour issues and also break the ABI between 1.3 and 1.4).

SoapZA commented 6 years ago

@ozagordi also, remember, you need to set up PKG_CONFIG_LIBDIR to point to the directory in the bioconda prefix where htslib.pc (and likely zlib.pc) are located, otherwise Meson will find the system Htslib.

ozagordi commented 6 years ago

Thanks, that worked. Now, what am I doing wrong?

dyld: Library not loaded: @rpath/libhts.2.dylib
  Referenced from: /private/tmp/shorah/src/shorah/bin/b2w
  Reason: image not found

checking with otool

otool -L /private/tmp/shorah/src/shorah/bin/b2w
/private/tmp/shorah/src/shorah/bin/b2w:
    @rpath/libhts.2.dylib (compatibility version 2.0.0, current version 1.8.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
SoapZA commented 6 years ago

export DYLD_LIBRARY_PATH=/directory/in/bioconda/prefix/where/libhts.2.dylib/is/located

ozagordi commented 6 years ago

Wow. Just wow.

ImportError: dlopen(/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/lib-dynload/_scproxy.cpython-36m-darwin.so, 2): Symbol not found: __cg_jpeg_resync_to_restart
  Referenced from: /System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO
  Expected in: /Users/ozagordi/miniconda3/envs/shorah/lib/libJPEG.dylib
 in /System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO

(to clarify, this is the error thrown when running with a different DYLD_LIBRARY_PATH)

SoapZA commented 6 years ago

@ozagordi have you sourced the conda env?

ozagordi commented 6 years ago

Not working today, but yes. I remember I had activated the env. ( recent conda version requires conda activate envname)

ozagordi commented 6 years ago

So, compiling and running the new version on OSX is proving much more difficult than expected. Now I have, after changing DYLD_LIBRARY_PATH

Traceback (most recent call last):
  File "/Users/ozagordi/miniconda3/envs/shorah/bin/shorah", line 11, in <module>
    load_entry_point('ShoRAH', 'console_scripts', 'shorah')()
  File "/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 150, in main
    args.func(args)
  File "/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 75, in amplicon_run
    from shorah import amplicon
  File "/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/amplicon.py", line 42, in <module>
    from Bio import SeqIO
  File "/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 326, in <module>
    from Bio._py3k import basestring
  File "/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/site-packages/Bio/_py3k/__init__.py", line 116, in <module>
    from urllib.request import urlopen, Request, urlretrieve, urlparse, urlcleanup
  File "/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/urllib/request.py", line 2585, in <module>
    from _scproxy import _get_proxy_settings, _get_proxies
ImportError: dlopen(/Users/ozagordi/miniconda3/envs/shorah/lib/python3.6/lib-dynload/_scproxy.cpython-36m-darwin.so, 2): Library not loaded: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
  Referenced from: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib
  Reason: Incompatible library version: vecLib requires version 1.0.0 or later, but libLAPACK.dylib provides version 0.0.0

I'm not sure how to proceed. Is this just my problem? Just a meson problem? How does it impact the overall portability? I admit that I was trying many different things and I kind of lost track of the changes I made.

DrYak commented 6 years ago

Update :

DrYak commented 6 years ago

Hello guys,

sorry for the slow news, the heat wave managed to render my last few brain-cells inoperative, and I haven't been as efficient as I would have wanted.

Here's my latest push with the cleaned version of the last edits.

Speaking of brain not working at 100%, I missed a few things when I separated counts and statistics : actually after cleaning up it turns out it DOES solve the 'zero counts' bug, in all but a few rare corner case. (The corner case is due to how B2W and FIL behave around indels, see comments in code).

Unless @SoapZA has some cosmetic suggestions, I think this branch is done.

(The B2W and FIL case should be handled separately)

DrYak commented 6 years ago

Thanks for the fixes ! :+1: