dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

AVX Instructions and jemalloc #250

Open brand-fabian opened 3 years ago

brand-fabian commented 3 years ago

Hi there,

Thanks for providing this great tool. Unfortunately, I have run into problems running the default Docker container on (very) old hardware (march=(amdfam10|westmere)), where it fails with Illegal Instruction (core dumped) errors. The tool runs fine on contemporary hardware, but since we still operate a sizeable number of these legacy systems, I would like to get it working on them as well.
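As a quick sanity check on such hosts, the SIMD flags the CPU advertises can be read from /proc/cpuinfo. The following is only a minimal sketch: the helper name has_cpu_flag is hypothetical, and the flags worth checking (for example sse4_2 or avx) are assumptions based on the build options discussed in this issue.

```shell
# Hypothetical helper: report whether a CPU flag (e.g. sse4_2, avx) is listed.
# Usage: has_cpu_flag FLAG [CPUINFO_FILE]   (defaults to /proc/cpuinfo)
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # The "flags" line is a space-separated list. -w matches whole words only,
    # so "avx" does not match "avx2"; -F treats the flag as a literal string.
    grep -m1 '^flags' "$cpuinfo" | grep -qwF -- "$flag"
}
```

For example, `has_cpu_flag avx && echo "AVX available"` prints the message only if the host lists the avx flag.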

To do so, I created a Singularity image following the same steps as the Dockerfile provided in the repository, with some changes to CMakeLists.txt:

# Modify march and mtune to set SIMD preferences
# (-i -e rather than -ie: GNU sed reads -ie as "in-place, backup suffix e")
MARCH=$(gcc -march=native -Q --help=target | grep march | cut -f3)
sed -i -e "s/ivybridge/$MARCH/g" CMakeLists.txt
sed -i -e "s/-msse4\.2//g" CMakeLists.txt
sed -i -e "s/-DHAVE_SSE42//g" CMakeLists.txt
sed -i -e "s#USE_SSE=1#UKNOWN_VARIABLE=1#g" CMakeLists.txt

This builds fine. However, when I run the resulting executable or invoke ctest -v, test/jemalloc_linking.sh fails with the jemalloc warning, which I believe is a false positive. I think the executable is unable to detect jemalloc support, even though jemalloc is installed and linked (LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2).

Are there further steps I need to take to enable jemalloc support for the executable? Is it possible that glnexus_cli issues the warning even though jemalloc is loaded and subsequently used? Is there any way for me to tell whether glnexus_cli is actually using jemalloc?
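One way to approach the last question on Linux is to look at the shared objects mapped into the running process. This is a minimal sketch, not part of GLnexus: the helper name maps_has_jemalloc is hypothetical, and it only detects a dynamically loaded libjemalloc (for example via LD_PRELOAD). A statically linked jemalloc would not show up this way.

```shell
# Hypothetical helper: check a /proc/<pid>/maps file for a mapped libjemalloc.
# Usage: maps_has_jemalloc MAPS_FILE   (e.g. /proc/12345/maps)
maps_has_jemalloc() {
    # Each line of the maps file ends with the path of the mapped object;
    # a preloaded jemalloc appears as .../libjemalloc.so.N
    grep -q 'libjemalloc' "$1"
}
```

While glnexus_cli is running, something like `maps_has_jemalloc /proc/$(pidof glnexus_cli)/maps && echo "jemalloc mapped"` could be used, assuming a single running instance.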

mlin commented 3 years ago

Here's where the warning comes from. mallctl() is a jemalloc routine. A few things can plausibly go wrong, which is what that test script tries to detect.

https://github.com/dnanexus-rnd/GLnexus/blob/0e74fc40ec4b6773fa9bf9290333c6ccecd6d855/src/cli_utils.cc#L22-L41

Is the software environment quite old too? Our CI compiles & tests in Ubuntu 18.04 so it's going to get dicey (untested by us, at least) with an OS older than that. Otherwise I can't think of a reason why changing the target CPU architecture, specifically, would interfere with the above mechanism.

If this is the last issue you have then you can probably disable that test -- even if jemalloc is truly absent, it would make GLnexus much slower when using tens of threads, but it should still work correctly and I presume you're not going for speed on the older hardware!

brand-fabian commented 3 years ago

I am executing the software through Singularity. The Singularity image is built from the same steps as the Dockerfile in this repository, except for the lines I pasted in the original comment. The machines themselves are currently running a reasonably modern kernel (uname -r -> 5.3.18-lp152.47-default).

Is there any reference on how long the benchmark dataset included in the tests is supposed to take? Or is it possible to somehow tell after running the software whether the actual malloc in use was jemalloc instead of e.g. libc's one?

mlin commented 3 years ago

There are a few factors here outside our design parameters -- would you be able to try isolating them (for example, try the Singularity container on a modern host or a Docker container on the old host)? The toolchain version used to compile the software is another one.

The warning in question was our best attempt to precisely indicate whether jemalloc is being used, with this test script on top to make sure! Here's a thread where some other ideas were discussed. jemalloc is only critical at high thread counts (>>10) and I doubt a difference would be visible on the small test dataset. The glibc allocator is fine on smaller machines & I also wouldn't be surprised if its multithreaded performance has improved significantly in the years since we found a dramatic difference there.

brand-fabian commented 3 years ago

Thanks for pointing me towards this thread. I have run glnexus_cli on a small subset of the WGS samples with MALLOC_CONF=stats_print:true, and from what I can see (although I happily admit I don't fully understand the output), jemalloc seems to be working. The log file of the run is attached.
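A log from such a run can be checked mechanically. To my understanding, jemalloc prints its statistics on exit under a "Begin jemalloc statistics" banner when stats_print is enabled, while the glibc allocator ignores MALLOC_CONF and prints nothing. The sketch below relies on that assumption, and the helper name log_has_jemalloc_stats is hypothetical.

```shell
# Hypothetical helper: check a captured log for jemalloc's exit statistics.
# Usage: log_has_jemalloc_stats LOG_FILE
log_has_jemalloc_stats() {
    # Assumption: the statistics dump starts with a banner line containing
    # "Begin jemalloc statistics"; no such line means no jemalloc stats.
    grep -q 'Begin jemalloc statistics' "$1"
}
```

With the attached file, this would be `log_has_jemalloc_stats glnexus.test.log && echo "jemalloc was active"`.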

It seems then that this is just an issue with the detection routine and should not negatively impact the execution of this tool.

glnexus.test.log