EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

FATAL: J state unsupported #238

Closed GabeAl closed 3 years ago

GabeAl commented 3 years ago

I've compiled v3.3.2 of the source code with native compiler flags for the Zen 2 architecture. I've also added -ffast-math.

I occasionally get this while running:

FATAL: J state unsupported
Fatal exception (source file p7_trace.c, line 163): realloc for size 0 failed
sh: line 1: 3642453 Aborted (core dumped) hmmalign --outformat Pfam /tmpus/b26dfdff-eb39-47f3-8950-3ae0f3af697c tempg_2/out/storage/aai_qa/SGB-03449/PF03710.10.unaligned.faa > tempg_2/out/storage/aai_qa/SGB-03449/PF03710.10.aligned.faa

There are also a number of other "FATAL: J state unsupported" errors sprinkled throughout the output.

What is a J state, and why is it failing? The code that prints this is in src/tracealign.c: case p7T_J: p7_Die("J state unsupported");

Other info: 256-thread CPU system, 2 TB RAM.

Running in the Prokka pipeline on representative genomes from the SGB set (Pasolli et al.).

cryptogenomicon commented 3 years ago

HMMER depends on IEEE754-compliant floating point arithmetic, and -ffast-math allows the compiler to make unsafe, noncompliant optimizations. Why did you use -ffast-math, what happens if you just compile the code normally, and what is the result of make check with and without your custom compiler options?
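As an illustration of the kind of thing that can go wrong (a hedged sketch, not HMMER code): -ffast-math implies -ffinite-math-only, which lets the compiler assume NaN and Inf never occur, so even a basic NaN check can be folded away. Exact behavior depends on the compiler and version.

```c
/* Illustrative only -- not from the HMMER sources.
 * -ffast-math implies -ffinite-math-only, so the compiler may assume NaN
 * cannot occur and fold the self-inequality test below to "false".
 *
 *   gcc -O2 nan_check.c && ./a.out              -> "x is NaN"
 *   gcc -O2 -ffast-math nan_check.c && ./a.out  -> may take the other branch
 */
#include <stdio.h>

int main(void)
{
    volatile float zero = 0.0f;   /* volatile blocks compile-time folding */
    float x = zero / zero;        /* 0/0 produces NaN under IEEE754       */

    if (x != x)                   /* true only for NaN                    */
        printf("x is NaN\n");
    else
        printf("NaN check was optimized away (or x is not NaN)\n");
    return 0;
}
```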

GabeAl commented 3 years ago

Thanks! This is a helpful explanation. I'd used it because it afforded a small 1-2% speedup (I'm trying to squeeze as much performance as I can out of it, since it is a core component of a few QC and annotation pipelines I'm kicking the tires on).

Indeed, once -ffast-math is removed the error disappears. Interestingly, every other combination of compiler optimization flags I've tried, including code profiling options and even the Intel compiler, proceeded without this issue. In practice I've rarely come across code that depends so intimately on IEEE754, but it clearly exists: glibc is another prominent example, and its build actually stops in its tracks if -ffast-math is detected. Feel free to close this.

The only other interesting problem I've had is when hmmer uses more than 128 threads on the system, split across hmmer instances. I get strange non-deterministic segfaults even with the standard conda build whenever more than 128 threads are in use in total (across concurrent hmmsearch runs, not within a single instance of hmmsearch). But since I can't fathom what threading model could possibly lead to this behavior (a signed char for the thread count? or a uint8_t that doesn't account for the I/O thread?), and because I haven't seen other reports of this with the standard conda build, I am hesitant to report the behavior without clear confirmation that it is indeed an issue with hmmer and not some other part of the pipeline/stack.
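Purely to illustrate why 128 is a suspicious boundary for that speculation (this is a guess, not confirmed HMMER behavior): a signed 8-bit counter tops out at 127.

```c
/* Hypothetical illustration of the guess above -- not confirmed HMMER
 * behavior.  A signed 8-bit counter holds at most 127, so a worker count
 * stored that way would misbehave right at 128. */
#include <stdio.h>

int main(void)
{
    signed char nworkers = 127;
    printf("%d\n", nworkers);   /* 127: still representable                 */
    nworkers++;                 /* 128 doesn't fit; typically wraps to -128  */
    printf("%d\n", nworkers);   /* -128 on common implementations           */
    return 0;
}
```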

npcarter commented 3 years ago

One possibility for why you’re seeing unpredictable segfaults when running many threads is that the amount of memory used in HMMER’s forward and backward stages is highly variable, growing as the product of sequence length and HMM length. This can cause a machine to run out of RAM on some runs but not others, depending on whether a number of high-RAM computations happen at the same time or not.
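As a rough back-of-the-envelope sketch (the sequence length, model length, and per-cell factor below are assumptions for illustration, not HMMER's actual matrix layout), the product of sequence length and model length adds up fast:

```c
/* Back-of-the-envelope sketch; sequence length, model length, and the
 * per-cell factor are illustrative assumptions, not HMMER's real layout. */
#include <stdio.h>

int main(void)
{
    long   seq_len = 40000;  /* a long target sequence            */
    long   hmm_len = 2000;   /* a large profile HMM               */
    long   cells   = 3;      /* e.g. M/I/D values per model node  */
    double gb = (double)seq_len * hmm_len * cells * sizeof(float) / 1e9;

    printf("~%.1f GB for one full DP matrix\n", gb);  /* ~1 GB here */
    return 0;
}
```

Whether that matters in practice depends on how many of the larger searches happen to land at the same time.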


GabeAl commented 3 years ago

Interesting. What kind of RAM use are we talking about, ballpark? My system has 2 TB of RAM and is memory-defragmented before any HPC run (i.e. all caches are dropped and no other processes/ramdisks are loaded beyond the basic Fedora OS). Working disk space is also large, at ~90 TB free (in case of large temporary files). My runs on laptops with 4 GB RAM and 8 instances (4C/8T CPU) of the Prokka and CheckM pipelines I'm using this for haven't hit this issue. This new rig has 500x the memory for 32x the number of processes. I should try with 129 vs. 128 processes and see if there is a definite break there.

npcarter commented 3 years ago

Yeah, with that much RAM I doubt you're having that sort of out-of-RAM error. On the searches I run, I see some that need more than 16GB single-threaded, but I doubt you'd see enough of those searches happening at the same time to overwhelm a 2TB machine.

What sorts of core counts per run are you using? HMMER 3 doesn't get much performance benefit from more than 2 cores per search due to file parsing limitations. We're working on that for HMMER 4, but using more than 2 cores/search on HMMER 3 is generally a waste.

-Nick


GabeAl commented 3 years ago

Thanks, this is great context. Yes, I had been running up to 256 separate instances of hmmsearch --cpu 1. (This way it multi-threads quite well indeed!)

But since there is another hidden I/O thread running anyway, and I can stagger other processes to run concurrently, I now limit runs to 128 instances of hmmsearch --cpu 1. Because everything runs asynchronously and is aggregated in the background, there is minimal (or even negative!) performance loss from giving up the hyperthreads.

GabeAl commented 3 years ago

> there is minimal (or even negative!) performance loss from giving up the hyperthreads

Actually, this led to some head-scratching, so I decided to investigate what was going on with my hmmscan threading, where running fewer instances yielded higher performance... then I spotted it. Opened a new discussion: #240