NOAA-PMEL / Ferret

The Ferret program from NOAA/PMEL
https://ferret.pmel.noaa.gov/Ferret/
The Unlicense

frame/file triggers "ERROR Ferret crash; signal = 11" #1974

Open liverwust opened 4 years ago

liverwust commented 4 years ago

There are many Ferret users at NOAA's Geophysical Fluid Dynamics Laboratory (GFDL). Recently, one of them began seeing the following error:

$ module load ferret/7.02
$ ferret
     *** NOTE: Unable to create journal file ferret.jnl
     NOAA/PMEL TMAP
     FERRET v7.02 (beta/optim)
     Linux 2.6.32-642.6.1.el6.x86_64 64-bit - 10/25/16
     22-Apr-20 20:01    

yes? use /path/to/file1.nc
yes? use /path/to/file2.nc
yes? plot/thick=3 pslann[d=1,l=1:110,x=-110,y=40],pslann[d=2,l=1:110,x=-110,y=40]*.01
yes? LABEL/NOUSER .1 6.3, -1, 0, 0.14, "Title Of Plot"
yes? frame/file="output.gif"
**ERROR Ferret crash; signal = 11

This error occurs across numerous versions of the ferret environment module installed at GFDL; version 7.02 just happens to be the "default" version.

Attaching the gdb debugger to the ferret process produces this backtrace following the crash:

(gdb) bt
#0  0x0000000000759543 in Get_XColors ()
#1  0x0000000000759a8f in Window_Dump ()
#2  0x000000000075903a in put_frame_ ()
#3  0x0000000000662f46 in save_frame_ ()
#4  0x00000000005a2fe8 in xeq_frame_ ()
#5  0x000000000046ea93 in ferret_dispatch_ ()
#6  0x0000000000644e46 in ferret_dispatch_c ()
#7  0x000000000046d636 in command_line_run ()
#8  0x000000000046e10b in main ()
(gdb)

I am one of GFDL's Linux administrators, and I am filing this ticket on behalf of a user in our Linux workstation environment. So far, I have been unable to reproduce this problem on my own, despite my best efforts to control environment variables (PATH, LD_LIBRARY_PATH, various FER_xx vars), loaded modules, permissions, and so on. The fact that the crash occurs in a function called Get_XColors makes me think that this is something I will need to fix on our side, but I don't know where to look next.

Can you help me to understand what Ferret might be trying to do? Is there any additional testing or information which I can provide?

karlmsmith commented 4 years ago

In the opening remarks, ferret reports

*** NOTE: Unable to create journal file ferret.jnl

which tells me you do not have write permission in the current directory. Since you are attempting to save the GIF file to the current directory, I would suspect this is the cause. I am not sure why it is showing up in this manner if this is the case.
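A quick way to test this hypothesis is to check whether the working directory is writable before calling frame/file. The sketch below is a minimal shell check; `check_writable` is a hypothetical helper name, not a Ferret or system command.

```shell
# check_writable is a hypothetical helper, not part of Ferret.
# It reports whether the given directory (default: current directory)
# is writable by the current user.
check_writable() {
    dir="${1:-.}"
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}
```

Running `check_writable "$PWD"` in the directory where Ferret was started should print `NOT writable` if the journal-file note has the cause suggested above.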

AndrewWittenberg commented 4 years ago

I just tried this at GFDL, and I also cannot reproduce the error when I first do a module purge and then module load ferret/7.02. Must be something in the user's environment, perhaps pointing to incompatible graphics libraries (qt, cairo)? I think I've heard of problems like this before; I can't recall if it was the LD_LIBRARY_PATH, or the versions of the netcdf/hdf libraries loaded by FRE, or what. Did you have them do a printenv and then diff that with yours?

Louis: if it helps, you can check out how I set up my (working) GFDL environment in:

/home/atw/.atw_environment
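The printenv comparison suggested above could be sketched as follows, assuming each user has first saved their environment with `printenv > some_file`. The helper name `env_diff` and the file names are hypothetical. Sorting first keeps the diff independent of the order in which variables happen to be listed.

```shell
# env_diff is a hypothetical helper: compare two saved `printenv` dumps.
# Lines prefixed "<" appear only in the first file, ">" only in the second.
env_diff() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    diff "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}

# Example (file names are placeholders):
#   printenv > user.env      # run by the affected user
#   printenv > admin.env     # run by someone whose Ferret works
#   env_diff user.env admin.env
```

Values that span multiple lines will not line up cleanly in this comparison, so differences in variables such as LD_LIBRARY_PATH or the FER_ variables are the main thing to look for.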

liverwust commented 4 years ago

@karlmsmith and @AndrewWittenberg, thank you for your prompt responses. I apologize for my own delayed response.

As luck would have it, we were investigating an unrelated issue and discovered that the TigerVNC server's color depth setting (-depth in the manpage) was to blame. In that case, the application in question failed to launch properly when using 16-bit color depth, and worked fine with 24-bit (native) color depth. Sure enough, re-running the aforementioned steps (with an adjustment for the read-only /archive FS, thanks @karlmsmith) crashes Ferret with 16-bit color and works fine with 24-bit color.

I will see if TigerVNC lists this as a known behavior with certain widget toolkits and report back my findings.
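For anyone hitting the same crash, the color depth of the current X session can be checked before running frame/file. The sketch below assumes the standard `xdpyinfo` utility is installed and that its output contains a "depth of root window" line; `root_depth` is a hypothetical helper name.

```shell
# root_depth is a hypothetical helper: read `xdpyinfo` output on stdin
# and print the root window depth in bits (e.g. 16 or 24).
root_depth() {
    awk '/depth of root window/ { print $5 }'
}

# Example, run inside the VNC session:
#   xdpyinfo | root_depth
# A result of 16 matches the crashing configuration described above;
# restarting the VNC server with -depth 24 is the setting that worked.
```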