dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
291 stars 136 forks source link

httpd: stacktrace from webadmin's RrdPoolInfoAgent on startup #3488

Closed paulmillar closed 2 years ago

paulmillar commented 7 years ago

See https://ci.dcache.org/view/dCache%203.2/job/Deploy%203.2/29/

06 Sep 2017 11:19:20 (httpd) [] Uncaught exception in thread Thread-24
java.lang.ExceptionInInitializerError: null
        at org.rrd4j.graph.RrdGraphDef.<init>(RrdGraphDef.java:74) ~[rrd4j-2.2.1.jar:na]
        at org.dcache.webadmin.model.dataaccess.util.rrd4j.RrdPoolInfoAgent.getGraphDef(RrdPoolInfoAgent.java:267) ~[na:na]
        at org.dcache.webadmin.model.dataaccess.util.rrd4j.RrdPoolInfoAgent.createPlot(RrdPoolInfoAgent.java:204) ~[na:na]
        at org.dcache.webadmin.model.dataaccess.util.rrd4j.RrdPoolInfoAgent.run(RrdPoolInfoAgent.java:176) ~[na:na]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
Caused by: java.lang.RuntimeException: java.io.IOException: Problem reading font data.
        at org.rrd4j.graph.RrdGraphConstants$FontConstructor.getFont(RrdGraphConstants.java:274) ~[rrd4j-2.2.1.jar:na]
        at org.rrd4j.graph.RrdGraphConstants.<clinit>(RrdGraphConstants.java:291) ~[rrd4j-2.2.1.jar:na]
        ... 5 common frames omitted
Caused by: java.io.IOException: Problem reading font data.
        at java.awt.Font.createFont0(Font.java:1000) ~[na:1.8.0_141]
        at java.awt.Font.createFont(Font.java:877) ~[na:1.8.0_141]
        at org.rrd4j.graph.RrdGraphConstants$FontConstructor.getFont(RrdGraphConstants.java:270) ~[rrd4j-2.2.1.jar:na]
        ... 6 common frames omitted
alrossi commented 7 years ago

If this is from a recent update of the JVM, there is probably an incompatibility with the awt library. Or it could be native ... what system is this running on?

We may need to update the rrd4j library most recent on GitHub looks like 3.1; ours is 2.2.1.

Hopefully this will be going away soon, but webadmin is still with us for a while, I know :-)

paulmillar commented 7 years ago
[root@release-3-2 ~]# java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
[root@release-3-2 ~]# 
paulmillar commented 7 years ago

Our test machine for dCache v3.1 has the same java version:

[root@release-3-1 ~]# java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
[root@release-3-1 ~]# 

but doesn't have these stack-traces

alrossi commented 7 years ago

That's why I thought it might be a native library issue. Are they two different OS versions or architectures?

I very much doubt any change between 3.1 and 3.2 is responsible for this. I have not touched that code in webadmin in ages.

alrossi commented 7 years ago

Unless the rrd4j version in the 3.1 pom is different ... let me check.

alrossi commented 7 years ago

Nope. same

paulmillar commented 7 years ago

The problem seems to come from trying to initialise the font data. There are two fonts embedded in the jar file:

[root@release-3-1 ~]# unzip -v /usr/share/dcache/classes/rrd4j-2.2.1.jar |grep ttf
  301280  Defl:N   181317  40% 05-14-2015 20:41 456415ad  DejaVuSansMono-Bold.ttf
  298836  Defl:N   176430  41% 05-14-2015 20:41 c8547ecd  DejaVuSansMono.ttf
[root@release-3-1 ~]# 

The rrd4j class RrdGraphDef implements RrdGraphConstants, which contains static initialiser that is triggering the problem:

    static final Font DEFAULT_SMALL_FONT = FontConstructor.getFont(Font.PLAIN, 10);

This is the first use of FontConstructor.getFont, which basically uses Java's AWT to "load" the font. Part of this process makes a copy of the font in /tmp with names like +~JF<number>.tmp (e.g., +~JF1695553106093651651.tmp).

On release-3-2, these temporary files exist, have the same size as DejaVuSansMono.ttf, and file identifies the content as TTF data. This suggesting the problem isn't with accessing the font data from within the jar, but rather subsequent reading it. The dcache user can read the file's contents without any problem (as tested with dd), so it must be the parser is somehow behaving differently on this machine that on release-3-1 (and sisyphus).

Unfortunately, AWT swallows whatever exception triggers the problem and provides only the stock phrase "Problem reading font data.", so there's no clear indication what caused this problem.

Although AWT font support (FontManager, specifically) is platform specific, unless sun.font.fontmanager property is specified, the system autodetects the platform as being wither Windows, Mac or X11 (sun.awt.X11FontManager).

paulmillar commented 7 years ago

It is possible to enable debug information on AWK font support by enabling a system property (-Dsun.java2d.debugfonts=debug).

On release-3-1, this yields no additional logging, but on release-3-2, this yields

07 Sep 2017 10:59:02 (httpd) [] Failed to get info from libfontconfig
07 Sep 2017 10:59:02 (httpd) [] Failed to get info from libfontconfig
paulmillar commented 7 years ago

I've installed the font (DejaVu sans) and fontconfig in the OS:

yum install fontconfig dejavu-sans-mono-fonts

This seems to have "fixed" the problem: after installing these two packages and restarting dCache, the exception has gone away.

To me, this looks like an rrd4j bug and/or JDK bug. It seems that rrd4j believed that embedding the TrueType font in the jar file would alleviate dependencies on the underlying OS. This turns out not to be true for Linux and this version of OpenJDK.

alrossi commented 7 years ago

Paul, thanks so much for investigating this issue. I have been rather intent on trying to get a full resolution of the DESY cloud resilience issues that I just neglected this, thinking I could turn to it afterward.

I'm glad the installation of the fonts fixed the problem. But if we want (or you want me :-) ) to file a bug report with rrd4j, we need to give them specific information about the differences in OS/architecture that triggers the apparent parser failure. How precisely do the two nodes differ in this respect?

paulmillar commented 7 years ago

I think the underlying difference is that, previously our release-testing machines included a rather heavy list of dependencies. With release-3-2, the machine was created using more of DESY infrastructure and from a "lighter" initial set of RPMs.

In particular, earlier build machines had a full Gnome desktop (which included fontconfig and a good set of default fonts), whereas release-3-2 does not include any of this.

I think the next step would be to see if anyone has reported this problem to rrd4j before -- perhaps a newer version already has this fixed. If not, then we should open a bug report.

alrossi commented 2 years ago

Web admin no longer supported.