Closed paulmillar closed 2 years ago
If this is from a recent update of the JVM, there is probably an incompatibility with the awt library. Or it could be native ... what system is this running on?
We may need to update the rrd4j library most recent on GitHub looks like 3.1; ours is 2.2.1.
Hopefully this will be going away soon, but webadmin is still with us for a while, I know :-)
[root@release-3-2 ~]# java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
[root@release-3-2 ~]#
Our test machine for dCache v3.1 has the same java version:
[root@release-3-1 ~]# java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
[root@release-3-1 ~]#
but doesn't have these stack-traces
That's why I thought it might be a native library issue. Are they two different OS versions or architectures?
I very much doubt any change between 3.1 and 3.2 is responsible for this. I have not touched that code in webadmin in ages.
Unless the rrd4j version in the 3.1 pom is different ... let me check.
Nope. same
The problem seems to come from trying to initialise the font data. There are two fonts embedded in the jar file:
[root@release-3-1 ~]# unzip -v /usr/share/dcache/classes/rrd4j-2.2.1.jar |grep ttf
301280 Defl:N 181317 40% 05-14-2015 20:41 456415ad DejaVuSansMono-Bold.ttf
298836 Defl:N 176430 41% 05-14-2015 20:41 c8547ecd DejaVuSansMono.ttf
[root@release-3-1 ~]#
The rrd4j class RrdGraphDef
implements RrdGraphConstants
, which contains static initialiser that is triggering the problem:
static final Font DEFAULT_SMALL_FONT = FontConstructor.getFont(Font.PLAIN, 10);
This is the first use of FontConstructor.getFont
, which basically uses Java's AWT to "load" the font. Part of this process makes a copy of the font in /tmp
with names like +~JF<number>.tmp
(e.g., +~JF1695553106093651651.tmp
).
On release-3-2, these temporary files exist, have the same size as DejaVuSansMono.ttf, and file
identifies the content as TTF data. This suggesting the problem isn't with accessing the font data from within the jar, but rather subsequent reading it. The dcache
user can read the file's contents without any problem (as tested with dd
), so it must be the parser is somehow behaving differently on this machine that on release-3-1 (and sisyphus).
Unfortunately, AWT swallows whatever exception triggers the problem and provides only the stock phrase "Problem reading font data.", so there's no clear indication what caused this problem.
Although AWT font support (FontManager
, specifically) is platform specific, unless sun.font.fontmanager
property is specified, the system autodetects the platform as being wither Windows, Mac or X11 (sun.awt.X11FontManager
).
It is possible to enable debug information on AWK font support by enabling a system property (-Dsun.java2d.debugfonts=debug
).
On release-3-1, this yields no additional logging, but on release-3-2, this yields
07 Sep 2017 10:59:02 (httpd) [] Failed to get info from libfontconfig
07 Sep 2017 10:59:02 (httpd) [] Failed to get info from libfontconfig
I've installed the font (DejaVu sans) and fontconfig in the OS:
yum install fontconfig dejavu-sans-mono-fonts
This seems to have "fixed" the problem: after installing these two packages and restarting dCache, the exception has gone away.
To me, this looks like an rrd4j bug and/or JDK bug. It seems that rrd4j believed that embedding the TrueType font in the jar file would alleviate dependencies on the underlying OS. This turns out not to be true for Linux and this version of OpenJDK.
Paul, thanks so much for investigating this issue. I have been rather intent on trying to get a full resolution of the DESY cloud resilience issues that I just neglected this, thinking I could turn to it afterward.
I'm glad the installation of the fonts fixed the problem. But if we want (or you want me :-) ) to file a bug report with rrd4j, we need to give them specific information about the differences in OS/architecture that triggers the apparent parser failure. How precisely do the two nodes differ in this respect?
I think the underlying difference is that, previously our release-testing machines included a rather heavy list of dependencies. With release-3-2, the machine was created using more of DESY infrastructure and from a "lighter" initial set of RPMs.
In particular, earlier build machines had a full Gnome desktop (which included fontconfig and a good set of default fonts), whereas release-3-2 does not include any of this.
I think the next step would be to see if anyone has reported this problem to rrd4j before -- perhaps a newer version already has this fixed. If not, then we should open a bug report.
Web admin no longer supported.
See https://ci.dcache.org/view/dCache%203.2/job/Deploy%203.2/29/