Closed arlith closed 8 years ago
I want to be more explicit on the drawHex NPE.
What is happening here is that TilesetManager.baseFor
or supersFor
is returning null. This occurs if HexTileset.getBase
or getSupers
returns a null image or a collect of images with a null entry. This comes down to HexTileset.assignMatch
, which in turns relies upon HexTileset.baseFor
or supersFor
. Entries in the hex tileset are stored in memory uses HexTileset.HexEntry
, which provides HexEntry.getImage(2)
, which relies upon HexEntry.loadImage(1)
. These last two are ultimately where the issue lies.
For getImage
, there's a vector<Image>
state variable: if this member is empty or null, then the images are loaded. After that, if the vector is still empty, null is returned (which is where I think the problem lies). Otherwise, it returns the first element if there's only one, or a random element.
HexEntry.loadImage
then tries to load each image file using ImageUtil.loadImageFromFile
, and if the returned value is non-null, then the Image
is added to the vector of images.
What I assume is happening is that ImageUtil.loadImageFromFile
is sometimes returning null, which means that the images
vector can possibly be empty, which causes getImage
to return null.
To further expand, ImageUtil.loadImageFromFile(2)
is designed to return null in some cases, namely if the passed String filename is null or if none of the image loaders can load the filename.
There are two image loaders right now: TileMapImageLoader
and AWTImageLoader
. I expect that the tile map loader is always returning null, so I guess that AWTImageLoader
is sometimes returning null for some entries, but not always. It's not clear to me in what case it would sporadically return null.
Issue #272 have 2 ways to reproduce the drawhex error.
On Tuesday, July 26, 2016, Nicholas Walczak notifications@github.com wrote:
To further expand, ImageUtil.loadImageFromFile(2) is designed to return null in some cases, namely if the passed String filename is null or if none of the image loaders can load the filename.
There are two image loaders right now: TileMapImageLoader and AWTImageLoader. I expect that the tile map loader is always returning null, so I guess that AWTImageLoader is sometimes returning null for some entries, but not always. It's not clear to me in what case it would sporadically return null.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/MegaMek/megamek/issues/284#issuecomment-235003622, or mute the thread https://github.com/notifications/unsubscribe-auth/AQRi2cJ9oovGhXkV1lk4BhMKB4Y4EvYmks5qZOI-gaJpZM4JURti .
The problem is that the issue isn't reliably reproduceable: I have been unable to reproduce the issue using those steps, although I have seen the issue myself at least once. I believe it has to do with how the images are getting loaded and depending different aspects of the machine (such as SSD vs HDD) might determine how often you see the error.
Received null image from ImageUtil.loadImageFromFile! File: data\images\hexes\lunar\lunar_plains_0.gif That image does exist and is named correctly and is a .gif
Trying to load image for a non-existant file! Path: data\images\hexes\boring\rough_0.gif Received null image from ImageUtil.loadImageFromFile! File: data\images\hexes\boring\rough_0.gif Trying to load image for a non-existant file! Path: data\images\hexes\boring\rough_1.gif Received null image from ImageUtil.loadImageFromFile! File: data\images\hexes\boring\rough_1.gif Trying to load image for a non-existant file! Path: data\images\hexes\boring\rough_2.gif Received null image from ImageUtil.loadImageFromFile! File: data\images\hexes\boring\rough_2.gif
No where in the atmospheric or any of the other tilesets do they ever call for rough images from \boring\rough_0:2 The boring folder only contains beige_rough_0:5 so, I dont understand why MegaMek is even looking for these or what is telling it to look for em?
Case sensitive crap, missing images, lismoke labeled as a .png instead of a .gif Remember to get rid of the defaulthexset.tileset (I have it on mine 0.3.30 HQ bundle but, Arlith does not have the file) which contains the lines for /boring/rough_0:2 which do not exist. Or can just delete those lines out of it but, did not include it in the tileset i modified becuase IMO we dont need that defaulthexset and it's redundant.
Alright, I've done a number of things to address this.
First, HexSetTest was added, which goes through all of the tilesets (including the default one) and checks to see if the file names are valid. This revealed a few problems (especially in the default tileset), and those are fixed.
There's also better logging of when an image cannot be loaded.
Finally, I made it so that the HexTileset won't return null images. Instead, if it would have a null image, a blank hex-sized image is returned instead. This will prevent crashing, although the user will likely not see what they expect, although now the problem should be logged in the logfile.
This should prevent the drawHex NPE, however it's not clear how this would address the camo problems. I did add some code to help ensure that some icon will be returned for units/wrecks. In the past, if the image assigned for a unit failed to load, it would return null. Now, it defaults to the generic icon for that unit, which should help alleviate problems with no icons being drawn. I still need to look into the camo issue though I think.
Alright, I "encountered" this bug again, which shows me that it's not completely licked, however I believe I've pinpointed the problem. So, as I mentioned previously, the NPE is gone, because I added code to avert it (really, this is necessary so that we fail more gracefully). So, when I encountered the bug this time, instead what I saw were two blank hexes. When I looked in the logging code, I saw this:
Round 0 memory usage: 782,539 kB
boardview1: loading images for board
Received null image from ImageUtil.loadImageFromFile! File: data/images/hexes/boring/beige_plains_1.gif
Received null image from ImageUtil.loadImageFromFile! File: data/images/hexes/boring/beige_plains_2.gif
Time to prepare the shadow map: 87.095894 ms
What's important to note here is that the HexTileset
line 589 message was triggered, but ImageUtil
line 118 was not. What this means is that, the image exists on disk, but still the result of loading it was null. I verified that the two files do exist.
In addition to this logging code, I also had some breakpoints set in the code, on ImageUtil
lines 118 and 123. What this tells me is that the AWTImageLoader
didn't fail to load the file. Basically, the only seeming way for AWTImageLoader.loadImage
to return null is if toolkit.getImage(fileName)
returns null. However, I know that this didn't happen because of my breakpoint.
For the HexTileset
log line to get triggered, I know that ImageUtil.loadImageFromFile
must have returned null, which can only happen if the input filename is null (know that isn't true, because the filename is in the log), or if all of the loaders returned a null image. There's only two loaders and I expect the first one, TileMapImageLoader
, to return null.
This means that the only other place that could be returning null in AWTImageLoader.loadImage
method is this line:
return observer.isAnimated() ? result : ImageUtil.createAcceleratedImage(result.getBufferedImage());
I initially had overlooked this line because it's guaranteed that result
is non-null because there's a previous check for it. However, what I did not check is if ToolkitImage.getBufferedImage()
can return null, because it's an undocumented sun.awt
class. Googling on the internet, it turns out that this method does return null, and I'm almost certain that is what's happening in this case.
edit: Additionally, ImageUtil.createAcceleratedImage
returns null if it's passed a null value, perhaps a case where cautious null-guarding actually hid another bug, hehe
Looking at the code involved, it appears that FinishedLoadingObserver
isn't performing how we'd expect it to. Here's the relevant code for ToolkitImage.preload
:
public void preload(ImageObserver iw) {
if (src != null) {
src.checkSecurity(null, false);
}
if ((availinfo & ImageObserver.ALLBITS) == 0) {
addWatcher(iw, true);
}
}
private synchronized void addWatcher(ImageObserver iw, boolean load) {
if ((availinfo & ImageObserver.ERROR) != 0) {
if (iw != null) {
iw.imageUpdate(this, ImageObserver.ERROR|ImageObserver.ABORT,
-1, -1, -1, -1);
}
return;
}
ImageRepresentation ir = getImageRep();
ir.addWatcher(iw);
if (load) {
ir.startProduction();
}
}
Essentially, by calling ToolkitImage.preload
, we ensure that the FinishedLoadingObserver
will always be considered to be loaded, since it calls imageUpdate
on the passed ImageObserver
, which will cause FinishedLoadingObserver
to set its load state to true.
Eh, looks like I was wrong, the ImageObserver.ERROR
flag isn't set, so the imageUpdate
call isn't made. I set the breakpoint in the wrong place; the thread does wait for the observer to finish loading.
I helping :D
So I added this code to ImageUtil.AWTImageLoader.loadImage
:
BufferedImage bImg = result.getBufferedImage();
if (bImg == null) {
System.out.println("ERROR: AWTImageLoader.loadImage got null BufferedImage!");
}
return observer.isAnimated() ? result : ImageUtil.createAcceleratedImage(bImg);
Looking in @BLOODWOLF333's logfile, I can see that that line is triggered, so that shows definitively that the issue is with the ToolkitImage.getBufferedImage
returning null. I didn't have any luck hunting down this problem looking in the OpenJDK sourcecode. Online I've seen people suggest that calling ToolkitImage.getWidth()
(or getHeight()) before accessing the buffered image will ensure that it's non-null. When I looked at the source code, it wasn't clear to me why this would work, but I could throw that work around in.
Using latest .jar
Bad compile with the last I guess?
Alright, after too long, I have figured this out. I found that @BLOODWOLF333 was able to reproduce the issue fairly reliably, so I've been feeding him jar files. What I did was add logging code that checks to see if the ToolkitImage.getBufferedImage()
result was null, and I found that when it was null, the FinishedLoadingObserver.isLoaded()
state was false. Apparently, the ImageFetcher
thread can interrupt the thread that calls Toolkit.getImage()
. So, what was happening was that the busy-wait loop got interrupted by another thread, and it exited prematurely, before the image was done loading. I didn't bother to dig into the sun.awt code to figure out where this happens.
To fix this, I removed the break;
line when the InterruptException
is called. In order to prevent infinite loops, I also added a time-out of 10s.
Additionally, when looking at the code, I noticed that there was a potential for an infinite loop. If the image is already fully loaded when ToolkitImage.preload
is called, it never adds the passed ImageObserver.
The way the busy-wait loop was written, this would lead to an infinite loop. I added some code that checks the state of the ToolkitImage
to see if it's loaded, and only waits if it isn't loaded yet.
I had @BLOODWOLF333 test the changes, and they've solved the drawHex issues for him.
Mind if I have a copy to test too?
On Sunday, July 31, 2016, Nicholas Walczak notificsations@github.com wrote:
Alright, after too long, I have figured this out. I found that @BLOODWOLF333 https://github.com/BLOODWOLF333 was able to reproduce the issue fairly reliably, so I've been feeding him jar files. What I did was add logging code that checks to see if the ToolkitImage.getBufferedImage() result was null, and I found that when it was null, the FinishedLoadingObserver.isLoaded() state was false. Apparently, the ImageFetcher thread can interrupt the thread that calls Toolkit.getImage(). So, what was happening was that the busy-wait loop got interrupted by another thread, and it exited prematurely, before the image was done loading. I didn't bother to dig into the sun.awt code to figure out where this happens.
To fix this, I removed the break; line when the InterruptException is called. In order to prevent infinite loops, I also added a time-out of 10s.
Additionally, when looking at the code, I noticed that there was a potential for an infinite loop. If the image is already fully loaded when ToolkitImage.preload is called, it never adds the passed ImageObserver. The way the busy-wait loop was written, this would lead to an infinite loop. I added some code that checks the state of the ToolkitImage to see if it's loaded, and only waits if it isn't loaded yet.
I had @BLOODWOLF333 https://github.com/BLOODWOLF333 test the changes, and they've solved the drawHex issues for him.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MegaMek/megamek/issues/284#issuecomment-236409028, or mute the thread https://github.com/notifications/unsubscribe-auth/AQRi2Ukhmie_4P5csZUCgMiig-JcFoQvks5qbCMZgaJpZM4JURti .
We are planning a release with this fix incorporated later today (Eastern Standard Time)
I'll email it to you Mega (maybe he can break it)
correct hq_fluff/rail_48.gif
I think this comes from the addition of PR #155, but there have been several disparate issues reported since v0.41.17 all related to displaying images.
The most common issue is an NPE in BoardView1.drawHex(3). Commonly this is at line 2382, but I've also seen it at line 2274. Ultimately, this issue is caused by HexTileset.HexEntry.getImage(2) returning null, which can happen if the HexEntry.images vector is empty.
BoardView1.drawHex(3) NPEs: #251, #272, #283, some mention in #281
Additionally, there have been issues with mech icons not being displayed, or camo not being displayed. I think this issues also lie in the loaded images returning null values for some reason.
Camo/facing issues: #268, #273
Unfortunately, these issues aren't reliably reproduceable, which has made the core problem difficult to hunt down. All of the reports have occurred in versions after v0.41.17, and PR #155 was merged in right after the release of that version. Additionally, it modifies how images are loaded, so I'm pretty certain it's the root of the issue. It's not 100% clear to me why it might return a null sometimes and not others, but I plan to add some logging code that might shed some light on this issue.
I wanted to try to collect all of the information on this issue into one thread, as currently it's all spread around.