PersonTheCat / OreStoneVariants

A powerful utility for generating new blocks when given a foreground and background.
GNU General Public License v3.0
6 stars 8 forks source link

JVM crashes with EXCEPTION_ACCESS_VIOLATION During world gen. #147

Closed HarroldSaxon closed 2 years ago

HarroldSaxon commented 2 years ago

I'm getting this crash while generating chunks in a server and it seems to be OSV doing it. I've tried different versions of forge, java, different pc, still get the same crash while generating new chunks at random. I only get it with several other mods installed, but can't find a specific incompatible mod. Any help with this issue would be greatly appreciated. hs_err_pid8176.log debug-1.log 2022-01-12-1.log

PersonTheCat commented 2 years ago

Forge and OSV (1.16) are not compatible with Java 11, only Java 8. This will change again in 1.18, where both mods will only be compatible with Java 18+. This is a limitation of the mod and not necessarily a bug. As a result, I will not fix it. For a workaround, you'll have to downgrade to Java 8.

For reference, this is because OSV's ore blocks dynamically generate their behavior from other blocks. This is accomplished by placing other blocks into a fake world and transforming their outputs into the normal world.

In the future, I will consider providing a compatibility mode, but that is very low priority for the time being. Leaving this open for visibility.

HarroldSaxon commented 2 years ago

I tried java 8 first. as I previously stated, I did try multiple configurations over the last few days.

PersonTheCat commented 2 years ago

OSV uses a library that was renamed in Java 9. Give it another try with Java 8 and share the crash report when using that version.

HarroldSaxon commented 2 years ago

here's the logs from java 8 hs_err_pid1724.log

debug-2.log 2022-01-11-2.log

PersonTheCat commented 2 years ago

Interesting. I've not hit any issues using AdoptOpenJDK, but I would hope it works with Oracle / Sun official, as well. Unfortunately, I don't see any OSV code at all in these reports. I also don't see enough of my world gen-related log statements at the end of the log to suggest that OSV is doing anything here. Without any more clues, this may take quite some time to diagnose. I won't be able to test with more than a few other mods due to the workload required, but l will update here in the next few days with any details I discover.

HarroldSaxon commented 2 years ago

do you use openjdk8-hotspot, or openjdk8-openj9? I'll try running on the same jdk and see if it works. the error is thrown by the garbage collection thread shortly after creating a region interceptor in every case, with higher odds of a crash the more world gen heavy mods I add in. with my current pack, it's about 1 crash in every 10~15 region interceptors created. If you have a more in depth debug output option, i can try to get more info too.

PersonTheCat commented 2 years ago

There should only be one interceptor for each of three types: ClientLevel, ServerLevel, and WorldGenRegion at any given time. Additional types do get tracked, but that's supposed to be a failsafe to provide details about unsupported types. I'll setup a build with some extra debug statements and share that here in a few hours, if needed.

PersonTheCat commented 2 years ago

Thanks for the clarification and additional details.

PersonTheCat commented 2 years ago

I did some thinking about this and realized that a maximum of one interceptor per type is not guaranteed. I see this as potentially a massive issue, which I will want to fix right away. Thanks for bringing that to my attention.

Let's analyze the architecture of OSV's InterceptorDispatcher. OSV maintains a separate pool of interceptors per-thread. This is so that multiple blocks can be processed between threads and each interceptor will be responsible for its own transformations to some independent Level.

Ordinarily, this design has a relatively small memory footprint. This is because, in vanilla (or with no other mods), only the render thread and a single worker thread are ever responsible for block processing. In my testing, this means a total of 5 interceptors should exist under normal circumstances (or just 2 if the world has already generated).

However, when other mods are introduced, this environment changes. Forge spawns a massive number of threads (imo too many) and it makes sense that with some mods being installed, more and more of those threads might trigger a new interceptor to spawn. I would like to see this number never exceed 10 interceptors in total.

My plan to achieve this is to generate a single interceptor per type (across threads), with multiple handles (containing block references, etc) per thread. To be extra safe, these handles should probably be evicted immediately once they've been used. But that's a really significant change and I'll need a couple days to figure it out.

In the meantime, I can really use your help taking some diagnostics, since it's very difficult for me to test in heavily-modded environments.

TL;DR: Please try this build out and share any info at all. It probably won't fix anything, but it will throw some useful log spam in there. Thanks in advance for your help.

OSV-Forge-7.4.3-remapped.zip

Edit: since we're here, I will introduce a compatibility mode with JRE 9+ in the next version, but it will be off by default. Essentially, the problem with adding compatibility for this is that it completely rips out all of the dynamic block behaviors provided by the mod. This would essentially defeat the purpose of using the mod in the first place. In any case, since we're already here, now's a great time.

HarroldSaxon commented 2 years ago

I've been trying to reproduce the crash for the last hour with no luck so far. it was crashing every 20 min or so before, but I have had it go a few hours before crashing from time to time when I was trying to solve this issue myself. I'll set a chunk pregenerator running overnight and get back to you with more information tomorrow.

PersonTheCat commented 2 years ago

Hey, can you share the log when you do? I set it up to keep a tally of how many interceptors there are. How I fix the problem will depend on that number.

HarroldSaxon commented 2 years ago

So, it's been a long day, and i made a foolish mistake. I dropped the .zip file into my mods instead of the .jar. Now with the mod actually loading, i am getting crashes. sorry for the mix up. 2022-01-13-1.log debug-1.log hs_err_pid2800.log this crash was during spawn area generation, but I'll set it to loop for a while like I planned to and see if I can get some better logs showing those multiple instances of interceptors for the overworld generator.

PersonTheCat commented 2 years ago

Looks like the most recent interceptor was alive for a good 15 seconds before the crash. It would have logged before it started to create a new one, so it does kinda sound like this might not necessarily be OSV. I took a look at TerraForged earlier and it looks like they're also wrapping the world gen regions.

No idea. I'll give this a better look in the morning when I'm not on mobile.

PersonTheCat commented 2 years ago

Hey, I really don't know what to do here since this log looks fine and everything is still working for me. The log you sent me indicated that no potentially unsafe code is failing, since the interceptor allocation is the only unsafe code that could ever cause a JVM crash, as far as I know (other than general memory issues).

I'm gonna attach 2 builds here. Would help a lot if you could try them both and share the logs.

OSV-Forge-7.4.4-copyFields.zip OSV-Forge-7.4.4-noCopyFields.zip

HarroldSaxon commented 2 years ago

Hi there. I will run those. I also have a more interesting log here from running overnight where it got up to a whopping 77 interceptors before crashing debug.log latest.log hs_err_pid6788.log g

PersonTheCat commented 2 years ago

Cool, thanks. This is great info. Sounds like one of these mods has changed the world generation code to generate not only in parallel, but in entirely new threads. OSV was not designed for this kind of environment and will need to be updated. I'll do that later today or tomorrow morning and update here when that's implemented. This is a pretty significant change, so I'll need some time and help testing that.

HarroldSaxon commented 2 years ago

I'm running a few more attempts to be sure, but it looks like noCopyFields has actually stopped the JVM from crashing! Now minecraft is crashing and generating crash reports as it should, though it does hang without exiting, so I can't automate relaunching this version. I'll give you the logs and crash reports shortly.

HarroldSaxon commented 2 years ago

copyField latest.log debug.log hs_err_pid8796.log s

HarroldSaxon commented 2 years ago

noCopyFields debug-2.log.gz debug.log latest.log 2022-01-14-1.log.gz 2022-01-14-2.log.gz debug-1.log.gz crash-2022-01-14_15.45.30-server.txt crash-2022-01-14_15.55.17-server.txt crash-2022-01-14_16.02.06-server.txt

PersonTheCat commented 2 years ago

Sounds like the crashes are from the missing data in the interceptor. So, we definitely need it. I bet this didn't fix the problem at all, but instead just introduced a new issue that occurs before the old one.

That's good information, though. I'll give this done careful analysis in a bit and start working on reducing the number of interceptors. I suspect the actual problem comes from basically the JVM running out of memory when doing manual allocations.

I'm just saying, though, it is a little frustrating. Ordinarily, this game would not continue continue spawning new threads for every new region like that. But anyway, I can accommodate and hopefully that will fix it.

HarroldSaxon commented 2 years ago

I wonder if it might be structure generation triggering this, there are a few mods that split off generating there own structures into a separate thread if I recall, and I believe terraforged does this as well.

PersonTheCat commented 2 years ago

Hmm, that's a good thought. It might be. I guess it depends on whether those structures would ever interact with the ore variants. Hard to say. In any case, super hard to say at this point. Hopefully cutting the number down will eliminate the problem.

HarroldSaxon commented 2 years ago

something interesting, I've managed to prevent the crash I was getting with noCopyFields by preventing block ticks. I normally do this for pregenerating worlds as it offers a large speed boost, but it also acts as a 'safe mode' for most ticking block/entity crashes. I'm attempting to reproduce the original JVM crash under these conditions. This setting did not prevent the JVM crash with OSV-Forge-7.4 So far, out of 18 instances running through spawn generation, there have been 0 crashes.

HarroldSaxon commented 2 years ago

So, I've been testing and thinking, and I believe I may have a work around if not a solution. not getting the fields causes redstone ore variants to misbehave, but I've found a few solutions for that issue that I can do myself, and after generating enough of the world I can simply swap back to a version that does copy the data and redstone ores work perfectly again. so, I think what you're doing is allowing different generators to make changes and fetch fields for each one? but what about copying these fields to the config just once, or if you want to keep this flexability, once for each dim? would this allow different generation rules in each dim while not requiring fetching fields for every generator thread? then regeneration of these fields could be set the same way as with resources so they aren't fetched again unless mods are changed? I've still yet to get a crash with these settings, so it is working. and I crash pretty quickly still when generating with any other version of osv. I don't know what these extra threads are doing to junk up the data you're copying, or if it's just the volume of data, perhaps even just coincidentally this change has averted the real cause, but this change has solved my issue.

Edit: as a side note, not copying fields has provided a dramatic performance increase, with average chunks per tick generated being doubled.

PersonTheCat commented 2 years ago

OSV does not intentionally copy fields more than once, ever. In fact, I carefully designed it to only copy fields when the world first loads because it's so expensive to create even one of them. This behavior is a side effect of other mods spawning additional threads to do their world gen work after the world has already been created, which I had not accounted for.

In a vacuum, there's nothing wrong with other mods upping the parallel threads to do world gen. The only problem with it is that it changes the game in critical and unpredictable ways. OSV is complex enough that sometimes I depend on that behavior.

Essentially, the dispatcher was setup to allow one set of interceptors per thread, but this was only designed to accommodate the render thread and main thread each doing their own, independent block processing.

Don't get me wrong, it's not ideal for other mods to continue spawning hundreds of new threads and never reuse them, but OSV should still be able to accommodate that.

Still no idea what the problem is. All I know is that there should never, ever be more than 5 of these things. So, I'll start by fixing that and we'll see where that gets us for now.

HarroldSaxon commented 2 years ago

Okay, sounds good, and thank you for communicating with me so much. I really appreciate it.

PersonTheCat commented 2 years ago

Hey, I got the new model all setup. This was quite the change. Everything is working here, but before I can release this, I have to test it on other platforms and with several mods. This will definitely take awhile.

Do you mind helping me test this build as well? I'm not really sure if it's gonna fix the issue or not. All I know is that it will reduce the number of interceptors like this:

Anything you can tell me about this build will be extremely helpful.

Thanks again for your time!

OSV-Forge-7.4.5-remapped.zip

HarroldSaxon commented 2 years ago

Of course, will do. I'll start on that now.

HarroldSaxon commented 2 years ago

There is still a chance of crashing during server startup, when the interceptors are first created. 2 intercepters are being created on startup now.

type: class com.terraforged.mod.chunk.fix.RegionFix in thread: Thread[Worker-Main-11,5,SERVER] and type: class net.minecraft.world.server.ServerWorld in thread: Thread[Server thread,5,SERVER] I've yet to see a crash once these are up, but would like to generate a good 1,000,000 chunks before calling it good. Unfortunately, although chunk generation starts off faster at about 0.7chunks a tick, it slows over time till generation stalls out completely at around 5000 chunks.

PersonTheCat commented 2 years ago

Don't count too much on performance with these builds. There's a lot of info logging as the world generates, which will definitely slow things down.

I also wonder what's going on with this RegionFix type. I'm actually not getting any region interceptors, at all.

As for the slowdown over time, that sounds like a memory leak, but seeing as no additional interceptors are getting created over time, that should be pretty much impossible (at least for OSV). I'll set it up to track the number of interceptors to make sure they're getting all getting disposed over time.

If possible, are you able to test this without Terraforged? Maybe this is just a compatibility issue with Terraforge and I need to put more time into that mod specifically.

Edit: Confirmed that the handles are getting cleared whenever the GC runs, but it's still creating thousands of them. I'll see if somehow I can have it reuse them and know when to clear them out if there are too many

HarroldSaxon commented 2 years ago

sure, I can test without Terraforged, but it'll make looking for crashes take much much longer. With only your mod and Terraforged i've yet to see a crash at all, and I had thought this was an issue with terraforged before they refuted it and I did eventually manage to recreate the crash without that mod. but it certainly makes world gen far more intensive and is often I think the last straw.

PersonTheCat commented 2 years ago

Updated to reuse old handles. These handles are never cleared manually. They should still get removed as the old threads die off. And, even if they don't, we were only up to 77 interceptors before. Assuming we create the same amount of handles overnight, that's an extremely small memory footprint. These objects just track the block being replaced, its target block, and some coordinates.

Thanks again for your spectacular help. If you ever join the Discord server, just let me know it's you and I'll give you a fancy green name.

OSV-Forge-7.4.6-remapped.zip

Edit: this build also drastically reduces the amount of debug logs. You should still be able to search for total interceptor and find the number of interceptors / handles.

HarroldSaxon commented 2 years ago

I'm running a loop of generating new worlds till one crashes without Terraforged on one of my pcs. on another, I'm trying to figure out what's going on with terraforged.mod.chunk.fix.RegionFix it looks to be keeping chunks loaded after they report as fully generated? This also appears to be where the slowdown is happening as well. And then despite not generating any more chunks, features generating keep trickling in, so it may be that that's what this is handling. I suppose you could try having the interceptor ignore them completely, and I could see what breaks if anything.

Edit: just saw your last post! I'll give it a try

PersonTheCat commented 2 years ago

Yeah, it will just create some minor generation bugs with some ore variants, depending on the block used. I'll consider that, but I'd rather the Terraforged devs look into the issue instead.

Just curious, how are you tracking which chunks are loaded?

HarroldSaxon commented 2 years ago

chunk pregenerat has some nice features hidden away for that

HarroldSaxon commented 2 years ago

I'm going to bed after this update, but it seems 7.4.6 does run better when it runs, but also breaks a bunch of moded structures, and crashes much more often than previous versions. It always crashes with terraforged now, even if just osv and terraforged are present, which didn't happen before, and I've gotten a few crashes without it now too. I have a lot of debug logs, so just tell me what situations you'd like to look at if any. I'll try to find the right ones in the morning if you do.

PersonTheCat commented 2 years ago

Just share any crash reports. How are modded structures breaking? What kind of breaking?

HarroldSaxon commented 2 years ago

Critical error detected whilst generating Features crash-2022-01-16_00.39.30-server.txt crash-2022-01-16_00.19.03-server.txt crash-2022-01-15_23.38.38-server.txt 2022-01-16-8.log I can't find the other two associated logs of this error, but i'll keep looking. i have something like 80 log files, but will try to just post a few of each type of crash This is what i meant by breaking the debug-3.log 2022-01-15-3.log m

HarroldSaxon commented 2 years ago

killed by watchdog after hanging for 10 minuets during startup debug.log latest.log

HarroldSaxon commented 2 years ago

ServerChunkProvider cannot be cast to personthecat.osv crash-2022-01-15_22.50.28-server.txt 2022-01-15-3.log debug-3.log

HarroldSaxon commented 2 years ago

jvm crash Without terraforged debug.log latest.log

HarroldSaxon commented 2 years ago

some with terraforged 2022-01-15-1.log debug-1.log [debug-1.log](https://github.com/PersonTheCat/OreStoneVariants/files/7877320/debug-1.log) [2022-01-16-2.log](https://github.com/PersonTheCat/OreStoneVariants/files/7877321/2022-01-16-2.log) 2022-01-15-2.log debug-2.log

HarroldSaxon commented 2 years ago

I have a lot more, but they're really just these same things over and over, and honestly I should have organized them better in the first place, but I was busy and neglected to do so.

PersonTheCat commented 2 years ago

Cool, thanks.

This first exception, a NPE in TFChunkGenerator, does not appear to be caused by OSV. This call chain is impossible:

at c.t.m.c.TFChunkGenerator.func_235957_b_(TFChunkGenerator.java:150)
at p.o.w.i.WorldGenRegionInterceptor.prime(WorldGenRegionInterceptor.java:115)

Because prime just sets a value in the interceptor.

The second exception is the same, but somehow calls a different method:

at com.terraforged.mod.chunk.TFChunkGenerator.func_230354_a_(TFChunkGenerator.java:261)
at p.o.w.i.WorldGenRegionInterceptor.prime(WorldGenRegionInterceptor.java:115)

I have no idea how either of these methods is getting called. Again, all this method is doing is updating a value.

The third error, We are asking for a chunk out of bounds, is also not possible.

at n.m.w.g.WorldGenRegion.func_217353_a(WorldGenRegion.java:131) 
at j.l.ThreadLocal.get(ThreadLocal.java:161)
at p.o.w.i.InterceptorAccessor.getHandle(InterceptorAccessor.java:21)}

Once again, I have no idea how this code is getting executed. InterceptorAccessor#getHandle is just reading a value in thread local storage.

Based on that alone, I'm guessing the memory is somehow getting corrupted. But, I have no idea where that's happening, because again, everything is working just fine for me and the only thing I haven't tried at this point is TerraForged.

Unfortunately, without these issues occurring for me, it's entirely possible that I'm just making things worse by changing well-tested code that was previously working fine. I'll make a few test changes in a few and upload them here for testing. But after this, I'm currently out of ideas. It may not be worth redesigning this entire mod just to provide compatibility with TerraForged.

PersonTheCat commented 2 years ago

Yes, definitely memory corruption. Looking at the fourth error about ServerChunkProvider cannot be cast to InterceptorHandle, this data cannot possibly be a ServerChunkProvider. I've reimplemented this "copy fields" method again. Just a shot in the dark. It has to be done and maybe this way will be a little safer.

Let's hope this one does it!

OSV-Forge-7.4.7-remapped.zip

HarroldSaxon commented 2 years ago

I agree. I wish I could take a look under the hood of terraforged, as 99% of the crashes are with that mod. the same crash without it is extremely rare. so whatever is causing the original issue, terraforged is doing it a lot more than any other mod. So far, 7.4.4-noCopyFields is the only version that doesn't crash at all with the right settings. The only thing i can think of is either some mods are doing something very strange with the feilds, making them corrupt when copied, or the intercepter is hooking into a class that isn't actually what it thinks it is, and getting bad, even impossible data back.

HarroldSaxon commented 2 years ago

In any case, 7.4.4-noCopyFields provides a work around. If you could add a config setting to turn copying on or off, it'd be more than enough for compatibility.

HarroldSaxon commented 2 years ago

oh! ok, i'll try it!

HarroldSaxon commented 2 years ago

7.4.7 didn't crash, but hangs on copying fields into RegionFix. I'll see if I can get past this point, but it'll be a few hours before I can try more on this. debug.log latest.log

PersonTheCat commented 2 years ago

Thanks! I think that just told me what the problem has been this whole time.

It's actually copying out of RegionFix, but some of the fields in RegionFix aren't in WorldGenRegionInterceptor, so it can't copy all of them. I have no idea why it's hanging, but it should fail and I understand now why OSV never did fail in the past, while still corrupting the memory as a result of this unexpected type.

Here's an updated build which correctly verifies that each field is available in the destination and the source. I have it logging each field that gets copied, because it should be unable to print the data if it's corrupted, thus confirming that it isn't.

I've also updated the field copying to use reflection, which should prevent the behavior from changing on different platforms. I hate this solution, but I also hate editing all of the bytecode so that fields can be copied normally, like I did in 6.2. For anyone else reading, we need to copy these fields because the Level objects cannot be instantiated normally. This is because, for some reason, Mojang throws a whole bunch of init code in their levels which loads a bunch of entities, creates new save files, and does a whole bunch of other work. For an interceptor to be viable, I need a way to skip this work. And that can only be done--at least in 1.16--by dodging the constructor invocation. It's quite an extreme solution, but this is the only way to dynamically copy behavior from other blocks.

@HarroldSaxon Don't worry if you aren't able to test this, but if you are--as always--, that would be a huge help. I'll try it out as well in several hours with TerraForged to make sure we have it worked out.

Edit: Forgot to attach the file.

OSV-Forge-7.4.8-remapped.zip