BluSunrize / ImmersiveEngineering

Wires, transformers, high voltage! Bzzzzt!
Other
787 stars 392 forks source link

CME in ImmersiveNetHandler.getIndirectEnergyConnections #302

Closed XFactHD closed 7 years ago

XFactHD commented 9 years ago

At completely random times the game crashes with the following crash report: http://pastebin.com/qT2VTYic

The versions of mods mentioned in the stack trace: ImmersiveEngineering: 0.5.2 ThermalDynamics: [1.7.10]1.1.0-161 AE2: rv2 stable 10 Zetta-Industries: 1.1-97

mindforger commented 9 years ago

i would say either the render thread that renders the wire connections via ClientUtils.renderAttachedConnections(...) or something that calls TileEntityImmersiveConnectable.getDescriptionPacket() is clashing with the NetHandler

does your "random" imply moving your head or moving into a cabled area? or probably while saving your world?

Or probably changing Dimensions ? Check your log before the crash if you encountered unloading dimensions right before thew crash probably?

@BluSunrize probably modify getMultimap() from your NetHandler to check the usage of your directConnections before doing a put-operation and delay the put-action in preTick on the next run, but this will probably still fail for the renderer ... maybe you can check for asynchonous access and hand out a copy instead o fthe actual list when someone asynchronously calls getMultimap() ... just to be save against API abuse

XFactHD commented 9 years ago

if your first assumption is right, this would probably point out an error in ThermalDynamics, which I doubt is the case. I would say your second assumption is right because before it crashed the first time, I attached some wires to HV Relays and after the crash they were gone. I can't even replace them, even though I replaced all the HV Relays.

XFactHD commented 9 years ago

@mindforger With random I mean that I am just doing something completly different than cabling in the area where I have many cables. The world I am currently building in is a City built in creative mode and it is supposed to be a bit realistic, which means I didn't change dimensions at any time. The only dimension unloading happened when the game crashed.

mindforger commented 9 years ago

okay scratch my thoughts about the actual connection-list and asynchronous calls insde IE, the renderer is working with a copy already facepalm\ and the only thing i can imagine now is something that is calling getIndirectEnergyConnections either asynchronously to some process parallel calling getConnection or maybe some sort of cross mod interaction via the RF API that causes asynchronous interaction ... guarding the put in getMultimap would probably be the best solution here as i can not see other than this particular put, that may interfere with the getIndirectConnections EXCEPT anything is happening to the actual cable connections, which you stated, you were not doing

@XFactHD do you have by chance any kind of monitoring or something that is actively interacting with the wire network (something linked to an connector) that is not a machine or an energy buffer ... i can remember RF Tools doing some odd stuff with RF API

XFactHD commented 9 years ago

I don't even have RFTools in the pack. In terms of monitoring: I am using the multiblock battery from zetta industries which outputs into a cryo-stabilized fluxduct which then goes into a RF P2P tunnel. This outputs directly into a RF Meter from zetta industries which again directly outputs into a P2P tunnel which outputs again into cryo-stabilized fluxducts which are directly connected to hv connectors. The funny thing is that everything I mentioned now is also mentioned in the crash report, almost in the order.

mindforger commented 9 years ago

indeed i noticed that also, but i have an idea already, i am making a version of IE for you(if you are willing to test), which will keep track of the access to the connection maps and in case of an CME will bloody spam your console with all open access to the map and every time it adds an connection or a new map it will spam all concurrent accesses so we could probably nail down the culprit, but unfortunately not today or maybe in the late evening i will have time to work on the code :P

you mentioned that you had wires that disappeared but you never replaced them, this could possibly be connected to this issue as well, i'd really like to try out with the additional logs

XFactHD commented 9 years ago

I am completely willing to test your build to find the culprit. Especially because the big power lines that go through the country would look weird if I built them with fluxducts 😝. But before that I have to solve something completely different. The version of inventory tweaks I am using spams my log with names of the inventorys and their corresponding containers I am opening.

mindforger commented 9 years ago

okay i setup everything, however i can upload it not yet but in a few hours

@XFactHD you could probably reduce the indirect linking by now and then inserting a "substation", so the cable is somewhat interrupted .... probably a transformerstation or something optical beautiful :D

@BluSunrize i possibly already found the culprit! in public List<Connection> getConnections(World world, ChunkCoordinates node) you call getMultimap(world.provider.dimensionId).get(node); and the documentation states you will return an empty list when the key (node) is not present may this mean you implicitely add this key then, as all returned lists are strong linked? Thus causing the CME on a call on an not existing node (however this is happening)? Here is my idea for a workaround ArrayListMultimap<ChunkCoordinates, Connection> map = getMultimap(world.provider.dimensionId); List<Connection> ret = new ArrayList<Connection>(); if (map.containsKey(node)) { ret = map.get(node); } return ret;

EDIT: i hate github markdown

XFactHD commented 9 years ago

@mindforger I found out why I couldn't reconnect the wires: it's because of the ray tracing bug fixed in one of the latest commits.

XFactHD commented 9 years ago

Didn't mean to close it. 😁

XFactHD commented 9 years ago

@mindforger What do you mean by indirect linking?

mindforger commented 9 years ago

https://drive.google.com/folderview?id=0B7tq3ml4jym9fi1STFRaLXZOd0h6c3ZLWjJEWjIycEJpZmNvd0trMHZ6NWd1Znd2eVc5V1k&usp=sharing but this will be only available for a short time, tell me when you got it

it has a few lines added that will spam you log quiet a bit with call stacks and a pretty big one when it crashes

XFactHD commented 9 years ago

I don't have the permission to acces it.

mindforger commented 9 years ago

fixed, sorry

XFactHD commented 9 years ago

Still doesn't work. Did you set the permission to a specific e-mail adress?

mindforger commented 9 years ago

i hate google https://drive.google.com/file/d/0B7tq3ml4jym9dnVJVjRHaVpIOVE/view

XFactHD commented 9 years ago

got it, will test it in about half an hour.

XFactHD commented 9 years ago

I am still testing and I will try to get it to crash, but for now, nothing shows up in the logs.

mindforger commented 9 years ago

try searching for my name, i added it to the top of every line to make search more easy hm it looks like the log level is getting swallowed ... should've used warning instead

XFactHD commented 9 years ago

I crashed a moment ago with this crash report: http://pastebin.com/sAwRsKLd, but I can't find your name in it and when looking at the relevant classes in the jar with a bytecode viewer, I also can't find any changes.

mindforger commented 9 years ago

yeah my fault, i am redoing it with the IELogger on Warn Level, looks like my logpipe was broken :P i,ll update this comment the moment i got it uploaded, so wait for it please :)

mindforger commented 9 years ago

https://drive.google.com/file/d/0Bx7AtwYG5UtKc01vZ2hWbjlyNjQ/view?usp=sharing there you are

this one should print a little less, but watch the special messages i added to prove my guess :P

XFactHD commented 9 years ago

got it, will test again now

XFactHD commented 9 years ago

I crashed again with your second build. This time it happened while I tried to connect a telecommunication wire from zetta-industries. I connected it to the first connector, went some blocks away and shortly before I clicked the second connector, it crashed. http://pastebin.com/HAnUGwdE

mindforger commented 9 years ago

check your log, i did not manipulate the crash log i printed it as error to the log ... but this time it crashed on another line

XFactHD commented 9 years ago

I couldn't find anything that mentioned ImmersiveEngineering outside of the crash report part of the log.

mindforger commented 9 years ago

augh ... well i missed this one becasue i expected the other line to explode ... i got to go to bed now .. work is calling, i'll make a more spamming one tomorrow, sorry :X

XFactHD commented 9 years ago

Thank you for help anyways, I got to go to bed soon, too.

XFactHD commented 9 years ago

@mindforger Sorry for not answering the last couple days. I am at the moment unable to reply/comment on Curse because Curse spits out an error when trying to send. I will test if this issue is still existing in the latest version of IE and if it is world specific or not (I crashed for the first time with 0.4.4 in my creative world and in my survival world I didn't get any problems).

mindforger commented 9 years ago

sorry i was ill the past couple of days, i forked this repo properly and added you as collab, we can discuss threre as we may start to annoy BluSun with this conversion already :P please try to get in contact with me as i told you on curse, so we can test a bit more effective

BluSunrize commented 9 years ago

tbh, I've been ignoring this one a little, since I personally am out of ideas :/

mindforger commented 9 years ago

@BluSunrize i sorted out the callstack a bit and noticed the tick not beeing processed like it should, or i may misunderstand why it is called by invoke() at cpw.mods.fml.common.eventhandler.ASMEventHandler_1095_MultiblockServerTickHandler_onWorldTick_WorldTickEvent.invoke(.dynamic) at cpw.mods.fml.common.eventhandler.ASMEventHandler.invoke(ASMEventHandler.java:54) at cpw.mods.fml.common.eventhandler.EventBus.post(EventBus.java:138) at cpw.mods.fml.common.FMLCommonHandler.onPreWorldTick(FMLCommonHandler.java:268) at net.minecraft.server.MinecraftServer.func_71190_q(MinecraftServer.java:620) at net.minecraft.server.MinecraftServer.func_71217_p(MinecraftServer.java:547

i've read that this sequence is only called when someone appends a task to the server tick handler via scheduleAsyncRepeatingTask() this may cause in this particular chain of calls in a concurrent access to your in/direct connections map as i stated realier and also i tried to log all duplicate access to getConnections and getMultimap, but i missed the exception to output it maybe there is a better way to implement the access to the maps itself through the whole code

EDIT: did i mentioned, that i hate github markup ? :P

Kubuxu commented 9 years ago

Zetta does energy processing in WorldTickEvent in pre phase which is just before the normal TE ticking. It is done so energy for all outputs and inputs is processed at the same time.

It should work as normal TE tick just before everything.

It is called from invoke(.dynamic) as it is how Forge's events work. There is class generated in runtime to wrap the event so native performance can be achieved.

Only thing I can think of that might cause CME is connection addition/change/update while energy is being processed.

As you can see energy goes from ZettaIndustries' BigBattery via AE's p2p, which is transparent, via ZI' RFMeter, which is also transparent, via TD's Energy Duct to connector where it crashes.

mindforger commented 9 years ago

the only possible "changes" i noticed so far is when loading chunks, in whatever thread this is done, but this would probably crash all the time when not done in same context like tick?! AND whenever you try to get a connection for a world/dimension that yet has no networks (see getMultiMap() on top of IENetHandler) AND (which i think is the actual culprit) during getConnections on an unknown Node!

ArrayListMultimap<String, String> map = ArrayListMultimap.create(); map.put("A", "this is A"); map.put("A", "this is also an A"); System.out.println("map: "+map); List node = map.get("A"); System.out.println("known node 'A': "+node); List unknownNode = map.get("B"); System.out.println("unknwonNode 'B' empty: "+unknownNode); System.out.println("map with empty unknown node 'B': "+map); unknownNode.add("something new"); System.out.println("unknwonNode 'B' nonEmpty: "+unknownNode); System.out.println("map with nonEmpty unknown node 'B': "+map);`

this results in

map: {A=[this is A, this is also an A]} known node 'A': [this is A, this is also an A] unknwonNode 'B' empty: [] map with empty unknown node 'B': {A=[this is A, this is also an A]} unknwonNode 'B' nonEmpty: [something new] map with nonEmpty unknown node 'B': {A=[this is A, this is also an A], B=[something new]}

the moment i added an entry to the unknown node, i implicitely added the unknown node to the multimap, probably causing this CME!

i try to add very messy logging code to my fork this afternoon, but i also have to setup a new server hopefully i can trace down who is adding nodes while the preTick is running

mindforger commented 9 years ago

changing getConnections to check for existence of the node before returning the empty list would probably guard against the CME but who will ensure that the probably new added connections will be put back into the multimap when needed ?! probably a post tick event can be used for that ?!

i mean NEVER doing any updates to the actual map while in tick and only updating the whole connections in post tick could probably be a solution? something like enqueue all changes and handle them once in post tick or something?

BluSunrize commented 9 years ago

Sounds complex, ouch. Will look into it for 0.6 I guess .~.

mindforger commented 9 years ago

seriously isn't there ANY "onCrash" Event or something like that, i want to dump all open calls to getconnection and getMultiMap when it crashes to have an overview of who and why without spamming the log all the time

XFactHD commented 9 years ago

You could perhaps use the onServerStopping event.

mindforger commented 9 years ago

@BluSunrize XFactHD and me did some testing this WE and we both found some issues causing this very uncommon error: i found a possible memory leak regarding getConnections(), you are immediately returning getMultiMap().get(node), which for unconnected HV relay f.E. generates on every render tick an orphaned list, which can add itself to your multimap whenever an entry is added. This may cause memory leaks and heavy GC load, you should probably fall back to check for absence of the key in the map and return null instead this implicitely generatd list. (and don't forget the NPE checks :D )

to XFactHD's discovery, he discovered soemthing in how you override the renderDynamic methode which causes in his particular case an CME because Forge Relocation is ASM hooking the tileEntity with mrtjp.relocation.ASMHacks$.renderTileEntityAt(hacks.scala:30)

but he will add his explanation soon

XFactHD commented 9 years ago

My discovery while testing with @mindforger "ClientUtils" "renderAttachedConnctions" iterates over the "ConnectionList"

"ClientEventHandler" "renderAllIEConnections" calls "renderAttachedConnections" from "ClientUtils"

"TileRenderImmersiveConnectable" extends "TileRenderIE" overrides "renderDynamic" to call "renderAllIEConnections" from "ClientEventHandler"

"TileRenderIE" extends "TileEntitySpecialRenderer" overides "renderTileEntityAt" to just render and not bind the texture

"TileEntitySpecialRenderer" ASM patched by "ForgeRelocation" to make it possible to move blocks without changing the block texture to a general texture This might have an impact on performance of the renderer making IEs stuff n' things asynchronous

Another thing I found at a couple minutes ago is as follows: I wondered why my Railcraft locomotives moved wierdly, then I typed in "/cofh tps" and that's what I got: 2015-08-24_16 41 22

Then I used Opis and got this: unbenannt (The "NODE LEAKED" entrys are in the console because we added some debug functions)

This is the affected HV Relay: 2015-08-24_16 21 21

EDIT: As soon as I connected the ends of the network in a way that created loops, the tps was back at 20 tps.

BluSunrize commented 9 years ago

So ForgeRelocation is /once again/ causing issues with IE. Good lord I despise that mod, because it also makes my Multiblocks moveable.

In any case, I started the transition to ISRBH wire rendering. If you check the latest commit(s) you will see that my attempt at rendering has changed completely, and hopefully for the better. @mindforger @XFactHD, I'm really really sorry you fellas sunk so much time and effort into debugging code that I have decided to phase out, and I appreciate it a lot. To be honest, I don't know how you do all this, since I know nothing of Opis and stuff. I would very much appreciate your guys continued input on the matter, as my "Experts of making IE not lag to hell" x3

XFactHD commented 9 years ago

@BluSunrize What does not really make sense is that when the wires make a loop, the tps is at 20 but when they are not creating a loop or as soon as I disconnect them, this particular hv relay brings the tps down to 6,14 and the actual lag happens on the client as you can see from the tab in Opis.

In terms of continuing input on this matter, I am happy to be able to help and, also, I am learning more about MC modding, java, etc. and I don't plan to drop the participation :).

mindforger commented 9 years ago

@BluSunrize we discovered by some testing i made earlier that porbably the calculation of indirect connection could cause this. all connectors in this picture were connected by cryo flux ducts and as far as i understand those behave transparent, making indirect connections between the connectors ... whenever there is an uneven number of shared connections between a source and a drain i noticed strange power behaviour and also an FPS drop, we worked it around by adding an resonant energy cell as some sort of flow director inbetween the cryo flux duct and the connector

i had those issues with ender io conduits when not properly setup to be explicitely in or output, resulting in surprising connection loops

also as i stated in other issues already, the issue with geconnection still persists, even when the rendering is off the hook, it looks like there are other unschronoized access possibilities.

i stay with getConnection() to return null when there is no node (as you can not add an empty list to the multimap ... blame on guava) breaking your API again PLUS adding synchronized to all accessor of the multimap

BluSunrize commented 9 years ago

@mindforger @XFactHD I assume when you write getConnection() you're talking about ImmersiveNetHandler.getConnections(World world, ChunkCoordinates node)? Yeah that is widely used, if I return null there, now that's a massive API rewrite. I won't start that until I have confirmation that you mean that method ^^ Also, you two should hop on IRC sometime, since this seems to exceed my java skills :/

mindforger commented 9 years ago

yes this is the exact methode, i'll dial into IRC this evening, in the meantime, try the snippet of code i posted eralier to get an image of the problem with the call of getMap(...).get(node) with a node unknown to the map

mindforger commented 9 years ago

done ... i replaced the fragile code with concurrent safe, but i did not test it yet and this may cause other strange behaviours, but this is failsafe by late update, whenever you make a change in dirct or indirect connections the update will only occure in creation of a new iterator, all methodes will continue with the state as they started

https://github.com/mindforger/ImmersiveEngineering/commit/d7b052087943fd6ae30dba507b7cbf7f33cfe34d

PS i had to remove your sort function, but i left a space open to define a comparator by your wishes, that will make your ConcurrentSkipListSet autosorting

BluSunrize commented 9 years ago

I have changed the net handler to use the SkipLists. Thank you guys =P