Closed Delta-99 closed 7 years ago
Are your event handlers persistent?
I have a very good trick up my sleeve to test whether your problem is with event handler persistency... To test this, I will create a branch for you and change the weak tables in the event dispatcher to strong tables. Then you rerun your test, and I am sure the events will be handled.
As we discussed on Slack, I think we should check a bit deeper in your code to identify the issue...
If a simple mission is not sufficient to reproduce the problem, then that is an issue, because I don't know what to search for.
Let's try to find the root cause together. I am just afraid a quick fix for this kind of problem does not exist... We will need to work together to try to isolate the problem and then discuss possible solutions, taking into account other dependencies. Then we test all those dependencies and move to alpha test.
Cannot commit to any timelines though... This will take time.
Looking at the above, did you reapply the event handler after you quit your mission and then flew again? Once you change slot, the event handling is gone for clients too; otherwise the dispatcher would dispatch events to objects that don't exist.
@CraigOwen do you have problems with event handling? You made a test mission that shows a complete landing event game... Does that still work?
Just to add, this was SP, not MP. No slot changes; events are on AI groups only. The mission was quit completely and restarted.
The event handlers SHOULD be persistent as everything else with the group that they are on is persistent.
I agree this will NOT be a simple thing to even track down or reproduce. But if anyone has time then just take any mission with event handlers and run it the way I am saying. If it exhibits the same behaviour then we can reproduce. If not, then we have to dig deeper.
You mean... Quit mission and then retry?
You mean... Quit mission and then refly? That is WEIRD. And why would that be a showstopper? I'll try this when home.
Read my steps above in the initial post. If they aren't clear ask me.
@FlightControl-Master, no rush on this, but if we cannot reproduce this with a simple mission then let's attempt to schedule a time when we can both look at this together with screen sharing or something. I can show you it happening in my mission if I still have it set up the same way by the time we look at it, and if it is still an issue at that time!! I cannot rule out yet whether it is something stupid I am doing in my mission OR, like you are stating above, something your trick would expose.
@FlightControl-Master, what does this mean in the log? It looks to me as if the Event Handlers in my code are possibly not called after these lines. They were firing before these lines in the log and then not after. I hope this might help to pinpoint something:
00085.167 INFO SCRIPTING: 4033( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 21)
00085.167 INFO SCRIPTING: 4033( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 23)
00085.167 INFO SCRIPTING: 4033( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 25)
The thing that is interesting is that I have 3 OnEvent handlers defined. I wonder if that corresponds to these 3 things being obsoleted.
On further testing if I get even just one of those Scheduled obsolete calls above then any Event Handlers after that are not being called. This seems to happen randomly and doesn't happen every time I run the mission. Seems to happen pretty early in the mission as well without really much going on other than planes being spawned etc. So in other words I don't think it is something I am doing in my code.
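To check this correlation without reading the whole log by eye, a few lines of standalone Lua can list the obsolete-call entries with their timestamps. This is only a sketch: it assumes every log line has exactly the layout of the three lines quoted above, and the function name FindObsoleteCalls is made up for illustration (it is not part of MOOSE).

```lua
-- Hypothetical helper (not part of MOOSE): scan log lines for the
-- "Scheduled obscolete call" message and collect timestamp + CallID.
-- The pattern assumes the line layout quoted in this thread.
local function FindObsoleteCalls( Lines )
  local Found = {}
  for _, Line in ipairs( Lines ) do
    local Time, CallID = Line:match( "^(%d+%.%d+) INFO SCRIPTING:.-Scheduled obscolete call for CallID: (%d+)" )
    if Time then
      Found[#Found + 1] = { Time = tonumber( Time ), CallID = tonumber( CallID ) }
    end
  end
  return Found
end

-- Sample input: two lines copied from the log above plus one unrelated line.
local SampleLines = {
  "00085.167 INFO SCRIPTING: 4033( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 21)",
  "00085.167 INFO SCRIPTING: 4033( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 23)",
  "00090.000 INFO SCRIPTING: some other line",
}

local Calls = FindObsoleteCalls( SampleLines )
for _, Call in ipairs( Calls ) do
  print( string.format( "obsolete call at %.3f, CallID %d", Call.Time, Call.CallID ) )
end
```

With a real log, the lines could be collected from io.lines first; comparing the earliest reported time with the last time one of your own handlers traced something would confirm or refute the pattern.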
I have been investigating this issue and could reproduce it. I have an easy fix and a very difficult fix for this. The easy fix will fix event handling for most applications, but not for TASKING... So, if I do the easy fix, tasking will work, but issues will come up with event handlers being called that should not be called... The not-so-easy fix is for me to dive into my code and make the weak table implementation work as expected... The current implementation will lose handlers when collectgarbage() is called in Lua. Thanks @Delta-99 for your perseverance and reporting on this one. Fixing this will take a bit of time though... I need time to get the weak table on the EventClass working, and it does not work for some reason...
Wow, FC, you actually can reproduce this? Great job. I was worried this would not be found for a long while but I think you had ideas that it could be this weak table stuff.
I so go for the "not so easy fix" even if it takes longer. But it sounds like you may not have a real solution yet, or something that works? Anything I can help with in this regard? Is there something I can read about weak tables?
I have a real solution. I just need to get the bloody Lua to clean the table reference, but there is some link somewhere "hanging"... So I need to go search for that hanging reference that prevents the garbage collector from cleaning the weak tables...
If you wanna learn about weak tables, check http://lua-users.org/wiki/WeakTablesTutorial. But early warning: it seems easier than it really is. Garbage collection is crap. Any unwanted link somewhere prevents the garbage collector from doing its job. It doesn't tell you about the hanging links, and it does not provide any debugging tools to see which internal memory references exist to tables... I got it right on SCHEDULERDISPATCHER, now EVENT is causing me trouble... It is very annoying. I really would like a "destructor" in Lua, like in C++. Then you KNOW the object is cleaned, even if it would cause a memory crash.
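For readers who want to see the effect in isolation, here is a minimal plain-Lua sketch of how a weak-valued table can drop a subscriber while a strong table keeps it. The table and handler names are illustrative assumptions; this is not the MOOSE dispatcher code.

```lua
-- Illustrative only: a weak-valued registry next to a strong one.
local WeakSubscribers   = setmetatable( {}, { __mode = "v" } ) -- values are weak
local StrongSubscribers = {}                                   -- ordinary table

local HandlerA = { Name = "A" }
local HandlerB = { Name = "B" }
WeakSubscribers.Shot   = HandlerA
StrongSubscribers.Shot = HandlerB

-- Drop the only other references, as a mission script might do by accident.
HandlerA = nil
HandlerB = nil
collectgarbage() -- force a full collection cycle

print( "weak table still has handler:  ", WeakSubscribers.Shot ~= nil )
print( "strong table still has handler:", StrongSubscribers.Shot ~= nil )
```

After the forced collection, the weak entry is expected to be gone while the strong one remains. That matches both symptoms discussed here: handlers that vanish when nothing else references them, and objects that can never be collected when the registry is strong.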
I designed a complete test mission to test the garbage collection on the Event dispatcher. Now that I have found the bugs, I need to get them resolved, but step one is done now.... It can only get better from this point on!
I've created two branches, both named master-405-event-handling: one in the MOOSE_MISSIONS repository and one in the MOOSE repository.
Lua 5.2 implements finalizers, which is a step in the right direction. The makers of Lua knew this was an issue, but when I ask on the ED forums whether DCS will ever upgrade to Lua 5.2, I get no useful reaction at all. The post is here: https://forums.eagle.ru/showthread.php?t=177732
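As a rough illustration of what such a finalizer looks like, the sketch below attaches a __gc metamethod to a plain table. It assumes a standalone Lua 5.2 or newer interpreter; on Lua 5.1 (the version DCS embeds, as far as I know) __gc is honoured for userdata only, so the finalizer is not expected to run there. All names are hypothetical.

```lua
-- Sketch of a table finalizer; effective on Lua 5.2+, not on Lua 5.1.
local Finalized = false

local Object = setmetatable( {}, {
  -- __gc must already be in the metatable when setmetatable is called.
  __gc = function( self )
    Finalized = true
    print( "object finalized: its subscriptions could be dropped here" )
  end,
} )

Object = nil     -- drop the only reference
collectgarbage() -- full cycle; pending finalizers run on 5.2+

print( _VERSION, "finalizer ran:", Finalized )
```

This is the closest thing Lua offers to the C++ destructor wished for above, but the object is only finalized once the collector finds it unreachable, so a strong subscription table would still keep it alive.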
This is the code of the test mission:
---
-- Name: EVT-001 - UNIT OnEventShot Stability Test
-- Author: FlightControl
-- Date Created: 9 Apr 2017
--
-- # Situation:
--
-- A couple of planes are firing at each other. Monitor the shot events.
-- I am doing a collectgarbage to test the stability of the event handling.
-- Also when the planes are destroyed, the event handling should stop etc.
-- The tests are on GROUP level.
--
-- # Test cases:
--
-- 1. Observe the planes shooting the missiles.
-- 2. Observe when the plane shoots the missile, a dcs.log entry is written in the logging.
-- 3. Check the stability of the event handlings.
PlaneGroupsBlue = {}
PlaneGroupsRed = {}

PlaneSpawnBlue = SPAWN
  :New( "Planes Blue" )
  :InitLimit( 2, 0 )
  :SpawnScheduled( 10, 0 )
  :OnSpawnGroup(
    function( SpawnGroup )
      -- Local, so the blue and red callbacks cannot overwrite each other's name.
      local SpawnGroupName = SpawnGroup:GetName()
      PlaneGroupsBlue[SpawnGroupName] = SpawnGroup
      PlaneGroupsBlue[SpawnGroupName]:HandleEvent( EVENTS.Shot )
      PlaneGroupsBlue[SpawnGroupName]:HandleEvent( EVENTS.Hit )
      -- Forced on purpose: stress the dispatcher right after subscribing.
      collectgarbage()
      PlaneGroupsBlue[SpawnGroupName].OnEventShot = function( self, EventData )
        self:F( EventData )
        self:MessageToAll( "I just fired a missile!", 15, "Alert!" )
      end
      PlaneGroupsBlue[SpawnGroupName].OnEventHit = function( self, EventData )
        self:F( EventData )
        self:MessageToAll( "I just got hit!", 15, "Alert!" )
        -- Drop our own reference once the group has been hit.
        PlaneGroupsBlue[self:GetName()] = nil
      end
    end
  )

PlaneSpawnRed = SPAWN
  :New( "Planes Red" )
  :InitLimit( 2, 0 )
  :SpawnScheduled( 10, 0 )
  :OnSpawnGroup(
    function( SpawnGroup )
      local SpawnGroupName = SpawnGroup:GetName()
      PlaneGroupsRed[SpawnGroupName] = SpawnGroup
      PlaneGroupsRed[SpawnGroupName]:HandleEvent( EVENTS.Shot )
      PlaneGroupsRed[SpawnGroupName]:HandleEvent( EVENTS.Hit )
      collectgarbage()
      -- The two messages were swapped in the red handlers; corrected here.
      PlaneGroupsRed[SpawnGroupName].OnEventShot = function( self, EventData )
        self:F( EventData )
        self:MessageToAll( "I just fired a missile!", 15, "Alert!" )
      end
      PlaneGroupsRed[SpawnGroupName].OnEventHit = function( self, EventData )
        self:F( EventData )
        self:MessageToAll( "I just got hit!", 15, "Alert!" )
        PlaneGroupsRed[self:GetName()] = nil
      end
    end
  )

collectgarbage()
BASE:E( "Collected garbage" )
Dumb question! Is Lua doing the garbage collection automatically, or does it only do it when collectgarbage() is called? What are the cons of not doing it? I assume possible memory leaks. But would a potential short-term solution be to add an option to Moose to turn off garbage collection? Would this solve the problem?
Garbage collection is done automatically, at times you cannot predict. But you can trigger it with collectgarbage(), which I use for testing. The issue is with publish/subscribe mechanisms like EVENT or SCHEDULER objects. When no weak tables are used and a child object is "subscribed" (read: linked) in these publishing objects, then when you "nil" your child object, the publishing objects still hold a reference to it. Although you "think" your object is nil, the publishing objects still have a valid reference to the object, and it will NOT be removed from memory... In the case of FSMs, this is even worse: FSMs registered in a publishing table will continue to exist and work, although you think they are destroyed because you nilled them and ran the garbage collector... You can subscribe / unsubscribe objects too, but that has dangers of its own, and on top of that, the FSMs and especially sub-FSMs have become rather complicated; unsubscribing each process from its parent is a tedious task without destructors...
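The "object you thought was destroyed keeps working" effect can be shown with a tiny stand-alone publisher. The sketch below uses made-up names (Publisher, NewFsm, Calls) and a strong subscription table; it is not MOOSE's EVENT or FSM code, only an illustration of the mechanism described above.

```lua
-- Illustrative publisher with a STRONG subscription table.
local Publisher = { Subscribers = {} }

function Publisher:Subscribe( Object )
  self.Subscribers[Object] = true
end

function Publisher:Unsubscribe( Object )
  self.Subscribers[Object] = nil
end

function Publisher:Publish( EventName )
  for Object in pairs( self.Subscribers ) do
    Object:OnEvent( EventName )
  end
end

-- Count how often each stand-in FSM gets called.
local Calls = { A = 0, B = 0 }
local function NewFsm( Name )
  return {
    Name = Name,
    OnEvent = function( self, EventName )
      Calls[self.Name] = Calls[self.Name] + 1
      print( "FSM " .. self.Name .. " handles " .. EventName )
    end,
  }
end

-- Case 1: nil the object without unsubscribing. The publisher keeps it alive.
local FsmA = NewFsm( "A" )
Publisher:Subscribe( FsmA )
FsmA = nil
collectgarbage()
Publisher:Publish( "Shot" ) -- FSM A still reacts

-- Case 2: unsubscribe explicitly before dropping the reference.
local FsmB = NewFsm( "B" )
Publisher:Subscribe( FsmB )
Publisher:Unsubscribe( FsmB )
FsmB = nil
collectgarbage()
Publisher:Publish( "Hit" ) -- only the leaked FSM A reacts
```

Case 2 is the manual "destructor" route: it works, but every object (and every sub-FSM) has to remember to unsubscribe itself, which is the tedious part mentioned above.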
Again, I know this is not a long term solution but for those that are not worried about memory leaks in their missions (maybe shorter SP missions) what about this:
collectgarbage("stop")
Found from here: http://luatut.com/collectgarbage.html
I can try putting this at the top of my mission and see if it then doesn't stop handling events. Or better yet maybe put it into your test mission.
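A stand-alone way to see what collectgarbage("stop") does and does not do is sketched below, using a weak-valued table as the probe. The names are illustrative assumptions, and this is plain Lua rather than a DCS mission script.

```lua
-- Sketch: with the collector stopped, no automatic cycle runs, so a
-- weakly referenced handler survives until someone collects explicitly.
local Subscribers = setmetatable( {}, { __mode = "v" } )

collectgarbage( "stop" ) -- no automatic cycles from here on

local Handler = { Name = "Shot handler" }
Subscribers.Shot = Handler
Handler = nil -- only the weak reference is left

-- Create plenty of garbage that would normally trigger the collector.
for i = 1, 100000 do
  local Garbage = { i }
end

local SurvivedWhileStopped = Subscribers.Shot ~= nil
print( "handler survived while stopped:", SurvivedWhileStopped )
print( string.format( "memory in use: %.0f KB", collectgarbage( "count" ) ) )

collectgarbage( "restart" )
collectgarbage() -- an explicit full cycle clears the weak entry
print( "handler present after full collect:", Subscribers.Shot ~= nil )
```

So stopping the collector only suppresses automatic cycles, at the price of memory that keeps growing. Whether it would hide the problem in a mission depends on whether anything still calls collectgarbage() explicitly.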
Memory leaks are not a concern. Just let it rest, I'll fix it.
No worries, I was just trying to come up with a short term quick fix for people until you get a chance to fix this.
There is a very quick short-term fix, but you advised the long-term one (we are talking about a couple of days here)...
Just pushed a fix in master-405-event-handling branch. This is the "long-term" fix. Maybe you can help checking out the fix in your mission.
I might be able to do that today.
Note that this is a pre-alpha fix, tested only in my own mission, meaning work in progress (WIP). I am reworking the event dispatching a bit... To ensure that:
This will keep the subscriptions fresh and free from hanging subscribers...
And I removed this silly weak table __mode="v" thing, which caused subscriptions to "suddenly" disappear.
I need to test these changes thoroughly, in every scenario possible... Including tasking and task finishing (so when the process is finished, or when the player exits the task, etc...). FSMs may not "hang" in the subscription list.
I added a few self:E lines, to debug my code... So you may see some overdose or added tracing in the log...
The good part is the new release system in master, no more moose.lua (re-)generation if no file is added/replaced. You can quickly change branch and test, (Do this BOTH on MOOSE_MISSIONS and MOOSE repositories). Thanks for the help @Delta-99 !!!
@FlightControl-Master just want to be clear about this:
GROUP and UNIT event handlers are unsubscribed when the underlying Group and Units are dead (the name cannot be determined anymore).
When you say dead, you mean the Moose GROUP object is dead or nil, right? Not when the Group or Unit in the mission is dead. Because in the second case one might still have reason to use the Moose GROUP object.
Just did a new commit in the branch, with a new version...
To answer your question: the wrapper classes stay in memory, but the underlying Group or Unit (note the lowercase letters) can be destroyed. So, when the underlying objects are destroyed, the subscriptions end for those GROUP or UNIT objects (I still need to test the Crash, Dead and Birth events though)... The GROUP and UNIT objects are not nilled!!! Does this address your concern?
I have reworked event handling:
-- Avoid events not being handled when they should be.
-- Clean up the subscriptions when Groups or Units are dead.
-- Reinitiate the subscriptions when Groups or Units are respawned.
-- EVENT_HIT is only for Targets when the subscription is on UNIT or GROUP level.
-- MISSION_END should work now too ...
-- When a subscribed object is nilled and collectgarbage() is executed, it should clean the subscription.
-- Reworked and cleaned the event handling...
-- Cleaned up the code
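To make the "clean up when dead, re-subscribe on respawn" idea concrete, here is a minimal plain-Lua sketch. It is an assumption-laden illustration, not the code from the pull request: Dispatcher, its methods and the stand-in Group table with its Alive flag are all hypothetical.

```lua
-- Hypothetical dispatcher: subscriptions are held strongly and pruned
-- as soon as the subscriber reports that it is no longer alive.
local Dispatcher = { Subscriptions = {} }

function Dispatcher:Subscribe( Object, EventName )
  self.Subscriptions[EventName] = self.Subscriptions[EventName] or {}
  self.Subscriptions[EventName][Object] = true
end

function Dispatcher:Dispatch( EventName, EventData )
  local Handled = 0
  for Object in pairs( self.Subscriptions[EventName] or {} ) do
    if Object:IsAlive() then
      Object["OnEvent" .. EventName]( Object, EventData )
      Handled = Handled + 1
    else
      -- Clearing an existing field during traversal is allowed in Lua.
      self.Subscriptions[EventName][Object] = nil
    end
  end
  return Handled
end

-- Stand-in for a wrapper object; Alive mimics the underlying group.
local Group = {
  Alive = true,
  IsAlive = function( self ) return self.Alive end,
  OnEventShot = function( self, EventData ) print( "Shot by " .. EventData.Name ) end,
}

Dispatcher:Subscribe( Group, "Shot" )
local First = Dispatcher:Dispatch( "Shot", { Name = "Planes Blue#001" } )

Group.Alive = false -- underlying group destroyed: subscription is pruned
local Second = Dispatcher:Dispatch( "Shot", { Name = "Planes Blue#001" } )

Group.Alive = true  -- group respawned: subscription is re-initiated
Dispatcher:Subscribe( Group, "Shot" )
local Third = Dispatcher:Dispatch( "Shot", { Name = "Planes Blue#001" } )
```

The wrapper object itself is never nilled in this sketch, which mirrors the answer above: only the subscription comes and goes with the underlying group.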
Pull request #416 is ready for your review ....
@FlightControl-Master very first test things are looking pretty good. All Events seem to be firing (even Land which never fired before I don't think - although could be my mission always broke before I ever got to a Land event). I do see some of those scheduled obsolete calls but they didn't seem to cause an issue. They seemed to come up during an AI_BALANCER monitoring run:
00333.673 INFO SCRIPTING: 577( -1)/T: AI_BALANCER00546.FSM Transition:None --> Monitor --> Monitoring
00333.673 INFO SCRIPTING: 1199( 620)/T: AI_BALANCER00546.Calling onenterMonitoring
00333.673 INFO SCRIPTING: 216( 244)/F: SET_GROUP00544.Get(RU Client #004)
00333.673 INFO SCRIPTING: 289( 503)/T: AI_BALANCER00546.Client RU Client #004 not alive.
00333.673 INFO SCRIPTING: 294( 503)/E: AI_BALANCER00546.IteratorFunction(New AI Spawned for Client RU Client #004)
00333.673 INFO SCRIPTING: 216( 244)/F: SET_GROUP00544.Get(RU Client #001)
00333.673 INFO SCRIPTING: 289( 503)/T: AI_BALANCER00546.Client RU Client #001 not alive.
00333.674 INFO SCRIPTING: 294( 503)/E: AI_BALANCER00546.IteratorFunction(New AI Spawned for Client RU Client #001)
00333.674 INFO SCRIPTING: 216( 244)/F: SET_GROUP00544.Get(RU Client #003)
00333.674 INFO SCRIPTING: 289( 503)/T: AI_BALANCER00546.Client RU Client #003 not alive.
00333.674 INFO SCRIPTING: 294( 503)/E: AI_BALANCER00546.IteratorFunction(New AI Spawned for Client RU Client #003)
00333.674 INFO SCRIPTING: 159( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 23)
00333.674 INFO SCRIPTING: 159( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 25)
00333.674 INFO SCRIPTING: 159( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 27)
00333.674 INFO SCRIPTING: 159( -1)/E: SCHEDULEDISPATCHER00002.function(Scheduled obscolete call for CallID: 29)
00343.091 INFO SCRIPTING: 577( -1)/T: AI_BALANCER00546.FSM Transition:Monitoring --> Monitor --> Monitoring
I will continue to test as I'll be modifying my mission tonight and running it multiple times.
After a couple more hours of testing after my last post things are looking really good. Events all firing all the time as far as I can tell.
Released to master, this is fixed by pull request #416
Thanks @Delta-99 for testing!
@FlightControl-Master is not going to like this one :(
I can confirm with certainty that DCS Events captured by Moose are not always firing their Moose Event handlers defined in mission lua code.
This is going to be a very tough one to track down I'm sure. I am sorry I do not have a simple mission example I can submit BUT I would bet if you took any of the demo missions and ran them enough times one would get this behavior. Maybe someone can explain it.
Steps to reproduce.
Note, I believe the first line is MOOSE internally capturing the DCS Takeoff Event. The next 4 lines are Moose determining to call the event handler defined in the mission.
I was able to do this consistently in my mission. I ran about 6 times. It looked to me like every other time no event functions were called in my code. So it worked 1st run, did not work 2nd run, worked 3rd run, did not work 4th run, etc etc.