freeorion / freeorion

Source code repository of the FreeOrion project.
https://www.freeorion.org

AI exception when loading save game - assign_military_fleets_to_systems #1794

Open dbenage-cx opened 6 years ago

dbenage-cx commented 6 years ago

Environment

Description

On load of save game posted in #1778 (from turn 129):

AI_4: AI_Error: AI script error : FreeOrionAI.py:generateOrders():333  - Exception 114698 while trying to assign_military_fleets_to_systems
Traceback (most recent call last):
  File "/opt/projects/freeorion/default/python/AI/FreeOrionAI.py", line 330, in generateOrders
    action()
  File "/opt/projects/freeorion/default/python/AI/MilitaryAI.py", line 803, in assign_military_fleets_to_systems
    fleet_list=found_fleets)
  File "/opt/projects/freeorion/default/python/AI/FleetUtilsAI.py", line 150, in get_fleets_for_mission
    fleet_pool_set.remove(fleet_id)
KeyError: 114698
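The failure at the bottom of the traceback is the standard behavior of Python sets: `set.remove` raises `KeyError` when the element is absent, while `set.discard` does not. A minimal illustration (not FreeOrion code; the IDs are made up):

```python
fleet_pool_set = {100, 101, 102}
stale_fleet_id = 114698  # an ID the pool never contained

# remove() raises KeyError for a missing element -- the failure in the traceback
try:
    fleet_pool_set.remove(stale_fleet_id)
except KeyError as err:
    print("KeyError:", err)

# discard() is the silent alternative; the pool is left unchanged
fleet_pool_set.discard(stale_fleet_id)
print(sorted(fleet_pool_set))  # [100, 101, 102]
```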

Expected Result

No exception

dbenage-cx commented 6 years ago

@freeorion/ai-team The following diff allows loading save games without an AI error for me. Not certain whether this is a potential fix or just a temporary workaround.

diff --git a/default/python/AI/FleetUtilsAI.py b/default/python/AI/FleetUtilsAI.py
index 59b4462da..7b541d101 100644
--- a/default/python/AI/FleetUtilsAI.py
+++ b/default/python/AI/FleetUtilsAI.py
@@ -147,7 +147,8 @@ def get_fleets_for_mission(target_stats, min_stats, cur_stats, starting_system,

             # all checks passed, add ship to selected fleets and update the stats
             fleet_list.append(fleet_id)
-            fleet_pool_set.remove(fleet_id)
+            if fleet_id in fleet_pool_set:
+                fleet_pool_set.remove(fleet_id)
             this_rating = foAI.foAIstate.get_rating(fleet_id)
             cur_stats['rating'] = CombatRatingsAI.combine_ratings(cur_stats.get('rating', 0), this_rating)
             if 'ratingVsPlanets' in target_stats:
Morlic-fo commented 6 years ago

Your diff does silence the error, but I do not think that is a good idea. The error is a hint that there is a significant problem in the underlying code, either on the server side or within the AI scripts' assumptions.

I pushed a commit to catch the KeyError and hopefully let the AI end its turn more gracefully, but it will still show an error message in the chat for now.
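The commit itself lives in the FreeOrion repo; as a hypothetical sketch (illustrative names, not the actual FreeOrionAI.py code), the catch-and-continue pattern for per-turn actions looks like this:

```python
import logging

def generate_orders(actions):
    """Run each AI turn action; log a failure and keep going so the
    AI can still end its turn. Sketch only -- names are illustrative."""
    for action in actions:
        try:
            action()
        except KeyError:
            # Log the traceback instead of aborting the whole turn.
            logging.exception("Action %s failed; continuing turn", action.__name__)

def assign_military_fleets():
    raise KeyError(114698)  # stand-in for the failure in this issue

def other_action():
    pass

# The turn completes even though one action raised.
generate_orders([other_action, assign_military_fleets, other_action])
```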

Dilvish-fo commented 6 years ago

I think I have mentioned elsewhere that the underlying problem appears to be the exposed function for splitting ships out of fleets not working quite as expected: it does not leave the expected ship as the remaining ship in the original fleet. This is one of the bugs I had been planning to finish chasing down for 0.4.8, but as I have been even busier lately than expected, I haven't gotten to it yet.

My plan was to further review the fleet-splitting function to see if there is a bug there, and otherwise to add some code for the AI to properly determine which ship actually got left in the original fleet rather than making the assumption it currently makes. I will still try to get to that, or if either of you have the time and inclination to follow up further, I'd invite you to try that path.

Dilvish-fo commented 6 years ago

To clarify/correct the above a bit more: the call to split the fleet up, https://github.com/freeorion/freeorion/blob/b7ea82aa8f96c635ee3f6af984ba6ea9b2fdf1cf/default/python/AI/FleetUtilsAI.py#L121, assumes that the original fleet ID will not be in the list of IDs returned for new_fleets. That expectation comes from the manner of looping over ships at https://github.com/freeorion/freeorion/blob/b7ea82aa8f96c635ee3f6af984ba6ea9b2fdf1cf/default/python/AI/FleetUtilsAI.py#L203, which skips the first ship in the list, intending to leave it in the original fleet. However, it seems that at some point, when splitting off one of the other ships, that first ship is transferred instead of one of the later ships, or at least the original fleet ID gets returned there. Something squirrelly is happening with that process, somehow leading to the original fleet ID being included in the returned list of new fleet IDs. I think that's the point where I ran out of time the last time I tried chasing this down, and I set it aside there...
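The assumption described above can be made explicit in a small sketch (hypothetical helper, not the actual split_fleet): keep the first ship in the original fleet, split the rest into new fleets, and assert the invariant the caller relies on.

```python
def split_fleet_ids(fleet_id, ship_ids, issue_new_fleet):
    """Sketch of the splitting pattern (illustrative, not FreeOrion code):
    the first ship is meant to stay behind in the original fleet, so the
    original fleet ID should never appear among the new fleet IDs."""
    new_fleet_ids = []
    for ship_id in ship_ids[1:]:  # skip the first ship on purpose
        new_fleet_ids.append(issue_new_fleet(ship_id))
    # The caller's assumption, made explicit; the bug in this issue is
    # that in practice this invariant is sometimes violated.
    assert fleet_id not in new_fleet_ids, "original fleet ID leaked into new fleets"
    return new_fleet_ids

# A well-behaved order-issuing function keeps the invariant:
print(split_fleet_ids(7, [1, 2, 3], lambda ship_id: 100 + ship_id))  # [102, 103]
```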

dbenage-cx commented 6 years ago

IIRC when splitting a fleet, new fleets are created for each grouping. The ship left in the old fleet would actually be moved to a new fleet, with the old fleet now invalid.

Dilvish-fo commented 6 years ago

IIRC when splitting a fleet, new fleets are created for each grouping. The ship left in the old fleet would actually be moved to a new fleet, with the old fleet now invalid.

You are perhaps thinking of when the UI splits an entire fleet. The transfer being done here is more like when you transfer just a few ships from an old fleet onto the 'new fleet' bar: the old fleet is not invalidated if only a portion of the ships are transferred. Tracing from AIInterface.cpp to Order.cpp, it looks to me like the AI code here is using a mechanism in which the old fleet should only be invalidated if it becomes empty from the transfer, which should not be happening here.

dbenage-cx commented 6 years ago

I was thinking of the UI, thanks for catching that.

Spent a little time trying to trace this a bit, but I have not been able to reproduce it with just 1 AI so far. Save game and logs (9 AIs). The relevant AI log file is AI_1.log.

The underlying bug looks like a possible issue related to the IDallocator, as the next valid ID for this empire belongs to an already existing fleet. When a new fleet order is issued, the new fleet is given such an ID, and the AI loses track of the ship it thought to leave in the original fleet.
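A toy model of the suspected failure (illustrative only; FreeOrion's real ID allocator is C++ code): after loading a save, the allocator's next-ID counter lags behind object IDs already present in the universe, so a "new" ID can collide with an existing fleet.

```python
class IdAllocatorSketch:
    """Toy stand-in for the suspected allocator state after a load:
    a simple incrementing counter, restored too low."""
    def __init__(self, next_id):
        self.next_id = next_id

    def new_id(self):
        nid = self.next_id
        self.next_id += 1
        return nid

existing_fleet_ids = {55759, 55760}              # IDs restored from the save
allocator = IdAllocatorSketch(next_id=55759)     # counter restored too low

nid = allocator.new_id()
print(nid in existing_fleet_ids)  # True: the "new" ID already belongs to a fleet
```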

Catching such cases on AI side:

diff --git a/default/python/AI/FleetUtilsAI.py b/default/python/AI/FleetUtilsAI.py
index fddcfaf23..8b4058164 100644
--- a/default/python/AI/FleetUtilsAI.py
+++ b/default/python/AI/FleetUtilsAI.py
@@ -120,6 +120,8 @@ def get_fleets_for_mission(target_stats, min_stats, cur_stats, starting_system,
             if len(list(fleet.shipIDs)) > 1:
                 new_fleets = split_fleet(fleet_id)
                 fleet_pool_set.update(new_fleets)
+                if fleet_id in new_fleets:
+                    new_fleets.remove(fleet_id)
                 fleets_here.extend(new_fleets)
             # check species for colonization missions
             if species:
@@ -202,8 +204,13 @@ def split_fleet(fleet_id):
     if len(list(fleet.shipIDs)) <= 1:  # fleet with only one ship cannot be split
         return []
     ship_ids = list(fleet.shipIDs)
-    for ship_id in ship_ids[1:]:
+    ships_to_split = ship_ids[1:]
+    for ship_id in ships_to_split:
         new_fleet_id = fo.issueNewFleetOrder("Fleet %4d" % ship_id, ship_id)
+        if new_fleet_id == fleet_id:
+            warn("New fleet issued id of original fleet %d, adding ship %d for pending new fleets" %
+                 (fleet_id, ship_ids[0]))
+            ships_to_split.append(ship_ids[0])
         if new_fleet_id:
             new_fleet = universe.getFleet(new_fleet_id)
             if not new_fleet:
Dilvish-fo commented 6 years ago

The underlying bug looks like a possible issue related to IDallocator, as the next valid id for this empire is for an already existing fleet.

Thanks very much for looking into this more. What you describe sounds like such a potentially serious bug that I hesitate to largely sweep it under the rug rather than allow the bug to remain somewhat noisy. (IIRC warn simply goes to the logfile, versus 'error', which also chats to the user.)

Also, if the only erroneous ID that could come up is truly the one from the original fleet, then that would be a very helpful and intriguing clue; but I don't think we should make that assumption (this bug might be far more common than we think, and we are discovering it only in this one situation, where the code happens to be sensitive to the new ID being the same as the original fleet ID).

I'll keep thinking about this and poking around a bit; it would be great if you could continue as well.

dbenage-cx commented 6 years ago

I keep coming back to the thought that the AI is trying to process IDs which it previously saved but which have not yet been allocated after load. With the above diff, the next turn generates errors for other merged fleets:

[debug] python : FleetUtilsAI.py:339 - couldn't get fleet with id 55759
[error] ai : AIInterface.cpp:624 : IssueAggressionOrder : no fleet with passed id
[debug] python : FleetUtilsAI.py:339 - couldn't get fleet with id 55940
[error] ai : AIInterface.cpp:624 : IssueAggressionOrder : no fleet with passed id
[debug] python : FleetUtilsAI.py:339 - couldn't get fleet with id 56220
[error] ai : AIInterface.cpp:624 : IssueAggressionOrder : no fleet with passed id
[debug] python : FleetUtilsAI.py:339 - couldn't get fleet with id 56231
[error] ai : AIInterface.cpp:624 : IssueAggressionOrder : no fleet with passed id
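One way to sketch that thought (hypothetical helper, not existing FreeOrion code): after a load, validate the fleet IDs restored from the AI's saved state against the universe and separate out the stale ones before issuing orders against them.

```python
def prune_stale_fleet_ids(saved_fleet_ids, get_fleet):
    """Split saved fleet IDs into those that still resolve to a fleet
    after load and those that do not. `get_fleet` stands in for a
    universe lookup returning None for unknown IDs (illustrative)."""
    valid, stale = [], []
    for fleet_id in saved_fleet_ids:
        if get_fleet(fleet_id) is not None:
            valid.append(fleet_id)
        else:
            stale.append(fleet_id)
    return valid, stale

# Toy universe in which only fleet 100 still exists:
universe = {100: "fleet 100"}
valid, stale = prune_stale_fleet_ids([55759, 55940, 100], universe.get)
print(valid, stale)  # [100] [55759, 55940]
```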

Removing the merged-from fleet ID from systems in AIstate::systemStatus (at least for the "myfleets" key) resolves such errors. I cannot tell if this is a separate issue until I reproduce the error and can save a valid previous turn.

diff --git a/default/python/AI/AIstate.py b/default/python/AI/AIstate.py
index cc2c4eda2..04c4ebbed 100755
--- a/default/python/AI/AIstate.py
+++ b/default/python/AI/AIstate.py
@@ -255,6 +255,9 @@ class AIstate(object):
             del self.fleetStatus[fleet_id]
         if fleet_id in self.__fleetRoleByID:
             del self.__fleetRoleByID[fleet_id]
+        for sys_id in self.systemStatus:
+            if fleet_id in self.systemStatus[sys_id]['myfleets']:
+                self.systemStatus[sys_id]['myfleets'].remove(fleet_id)

     def __report_system_threats(self):
         """Print a table with system threats to the logfile."""

As a supplement to the previous diff wrt split_fleet (or an alternative, if the original ship is unaffected):

diff --git a/universe/Universe.cpp b/universe/Universe.cpp
index 29c3c0236..7e89acc85 100644
--- a/universe/Universe.cpp
+++ b/universe/Universe.cpp
@@ -373,6 +373,17 @@ std::set<std::string> Universe::GetObjectVisibleSpecialsByEmpire(int object_id,

 int Universe::GenerateObjectID() {
     auto new_id = m_object_id_allocator->NewID();
+    int max_attempts = 10;
+    while (m_objects.Object(new_id) != nullptr) {
+        auto old_id = new_id;
+        new_id = m_object_id_allocator->NewID();
+        ErrorLogger() << "Next object id " << std::to_string(old_id) << " already in use, trying " << std::to_string(new_id);
+        max_attempts--;
+        if (max_attempts <= 0) {
+            ErrorLogger() << "Exceeded tries for new id";
+            return INVALID_OBJECT_ID;
+        }
+    }
     return new_id;
 }

(None of the above are proposed candidates for committal.)

Dilvish-fo commented 6 years ago

Removing the merged-from fleet ID from systems in AIstate::systemStatus (at least for the "myfleets" key) resolves such errors. I cannot tell if this is a separate issue until I reproduce the error and can save a valid previous turn.

That does seem to largely resolve the problem, at least in my couple of test cases. Some months back I experimented with nearly the same cleanup, but I was trying to be more sparing with the searching and was only doing it in a subset of the cases where I could see a clear need; that reduced the spam you noted but didn't fix this key error issue, and I let it languish.

I am guessing that your statement "(None of the above are proposed candidates for committal.)" is because you hadn't tested this as thoroughly as you'd like and were perhaps uncertain whether it might cause the AI other trouble. Your second diff looks unneeded to me, but I think the first bit of code, or at least something along those lines, is probably right to put in. I broadened my previous changes in light of your diff above and am going to submit a PR on those, to at least get them in here for comment rather than just letting them gather dust on my machine.

Dilvish-fo commented 6 years ago

Unfortunately, I am getting this error again, running recent master e97989e on a Win10 machine, with the following save file: fleets_for_mission_key_error.zip

Vezzra commented 6 years ago

@Dilvish-fo, @dbenage-cx, has there been any progress made with this issue here? At least to the point where we can reassign it to optional?

Dilvish-fo commented 6 years ago

I think I've made a small bit of progress, or at least have a branch with some additional debugging to help make progress, but a number of other FO issues have been chewing up my FO time as well.

I don't think you need to consider this as something that should block the creation of a 0.4.8 branch -- worst case, I simply add a commit to that branch to shift the error logging for this issue over to warn-level logging, so it does not spam the player with in-game error messages. I am pretty sure this issue is not causing significant AI problems, so that logging change should be fine for 0.4.8. (But I don't really want to put that logging change into master.)

Dilvish-fo commented 6 years ago

To clarify somewhat: if this is not fully resolved but we simply add a commit to the 0.4.8 release changing the logging level from error to warn, then at that point I would downgrade this to optional for 0.4.8.

Vezzra commented 6 years ago

@Dilvish-fo,

if this is not fully resolved but we simply add a commit to the 0.4.8 release changing the logging level from error to warn, then at that point I would downgrade this to optional for 0.4.8

The release branch has been created, can you proceed with what you suggested here?

Dilvish-fo commented 6 years ago

The release branch has been created, can you proceed with what you suggested here?

OK, done.

Vezzra commented 6 years ago

Thanks!

Vezzra commented 4 years ago

@freeorion/ai-team, what's the status on this? Has there been any progress? Or has the issue been solved already long ago, and we just forgot to close this accordingly?