NearInfinityBrowser / NearInfinity

An Infinity Engine Browser and Editor
https://github.com/NearInfinityBrowser/NearInfinity/wiki
GNU Lesser General Public License v2.1
87 stars 37 forks source link

Expand unused file search logic #184

Open ColossusChang opened 2 months ago

ColossusChang commented 2 months ago

Currently the "Unused File" search only checks whether the files themselves are unused/unreferenced.

However, it would be logical to also consider a file to be unused, if all uses of that file are in files that are themselves unused.

For example, as of v2.4-20240424, in "Baldur's Gate: Siege of Dragonspear v2.6.6.0", LENORE.DLG is not considered unused. However, it is only used in LENORE.CRE. Since LENORE.CRE is unused, LENORE.DLG should also be considered unused.

Argent77 commented 2 months ago

Expanding the feature like this would introduce several issues:

  1. It would increase the complexity of the search exponentially.
  2. It would also produce many false positives since usage of a good number of game resources is hardcoded in the game executable. These resources would be erroneously marked as 'unused' even though they are referenced by the game.
  3. There is no safe way to decide when to stop the search since many chains of resource references will encounter a dead end eventually even though they are accessible in the game (see point 2).

The current implementation may not be exhaustive but it should produce the most accurate result under the given circumstances.

ColossusChang commented 2 weeks ago

Expanding the search to include transitive references doesn't necessarily lead to exponential complexity.

Model the objects (CRE, DLG,...) and their references as graph nodes and vertices. Treat the hardcoded resources as starting points in graph traversal. Maintain a list of these hardcoded resources as an initial set of known "used" objects, and do BFS on the graph and mark resources they reference as "used". It'll boil down to identifying all nodes that are unconnected to known used nodes.

Assuming that there are m resources, and a resource refers to at most n other resources, this will be of time complexity O(mn).

You can run this traversal at startup and store the unused resources in memory, sorted by resource type, and when the user asks to get a list of all unused objects of a certain resource, just return that list. Or if it's still too slow, make it configurable to do the traversal and warn the user about it. With the graph built, it will also be much faster to do the "find references to this file" actions.

Interestingly, acquiring a list of directly unused resources is also of time complexity O(mn). So complexity wise it's actually the same.

ColossusChang commented 2 days ago

@Argent77 Any follow-ups on this? This functionality would be crutial for a mod I've been working on, since knowing which resources are really unused tremendously helps reduce a lot of useless effort.

Argent77 commented 2 days ago

As I mentioned before there a many hardcoded instances of resources that are used implicitly by the engine in one way or another. The engine is not open source, so the known number of hardcoded resources is incomplete and may differ from game to game. Even with the current search method there are chances to find false positives because of ambiguities or unknown engine behavior.

Expanding the feature like this would lead to an increase of false positives - or negatives in this case. (IMHO) this is worse than a couple more hits in the results list which can be clarified by performing addition searches on the results.

It would also involve a substantial rewrite of the current reference search functionality. I have added this feature to my todo list, but I can't say if or when I can find the time to work on it.

Since NI is open source I would happily accept contributions for this or other features though. The Contributors section of the readme provides more details about code conventions, etc.