SteamDatabase / FileDetectionRuleSets

🔎 Rules to detect game engines and other technologies based on Steam depot file lists
https://steamdb.info/tech/
MIT License

Godot games with multiple PCK files #52

Closed akien-mga closed 3 years ago

akien-mga commented 3 years ago

SteamDB link to the game

https://steamdb.info/app/498310/info/ https://steamdb.info/app/1079290/info/

What should it be detected as?

Godot

Additional information

These should be detected as Godot, but we currently exclude these knowingly as we fear a high number of false positives due to old games using .pck extensions for their multiple asset packs (data.pck, sounds.pck, etc.).

I do think that after the last few days' improvements to the Godot heuristics, we should be able to include them provided that they pass the other Godot 3.0+ criteria, i.e. there must be a mygame.pck that matches mygame.exe, mygame.x86_64, mygame.x86 or mygame.
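A minimal sketch of the name-matching criterion described above, assuming the depot file list is available as a list of path strings. The function name `matching_pck_stems`, the case-insensitive comparison, and matching on base names only are illustrative assumptions, not the repository's actual rules.

```python
import posixpath

# Executable suffixes named in the comment above; "" stands for an
# extensionless binary such as "mygame".
EXECUTABLE_SUFFIXES = (".exe", ".x86_64", ".x86", "")


def matching_pck_stems(paths):
    """Return the stems of .pck files that sit alongside a same-named executable.

    Assumption: `paths` lists files only (no folder entries), and names are
    compared case-insensitively on their base name.
    """
    names = {posixpath.basename(p).lower() for p in paths}
    stems = set()
    for name in names:
        if not name.endswith(".pck"):
            continue
        stem = name[: -len(".pck")]
        if any(stem + suffix in names for suffix in EXECUTABLE_SUFFIXES):
            stems.add(stem)
    return stems
```

With this reading, a depot containing `mygame.exe`, `mygame.pck` and `extra.pck` would qualify through `mygame`, while a depot with only `game.exe`, `data.pck` and `sounds.pck` would not.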

I don't think that there is any pre-Godot 3.0 game using multiple PCKs, so it should be OK to keep excluding data.pck + other PCKs, as this is the case that would lead to most false positives.

Before pushing a potential fix for this, I would recommend sharing a list of what would be the new additions, so that I can review it and ensure that there aren't any false positives.

larsiusprime commented 3 years ago

Okay, I did some searching for PCK files in non-Godot games and here is what I have discovered:

  1. MODEM.PCK in old DOS games (probably the majority of matches)
  2. Certain idTech5-7 games have a bunch of PCK files
  3. A random Python library has some PCK files in one game
  4. A bunch of Unreal Engine games have some PCK files

I was told this game was made in PyGame but it actually looks like a dead ringer for Godot based on the file structure: https://steamdb.info/app/388490/

I did a brute-force search on SteamDB and there are 943 unique depots that have PCK files in them. If we assume there are maybe ~2 depots per app on average, that's about 471 apps or so, of which we know 300+ are Godot.

Conclusions:

  1. The total number of potential false positives here is limited (maybe up to 150 or so)
  2. DOS games should be ruled out before the 2-pass test even starts, and I've never seen a MODEM.EXE file anyway
  3. Unreal games should be ruled out before the 2-pass test even starts
  4. I think the chances of idTech, Unreal Engine, and Python libraries having PCK files that match executables are low

Therefore, it is probably very safe to have the 2-pass Godot rule be:

larsiusprime commented 3 years ago

Furthermore, found these games that have a data.pck file -- don't have appids as these are from depot names, but are most of them Godot?

Forsaken World X3: Albion Prelude Broken Age Dear Leader Egg Returns Home Swordsman One Way To Die PWI The Dope Game agoc GRAVEN The Purple Moon Prophecy Raise Your Own Clone X3: Farnham's Legacy Archipelago Satellite Repairman Caliper Deep Sixed Flagsplosion Bouncing Odyssey Bad Government Qequitas Orbis Building Block Heroes Placement The Legend of Slime Chess Arena Mindset Bloom: Labyrinth City Game Studio Village of Adventureers Final Storm The Geology Game Seek Etyliv my dream Music Boy 3D Endhall HapSquash! ProtoCorgi Demo Ride with the Reaper Sneak in Greed Destroy the Dummies

larsiusprime commented 3 years ago

The most important thing is to exclude Broken Age, imho

akien-mga commented 3 years ago

Most are Godot games, but some notoriously aren't (X3: Albion Prelude, Broken Age, Swordsman, X3: Farnham's Legacy, Village of Adventureers).

But I think all the Godot games in that list are already matched by the current logic, so we may not need to loosen the rules for their sake. I don't think any of those uses multiple PCKs.

I think we can focus on the 3.0+ ruleset only for multiple PCK support.

akien-mga commented 3 years ago

> I was told this game was made in PyGame but it actually looks like a dead ringer for Godot based on the file structure: steamdb.info/app/388490

Yeah I'm pretty sure that's a Godot game. This developer has several other Godot games and is the maintainer of the Godot Steam API integration: https://github.com/Gramps/GodotSteam

Might have been PyGame initially though.

> MODEM.PCK in old DOS games (probably the majority of matches)

> DOS games should be ruled out before the 2-pass test even starts, and I've never seen a MODEM.EXE file anyway

I did see a lot of those initially before your recent round of fixes, so they did seem to end up in the 2 pass test. But now that we match those with an exe, it's fine.

larsiusprime commented 3 years ago

Yes, right, so actually because we detect multiple engines they won't be automatically ruled out. I'll add a line to check for these. It's mostly just to be safe for the odd case where an Unreal game with 500 PCKs happens to have one that lines up with its executable name.

larsiusprime commented 3 years ago

Okay so I improved the detection with the following logic:

After that:

I stopped checking for extensionless executables entirely, as I realized we don't need to account for each executable. .exe + .pck is a rare enough pattern it seems to only happen with Godot, and pre-ruling out other ambiguous engines with many PCK files makes it even more safe.

This makes us mostly check off of the Windows depot alone, but it is very rare for someone to ship Mac or Linux content without Windows, and this reduces confusion. I do still check the .x86 and .x86_64 extensions for hits as well, because why not. But this makes things cleaner, simpler, and less error-prone. I was able to hit RPGINABOX and many other previously missed Godot games without triggering false positives in my set of ~1000 test games. Hopefully it doesn't cause false positives on the full set.
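The exact rules are not reproduced in this thread, so the following is only one possible reading of the description above: skip apps already attributed to an engine known to ship many unrelated PCK files, then look for a PCK that shares its name with a `.exe`, `.x86` or `.x86_64` file. The engine labels, the function name `looks_like_godot` and the case-insensitive comparison are hypothetical.

```python
import posixpath

# Hypothetical labels for engines/platforms that ship unrelated .pck files.
AMBIGUOUS_ENGINES = {"idTech", "Unreal", "DOS"}

# Extensionless executables are deliberately not checked in this variant.
EXECUTABLE_SUFFIXES = (".exe", ".x86", ".x86_64")


def looks_like_godot(paths, detected_engines):
    """Return True if a .pck shares its stem with an executable.

    Apps already matched as one of AMBIGUOUS_ENGINES are rejected up front.
    """
    if AMBIGUOUS_ENGINES & set(detected_engines):
        return False
    names = {posixpath.basename(p).lower() for p in paths}
    for name in names:
        if name.endswith(".pck"):
            stem = name[: -len(".pck")]
            if any(stem + suffix in names for suffix in EXECUTABLE_SUFFIXES):
                return True
    return False
```

Under these assumptions, a Mac-only pair such as `MyGame.app/Contents/MacOS/MyGame` plus `MyGame.pck` is not matched, because the extensionless binary is never considered.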

Reopen if there are issues.

akien-mga commented 3 years ago

> I stopped checking for extensionless executables entirely, as I realized we don't need to account for each executable. .exe + .pck is a rare enough pattern it seems to only happen with Godot, and pre-ruling out other ambiguous engines with many PCK files makes it even more safe.

> This makes us mostly check off of the Windows depot alone, but it is very rare for someone to ship Mac or Linux content without Windows, and this reduces confusion.

This might make us miss some entries that could only be detected thanks to the Mac build though, if they use "Embed PCK" (so Windows and Linux builds don't have a PCK).

larsiusprime commented 3 years ago

How many of those are there? Show me some examples and I will test them.

akien-mga commented 3 years ago

I don't know exact numbers but probably more than 20.

Luck be a Landlord is a good example: https://steamdb.info/app/1404850/depots/

larsiusprime commented 3 years ago

I already detect Luck be a Landlord with the current rules!

So even though the PCK is only included in the Mac depot, it happens to have the same name as the Windows executable, so it matches. The case where it would fail is if they intentionally name their Mac executable/PCK pair differently.

akien-mga commented 3 years ago

I thought you planned to only parse the Windows depot? I may have misunderstood.

Luck be a Landlord is a lucky case though; there could also be games with different names for different platforms, e.g.:

MyGameLinux.x86_64   // using Embed PCK
MyGameWindows.exe    // using Embed PCK
MyGameMac.app/Contents/MacOS/MyGameMac
MyGameMac.app/Contents/Resources/MyGameMac.pck

or even

MyGameLinux.x86_64   // using Embed PCK
MyGameWindows.exe    // using Embed PCK
MyGameMac.app/Contents/MacOS/MyGame
MyGameMac.app/Contents/Resources/MyGame.pck

(with different .pck and .app name)

This wouldn't be matched if we no longer do extensionless matching. It seems to be fairly efficient now that @xPaw removed folders, no?
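To make the difference concrete, here is a small sketch that runs both example layouts above with and without extensionless matching. It assumes the file lists of all depots are pooled, contain files only (no folder entries), and are compared by base name; `has_godot_pair` and `shared_name_layout` are hypothetical illustrations.

```python
import posixpath


def has_godot_pair(paths, allow_extensionless):
    """Return True if some .pck shares its stem with an executable name."""
    suffixes = [".exe", ".x86", ".x86_64"]
    if allow_extensionless:
        suffixes.append("")  # bare binary name, e.g. "MyGameMac"
    names = {posixpath.basename(p) for p in paths}
    return any(
        name.endswith(".pck")
        and any(name[: -len(".pck")] + suffix in names for suffix in suffixes)
        for name in names
    )


# First layout above: .app, binary and .pck all named MyGameMac.
first_layout = [
    "MyGameLinux.x86_64",
    "MyGameWindows.exe",
    "MyGameMac.app/Contents/MacOS/MyGameMac",
    "MyGameMac.app/Contents/Resources/MyGameMac.pck",
]

# Second layout above: .app named MyGameMac, binary and .pck named MyGame.
second_layout = [
    "MyGameLinux.x86_64",
    "MyGameWindows.exe",
    "MyGameMac.app/Contents/MacOS/MyGame",
    "MyGameMac.app/Contents/Resources/MyGame.pck",
]

# Hypothetical layout where the Mac .pck shares its name with the Windows exe.
shared_name_layout = [
    "SharedName.exe",
    "SharedName.app/Contents/Resources/SharedName.pck",
]
```

In this sketch both example layouts are only matched when `allow_extensionless` is true, whereas the shared-name layout is matched either way.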

larsiusprime commented 3 years ago

I re-ran the tests and it looks like if we aren't matching against folders, then extensionless matching should probably be okay. Let's see what the full results are.