SteamDatabase / FileDetectionRuleSets

🔎 Rules to detect game engines and other technologies based on Steam depot file lists
https://steamdb.info/tech/
MIT License
186 stars 78 forks source link

Many games after recent Godot changes #46

Closed akien-mga closed 3 years ago

akien-mga commented 3 years ago

SteamDB link to the game

From pastebin that @larsiusprime sent me, all these have wrongly been removed from matching Godot. They have a pattern that we can properly match.

App 1041180 - Removed Engine.Godot
App 1269540 - Removed Engine.Godot
App 1389030 - Removed Engine.Godot
App 1412500 - Removed Engine.Godot
App 1429120 - Removed Engine.Godot
App 1507970 - Removed Engine.Godot
App 1539630 - Removed Engine.Godot
App 1561390 - Removed Engine.Godot
App 1660050 - Removed Engine.Godot
App 1684810 - Removed Engine.Godot
App 388490 - Removed Engine.Godot
App 498310 - Removed Engine.Godot
App 863330 - Removed Engine.Godot

What should it be detected as?

Godot

akien-mga commented 3 years ago

There's at least two different bugs in this list:

This one should pass current checks as is with its Windows depot:

protocorgi_demo_windows_x64.exe
protocorgi_demo_windows_x64.pck

The Linux depot doesn't pass due to non-standard extension (or lack thereof), but that shouldn't invalidate the Windows match:

protocorgi_demo_linux_x64
protocorgi_demo_linux_x64.pck

Windows depot should pass checks. macOS depot should pass them too once the logic is fixed (https://github.com/SteamDatabase/FileDetectionRuleSets/pull/44#issuecomment-895493150), but again, a lack of match on macOS should probably not invalidate the Windows match.

All three platforms should pass checks. It was likely removed due to invalid macOS check (see above).

This one is a bit tricky as it puts the PCK in a separate depot to share between the Windows and the Linux depot (since it's the same file for both platforms).

But macOS should let us find a match: https://steamdb.info/depot/1269544/

That one should match on Windows (#37 is back). Linux build uses Embed PCK so can't match. macOS should match once fixed. This and Paralyzed are examples of macOS builds where the .app filename doesn't match the executable and PCK (but the latter two must match each other).

akien-mga commented 3 years ago

Checked more of the OP's list, more wrongly removed Godot games.

Windows match, macOS wrongly unmatched. I tried the game earlier today and confirm it's Godot.

To be confirmed but this seems to be an example of a Godot game with multiple PCKs. One that matches the ruleset, one extra loaded at runtime as patch/DLC.

I'd still like to see a list of SteamDB matches for mygame.exe + mygame.pck + other PCK files to see how many of them are Godot games or not. But that can be for later improvement work.

Both Windows and macOS builds should match once macOS rules are fixed. Worth noting that they're in the same depot.

Windows and Linux are a match, macOS wrongly unmatched. Also 100% sure it's Godot, I played it.

Same story, Windows and Linux are a match. macOS should match too once fixed. Worth noting, they messed up their macOS .app setup and nested it in itself - but it should still work and match I think. https://steamdb.info/depot/1684813/

akien-mga commented 3 years ago

These have correctly been removed, I'll remove them from the OP:

App 232790 - Removed Engine.Godot
App 285250 - Removed Engine.Godot
App 364110 - Removed Engine.Godot
App 467490 - Removed Engine.Godot
App 727090 - Removed Engine.Godot
akien-mga commented 3 years ago

And the rest of the list:

Windows should match without issue. Linux more tricky as no extension. macOS seems to be a Godot 2.x build weirdly enough, as the PCK in the .app is named data.pck: https://steamdb.info/depot/388499/

That one's a bit trickier as it's a complex Godot project, which uses multiple PCKs. You can see that they have a "Base PCK" depot shared between Windows and Linux, and another depot with export templates and extra asset PCKs.

Here again the macOS .app can be the safest bet: https://steamdb.info/depot/498315/

It does have:

RPG in a Box.app/Contents/MacOS/rpginabox
RPG in a Box.app/Contents/Resources/rpginabox.pck

which is a dead Godot giveaway.

It also has additional PCKs loaded at runtime.

larsiusprime commented 3 years ago

One thing for further context: our checker lumps ALL the depots together for one app, so we just get a big pile of files for up to all three platforms and any other weird configurations that the developer does. So it doesn't matter how they split things up. This also makes it basically impossible to only "consider" one platform at a time because you don't necessarily know where the pck file came from unless you intuit it from the file path, it's just in the soup

akien-mga commented 3 years ago

One thing for further context: our checker lumps ALL the depots together for one app, so we just get a big pile of files for up to all three platforms and any other weird configurations that the developer does. So it doesn't matter how they split things up. This also makes it basically impossible to only "consider" one platform at a time because you don't necessarily know where the pck file came from unless you intuit it from the file path, it's just in the soup

Then I wonder why something like this doesn't pass:

protocorgi_demo_windows_x64.exe
protocorgi_demo_windows_x64.pck
protocorgi_demo_linux_x64
protocorgi_demo_linux_x64.pck
ProtoCorgi Demo.app
ProtoCorgi Demo.app/Contents
ProtoCorgi Demo.app/Contents/Info.plist
ProtoCorgi Demo.app/Contents/MacOS 
ProtoCorgi Demo.app/Contents/MacOS/ProtoCorgi Demo
ProtoCorgi Demo.app/Contents/PkgInfo
ProtoCorgi Demo.app/Contents/Resources
ProtoCorgi Demo.app/Contents/Resources/icon.icns
ProtoCorgi Demo.app/Contents/Resources/ProtoCorgi Demo.pck

I guess it's because the Linux executable doesn't match its .pck with your rules, so the PCK is seen as an extra PCK and thus everything is discarded.

Same situation for other entries where the Windows and Linux patterns should pass, but the wrong macOS check of the .app instead of the executable discards it by adding an extra unmatched PCK.

akien-mga commented 3 years ago

So potential TODOs to fix most of those false negatives:

larsiusprime commented 3 years ago

macOS exe/pck matching logic should use Contents/MacOS/ executable to match the Contents/Resources/.pck, not the .app's basename.

By default, are both of them usually just going to be in the base Content/MacOS/ directory or do I need to worry about subdirectories?

Linux exe/pck matching should likely be made to support mygame + mygame.pck (also common, especially prior to whenever we started adding a Linux extension in Godot 3.0).

Working on a fix for this.

The rule on excluding entries with multiple PCKs might need to be relaxed. IMO if there's a match for mygame.exe + mygame.pck and extra PCKs in the same folder as mygame.pck, then there's good chances we're still looking at a Godot game.

Hmmm, trick here is false positives. How many extra pcks do you tend to see? It might be useful to have a cutoff number if we go this route.

akien-mga commented 3 years ago

macOS exe/pck matching logic should use Contents/MacOS/ executable to match the Contents/Resources/.pck, not the .app's basename.

By default, are both of them usually just going to be in the base Content/MacOS/ directory or do I need to worry about subdirectories?

Not the same folders, in an .app executables go to Contents/MacOS/ while assets/resources go to Contents/Resources/. Godot follows that Apple convention.

But it will indeed always be:

<whatever>.app/Contents/MacOS/<name>
<whatever>.app/Contents/Resources/<name>.pck

or

<whatever>.app/Contents/MacOS/<name>
<whatever>.app/Contents/Resources/data.pck

as described here: https://github.com/SteamDatabase/FileDetectionRuleSets/pull/44#issuecomment-895490372

The rule on excluding entries with multiple PCKs might need to be relaxed. IMO if there's a match for mygame.exe + mygame.pck and extra PCKs in the same folder as mygame.pck, then there's good chances we're still looking at a Godot game.

Hmmm, trick here is false positives. How many extra pcks do you tend to see? It might be useful to have a cutoff number if we go this route.

I'm not sure we would get any false positives if we do match first the Godot-specific requirement to have mygame.exe + mygame.pck.

The false positives I saw removed by your changes were mostly old DOS games with dozens of FOO.PCK and BAR.PCK which should not have matched any Godot rule (most did not even include a data.pck). So it seemed to be matching any *.pck in a case insensitive manner as a last resort.

But as I said before, I'm happy to review a list of what entries this would match and confirm if they're all Godot or if there are false positives.

IMO as long as we don't apply this to pre-3.0 games (i.e. data.pck, which is very common in old DOS games), we should be fine.

larsiusprime commented 3 years ago

Okay if you're willing to help validate then I'm willing to try some of this.

So here's an example:

文字遊戲:第零章.exe
文字遊戲:第零章.pck
文字遊戲:第零章.app
文字遊戲:第零章.app/Contents
文字遊戲:第零章.app/Contents/_CodeSignature
文字遊戲:第零章.app/Contents/_CodeSignature/CodeResources
文字遊戲:第零章.app/Contents/Info.plist
文字遊戲:第零章.app/Contents/MacOS
文字遊戲:第零章.app/Contents/MacOS/Word Game
文字遊戲:第零章.app/Contents/PkgInfo
文字遊戲:第零章.app/Contents/Resources
文字遊戲:第零章.app/Contents/Resources/icon.icns
文字遊戲:第零章.app/Contents/Resources/Word Game.pck

If we add the rule for linux of extensionless executables, these pairs will be made:

文字遊戲:第零章.exe <--> 文字遊戲:第零章.pck 文字遊戲:第零章.app <--> 文字遊戲:第零章.pck Word Game <--> Word Game.pck

and every pck file will be matched with at least one executable. However, we could imagine a separate case like this:

文字遊戲:第零章.exe
文字遊戲:第零章.pck
文字遊戲:第零章.app
文字遊戲:第零章Mac.app/Contents
文字遊戲:第零章Mac.app/Contents/_CodeSignature
文字遊戲:第零章Mac.app/Contents/_CodeSignature/CodeResources
文字遊戲:第零章Mac.app/Contents/Info.plist
文字遊戲:第零章Mac.app/Contents/MacOS
文字遊戲:第零章Mac.app/Contents/MacOS/Word Game
文字遊戲:第零章Mac.app/Contents/PkgInfo
文字遊戲:第零章Mac.app/Contents/Resources
文字遊戲:第零章Mac.app/Contents/Resources/icon.icns
文字遊戲:第零章Mac.app/Contents/Resources/Word Game.pck

And we would have these matches: 文字遊戲:第零章.exe <--> 文字遊戲:第零章.pck Word Game <--> Word Game.pck

So, we could add the further requirement to enforce the .app/Contents/MacOS & .app/Contents/Resources restriction, but I'm not sure if it helps us much.

I think your idea is to make sure the match is this: 文字遊戲:第零章.exe <--> 文字遊戲:第零章.pck .app/Contents/MacOS/Word Game <--> .app/Contents/Resources/Word Game.pck

That would certainly work and cut out some false positives, however, we also have to consider the case of Linux -- we could imagine this, with the "文字遊戲:第零章" file representing a linux executable with no extension.

文字遊戲:第零章.exe
文字遊戲:第零章.pck
文字遊戲:第零章
文字遊戲:第零章.pck
文字遊戲:第零章.app
文字遊戲:第零章Mac.app/Contents
文字遊戲:第零章Mac.app/Contents/_CodeSignature
文字遊戲:第零章Mac.app/Contents/_CodeSignature/CodeResources
文字遊戲:第零章Mac.app/Contents/Info.plist
文字遊戲:第零章Mac.app/Contents/MacOS
文字遊戲:第零章Mac.app/Contents/MacOS/Word Game
文字遊戲:第零章Mac.app/Contents/PkgInfo
文字遊戲:第零章Mac.app/Contents/Resources
文字遊戲:第零章Mac.app/Contents/Resources/icon.icns
文字遊戲:第零章Mac.app/Contents/Resources/Word Game.pck

In this scenario, we would have these matches:

`文字遊戲:第零章.exe` <--> `文字遊戲:第零章.pck`
`文字遊戲:第零章` <--> `文字遊戲:第零章.pck`
`.app/Contents/MacOS/Word Game` <--> `.app/Contents/Resources/Word Game.pck`

Essentially, any specificity and protection against false positives we get by adding more complex logic that enforces internal Mac App bundle directory structure, is invalidated by the fact that we still have to match files that are potentially extensionless linux executables against pck files with the same name.

akien-mga commented 3 years ago

To clarify, 文字遊戲:第零章.app is a folder, it shouldn't be matched with anything. It could theoretically be a file such as a Linux executable, but I think this information might be in the depots manifest?

In your last example, the results seem correct, no? Edit: Actually the other two seem fine too.

Essentially, any specificity and protection against false positives we get by adding more complex logic that enforces internal Mac App bundle directory structure, is invalidated by the fact that we still have to match files that are potentially extensionless linux executables against pck files with the same name.

How do you mean that matching extensionless Linux executables invalidates the protections? It does look like it would match what we want here.

larsiusprime commented 3 years ago

To clarify, 文字遊戲:第零章.app is a folder, it shouldn't be matched with anything.

To be clear, our system doesn't distinguish between folders and files -- all it has is a filename. All it has to go off of is the filename itself. But we can just "know" that .app is a folder and not treat it as an executable extension.

How do you mean that matching extensionless Linux executables invalidates the protections? It does look like it would match what we want here.

Yes maybe I'm misunderstanding you. Extensionless files matching pck files seems like it matches what we want and should be good. But I had gotten the notion that you wanted matching to be STRICTER, ie, detect if we are within a mac app bundle, and if so, then enforce the mac app bundle directory structure, and if it's not there, refuse to match (thus presumably ruling out false positives). My point is, if we have extensionless file matching for the linux stuff, we can't discriminate which depot that gets run on, the test gets run on everything lumped together. So even if it was meant for linux, it will also serve to match the pck file in the app bundle with the extensionless executable in the app bundle.

akien-mga commented 3 years ago

Ah I see. I don't think we need to be too strict, no, just the extensionless matching might be sufficient and does simplify the macOS situation a lot (since the executable there is always extensionless).

My point about being strict is ensuring that we do see .app/Contents/MacOS/name and .app/Contents/Resources/name.pck as a pair, while previous logic seems to have focused on .app and would thus not matching this one and make it a false negative due to an unpaired PCK. But extensionless pairing instead of checking the .app name specifically solves this.

larsiusprime commented 3 years ago

Yes, the new change I'm considering is removing the .app check and adding an extensionless check.

larsiusprime commented 3 years ago

Okay I pushed the latest change Not sure what schedule xPaw runs the checker on, but feel free to check the list again in a few hours or tomorrow or something?

akien-mga commented 3 years ago

Will do! Thanks a lot for your work :)

xPaw commented 3 years ago

Here's that log again after running the update:

App 38420 - Added Engine.Godot
App 1041180 - Added Engine.Godot
App 1161580 - Added Engine.Godot
App 1269540 - Added Engine.Godot
App 1389030 - Added Engine.Godot
App 1412500 - Added Engine.Godot
App 1429120 - Added Engine.Godot
App 1507970 - Added Engine.Godot
App 1561390 - Added Engine.Godot
App 1660050 - Added Engine.Godot
App 1684810 - Added Engine.Godot
App 1698620 - Removed Engine.Godot
akien-mga commented 3 years ago

This one is wrong, it's a Godot game: https://steamdb.info/app/1698620/depots/

I'd wager the "extension-less" check chokes on version numbers in filenames... common pitfall which we had to handle in Godot itself too :) See https://github.com/godotengine/godot/blob/d56da233c4a3a0ed83ae74c6a5674bb0673650e1/core/config/project_settings.cpp#L393-L427

Spindle_Demo_v1.0.0.app/Contents/MacOS/Spindle_Demo_v1.0.0
Spindle_Demo_v1.0.0.app/Contents/Resources/Spindle_Demo_v1.0.0.pck

False positive, Fallout Tactics: https://steamdb.info/app/38420/depots/

It has:

core/game.pck |   | pck | 30.86 MiB
core/locale/game |   | Folder |  

So the extension-less logic matches game.pck with the game folder. That's the main drawback of this approach if we can't distinguish between files and folders. But SteamDB itself has this information, couldn't the detection code get access to it? Extension-less files are quite rare and can be reliably assumed to be Unix executables, but extension-less folders are everywhere :)

False positive, Hardspace: Shipbreaker: https://steamdb.info/app/1161580/depots/

Also a problem of matching a PCK with a folder:

Shipbreaker_Data/StreamingAssets/Audio/GeneratedSoundBanks/Windows/English(US)
Shipbreaker_Data/StreamingAssets/Audio/GeneratedSoundBanks/Windows/English(US)/English(US).pck

On the upside, if there's only two false positives from this broad rule, it's not too bad. But they're both pretty popular games so having them misclassified as Godot is not so good. Might be worth considering an exclusion list for Steam IDs that are wrongly matched but we know do not use a given engine? I doubt we'll manage to find perfect heuristics.

App 1041180 - Added Engine.Godot App 1269540 - Added Engine.Godot App 1389030 - Added Engine.Godot App 1412500 - Added Engine.Godot App 1429120 - Added Engine.Godot App 1507970 - Added Engine.Godot

xPaw commented 3 years ago

I'd wager the "extension-less" check chokes on version numbers in filenames... common pitfall which we had to handle in Godot itself too

Wack 🤣

But SteamDB itself has this information, couldn't the detection code get access to it?

I did think about just ignoring folders entirely for the purposes of this project, because we don't have any rules that need to match on folders (especially if they won't have a file inside).

xPaw commented 3 years ago

So I skipped over folders, and it removed both 1161580 and 38420, so that's good.

Bad news is that also removed a bunch of Construct and AdobeFlash matches, these rules need to be reviewed.

larsiusprime commented 3 years ago

Gimme some examples of version numbers in file names and I'll figure it out!

On Tue, Aug 10, 2021 at 2:44 AM Pavel Djundik @.***> wrote:

So I skipped over folders, and it removed both 1161580 and 38420, so that's good.

Bad news is that also removed a bunch of Construct and AdobeFlash matches, these rules need to be reviewed.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SteamDatabase/FileDetectionRuleSets/issues/46#issuecomment-895806228, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUN6G6TZA2QVROIS7VVILT4DKFRANCNFSM5B2WUDBA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

-- www.fortressofdoors.com -- My design & business of game development blog

larsiusprime commented 3 years ago

I'll revise the Construct and AdobeFlash matches; one thought that occurred to me is, Godot only looks at stuff in the 2pass logic.

You can get the Construct and AdobeFlash matches back still by only filtering out folders on the twopass check. Or I can just see if I can clean up those rules to not depend on folder matches.

akien-mga commented 3 years ago

Gimme some examples of version numbers in file names and I'll figure it out!

This one: https://steamdb.info/app/1698620/depots/

Spindle_Demo_1.0.0.pck
Spindle_Demo_1.0.0.x86_64
Spindle_Demo_v1.0.0.app/Contents/MacOS/Spindle_Demo_v1.0.0
Spindle_Demo_v1.0.0.app/Contents/Resources/Spindle_Demo_v1.0.0.pck
Spindle_Demo_1.0.0.exe
Spindle_Demo_1.0.0.pck

And maybe as an additional test, something like (not from a real depot):

Spindle_Demo_1.0.0_yay.pck
Spindle_Demo_1.0.0_yay
xPaw commented 3 years ago

I made it check all base files as .pck names, here's what it added:

https://steamdb.info/app/1003860/depots/ https://steamdb.info/app/1432760/depots/ https://steamdb.info/app/1457490/depots/ https://steamdb.info/app/1492610/depots/ https://steamdb.info/app/1559340/depots/ https://steamdb.info/app/1641020/depots/ https://steamdb.info/app/1646710/depots/ https://steamdb.info/app/1654620/depots/ https://steamdb.info/app/1667400/depots/ https://steamdb.info/app/1677520/depots/ https://steamdb.info/app/1678100/depots/ https://steamdb.info/app/1694380/depots/ https://steamdb.info/app/1698620/depots/ https://steamdb.info/app/1699810/depots/ https://steamdb.info/app/1700260/depots/ https://steamdb.info/app/1704130/depots/ https://steamdb.info/app/1706150/depots/ https://steamdb.info/app/1706350/depots/ https://steamdb.info/app/1716550/depots/ https://steamdb.info/app/1718270/depots/

Are you seeing anything missing now? (some of the additions are just apps my checker didn't see on my local steamdb)

akien-mga commented 3 years ago

I checked them all, they're all Godot games :+1:

Some recently released ones, and indeed a couple fixed ones that had a version number in their name.

xPaw commented 3 years ago

You can review the pck detection code here: https://github.com/SteamDatabase/FileDetectionRuleSets/blob/9290dbb2beee1928b3937564548d4004adb5ac26/tests/FileDetector.php#L258

If there's no more false positives, this issue can be closed.

akien-mga commented 3 years ago

The false positives are fixed, and almost all false negatives listed in this issue are fixed.

There's three left, but I'll open dedicated issues for them:

Edit: That's issue #52.

One Way To Die.app/Contents/MacOS/One Way To Die
One Way To Die.app/Contents/Resources/data.pck

(That's a Godot 2.1 game using the old data.pck naming.)

Edit: That's issue #51.

Ok there's a fourth one as bonus, Godot itself :D https://steamdb.info/app/404790/ It doesn't have a PCK but it's technically made with Godot and should be easy to detect, I'll make a separate issue for it too.

Edit: That's PR #53.