Open CosmicHorrorDev opened 3 years ago
Exactly two files (with the exact same contents) use some weird platform tag identifier, like so:
"Foo"
{
"Bar" [$WIN32]
{
}
}
Handling this would probably be a pain, especially since I have no clue what the possible values are, and I also don't know all the ways it can be applied. I'm assuming the above would make "Bar" and its value Windows 32-bit exclusive. Can it also be applied to a value that is a string? Where else could it be used?
It seems common to still use \ as a path separator instead of for escaping a character. I suppose the easiest way to handle this would be to not parse escape characters by default and add an option to parse them, since they seem incredibly rare
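A rough sketch of what "off by default, opt-in" escape handling could look like. The `parse_escapes` flag, the `read_string_body` name, and the escape table are all made up for illustration; which escapes should actually be recognized is still an open question.

```python
# Hypothetical helper: the flag name and the escape table are assumptions
# for illustration, not an existing API.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def read_string_body(raw: str, parse_escapes: bool = False) -> str:
    """Return the contents of a quoted string token (quotes already removed)."""
    if not parse_escapes:
        # Default: backslashes are literal, so Windows-style paths survive.
        return raw
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in ESCAPES:
            # Known escape sequence: emit the character it stands for.
            out.append(ESCAPES[raw[i + 1]])
            i += 2
        else:
            # Anything else, including an unknown "\x" pair, is kept as-is.
            out.append(ch)
            i += 1
    return "".join(out)
```

With the default, a value such as `materials\models\foo` comes back untouched. This only covers the string body; how the tokenizer treats `\"` when escapes are disabled would need to be decided separately.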
It seems somewhat common to include a null byte at the end of the file. Not sure if this is packed file specific and just isn't handled right or if this is present normally (Hopefully it's just the former for consistency)
Some files failed to read because they're not UTF-8 encoded. Need to dig into the different encodings used. It may be reasonable to expect users to handle encoding and convert it to UTF-8 for us
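If we do leave encoding to the caller, the user-side preprocessing for the two points above might look roughly like this. It is only a sketch: `load_vdf_text` is a hypothetical name, and the list of candidate encodings is a placeholder since I haven't worked out which encodings the files actually use.

```python
def load_vdf_text(data: bytes, encodings=("utf-8",)) -> str:
    """Decode raw file bytes and drop any NUL characters at the very end.

    `encodings` is a caller-supplied list of candidates tried in order.
    The default of UTF-8 only is an assumption, not a survey of real files.
    """
    for enc in encodings:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue
        # Strip after decoding so multi-byte encodings are not truncated.
        return text.rstrip("\x00")
    raise ValueError(f"could not decode with any of: {', '.join(encodings)}")
```

Stripping after decoding rather than before is deliberate: removing trailing zero bytes from the raw data could cut a character in half in a multi-byte encoding.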
It looks like the platform specific tags may be more common and do seem to indicate the platform that a value is used for. Here's a snippet from another file
"xpos" "r223" [$WIN32]
"xpos" "r223" [$X360HIDEF]
"xpos" "r220" [$X360LODEF]
This also shows that it can be used on values that are strings as well. The full set of tags that I've seen so far are WIN32, WIN32WIDE, X360, X360HIDEF, X360LODEF, X360WIDE, DEMO, ENGLISH, JAPANESE, KOREAN, etc. and beyond that there looks to be some conditional logic that can be used as well like [$WIN32 && $ENGLISH] or [$WIN32 && !$ENGLISH]
The parsing position is a bit awkward as well, since the tag appears at the end of the pair for Key-String, but between the two tokens for Key-Obj. With how many different possible values there are, it doesn't seem worth trying to parse the specifics; we could just return the string for what's inside the brackets
Of the 16,353 failures, 345 involve these tags
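To illustrate the "just return the string for what's inside" idea, here is a minimal line-based sketch. It assumes the tag is the last thing on the line, which holds for the snippets above but may not hold everywhere, and it ignores trailing comments. `split_conditional` is a hypothetical name.

```python
import re

# Matches a bracketed conditional at the end of a line, e.g. [$WIN32 && !$ENGLISH]
_CONDITIONAL = re.compile(r"\s*\[([^\]]*)\]\s*$")


def split_conditional(line: str):
    """Split a line into (rest_of_line, raw_conditional_or_None).

    The conditional is returned as an uninterpreted string; no attempt is
    made to evaluate operators like && or !.
    """
    m = _CONDITIONAL.search(line)
    if m is None:
        return line.rstrip(), None
    return line[: m.start()].rstrip(), m.group(1).strip()
```

For the line `"xpos" "r223" [$WIN32]` this returns the pair text `"xpos" "r223"` and the raw string `$WIN32`. Brackets inside a quoted value are left alone because such a line ends in a quote rather than `]`.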
#base is used in 292 of the 16,353 failing files.
Of those files, it appears that #base always appears on the top value. I'll have to dig in more to see if #base was ever used with a file that also has a #base
Finally, 15,079 files use \ when not trying to represent an escaped character, which makes it a very prevalent issue.
From finding out how to extract contents from .vpk files in #26, we now have over 60k VDF files to test parsing with, just from the contents of a few Valve games.
The full corpus is much too large, and probably a no-no to include in here, but I'll hack together a program that tries to parse each file and dumps any that fail to a separate location. Once I get that running I'll post any failures here
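The failure-dumping program could be as simple as the sketch below. Everything here is illustrative: `collect_failures` is a hypothetical name, `parse` stands in for whatever parser entry point is being tested, and the default suffix list is a guess since VDF-style files show up under several extensions.

```python
import shutil
from pathlib import Path
from typing import Callable


def collect_failures(corpus: Path, failed: Path,
                     parse: Callable[[str], object],
                     suffixes=(".vdf",)):
    """Try to parse every matching file under `corpus`.

    Files that raise (including on UTF-8 decoding) are copied into `failed`.
    Returns the list of original paths that failed.
    """
    failed.mkdir(parents=True, exist_ok=True)
    failures = []
    for path in sorted(corpus.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            parse(path.read_text(encoding="utf-8"))
        except Exception:
            # Flatten the relative path so same-named files do not collide.
            target = failed / "__".join(path.relative_to(corpus).parts)
            shutil.copy2(path, target)
            failures.append(path)
    return failures
```

Keeping `failed` outside of `corpus` avoids re-scanning the dumped copies on a later run.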