ValveResourceFormat / ValveKeyValue

📃 Next-generation Valve's key value framework for .NET
https://www.nuget.org/packages/ValveKeyValue/
MIT License

Failing to parse csgo_english #103

Closed. JeremyEspresso closed this issue 3 days ago

JeremyEspresso commented 3 days ago

I'm using the library to parse the "csgo_english" file (https://raw.githubusercontent.com/SteamDatabase/GameTracking-CS2/master/game/csgo/pak01_dir/resource/csgo_english.txt), but it fails:

      ValveKeyValue.KeyValueException: Unrecognized term after '#' symbol (line 2896, column 33)
       ---> System.IO.InvalidDataException: Unrecognized term after '#' symbol (line 2896, column 33)
         at ValveKeyValue.Deserialization.KeyValues1.KV1TokenReader.ReadInclusion() in /_/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TokenReader.cs:line 127
         at ValveKeyValue.Deserialization.KeyValues1.KV1TokenReader.ReadNextToken() in /_/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TokenReader.cs:line 45
         at ValveKeyValue.Deserialization.KeyValues1.KV1TextReader.ReadObject() in /_/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TextReader.cs:line 40
         --- End of inner exception stack trace ---
         at ValveKeyValue.Deserialization.KeyValues1.KV1TextReader.ReadObject() in /_/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TextReader.cs:line 44
         at ValveKeyValue.KVSerializer.Deserialize(Stream stream, KVSerializerOptions options) in /_/ValveKeyValue/ValveKeyValue/KVSerializer.cs:line 41

https://github.com/ValveResourceFormat/ValveKeyValue/blob/6b6a86822382281f42fc345b5344357182bd2dd1/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TokenReader.cs#L133

In csgo_english the lines from 2895 until 2901 are the following:

2895:       "leaderboard_region_abbr_Europe"        "<font color=\"#FFDD00\">EU</font>"
2896:       "leaderboard_region_abbr_Asia"          "<font color=\"#fc8200\">AS</font>"
2897:       "leaderboard_region_abbr_Australia"     "<font color=\"#008bfc\">AU</font>"
2898:       "leaderboard_region_abbr_Africa"        "<font color=\"#19bf00\">AF</font>"
2899:       "leaderboard_region_abbr_NorthAmerica"      "<font color=\"#d281fc\">NA</font>"
2900:       "leaderboard_region_abbr_SouthAmerica"      "<font color=\"#02c1e3\">SA</font>"
2901:       "leaderboard_region_abbr_China"         "<font color=\"#ff5959\">CN</font>"

It seems to be attempting to parse these as an "inclusion", which is incorrect here because they are color values. Happy to provide more info if needed.

Thanks!

xPaw commented 3 days ago

One-line repro? Is it caused by escaping \" or just by having # in general?

JeremyEspresso commented 3 days ago

Here's a repro you can drop into an empty project; it includes downloading the file I am parsing.

FWIW it works parsing https://raw.githubusercontent.com/SteamDatabase/GameTracking-CS2/master/game/csgo/pak01_dir/scripts/items/items_game.txt

var resp = await _httpClient.GetAsync("https://raw.githubusercontent.com/SteamDatabase/GameTracking-CS2/master/game/csgo/pak01_dir/resource/csgo_english.txt");

// Assume success for repro

var vdfStream = await resp.Content.ReadAsStreamAsync();
var serializer = KVSerializer.Create(KVSerializationFormat.KeyValues1Text);
var csGoEnglish = serializer.Deserialize(vdfStream);

xPaw commented 3 days ago

A repro for the kv text, not the code :P

JeremyEspresso commented 3 days ago

Ah, sorry, I misunderstood.

"lang"
{
    "Tokens"
    {
        "leaderboard_region_abbr_Asia"          "<font color=\"#fc8200\">AS</font>"
    }
}

xPaw commented 3 days ago

"test#fc8200"

"t\"est#fc8200"

"test\"#fc8200"

Do these also fail?

JeremyEspresso commented 3 days ago

"test#fc8200" // Seems to work

"t\"est#fc8200" // Fails

"test\"#fc8200" // Fails

JeremyEspresso commented 3 days ago

It also seems to fail on this one:

        "CSGO_crate_spray_std2_1"           "CS:GO Graffiti #2 Collection"

JeremyEspresso commented 3 days ago

        "HudSpecPlayer_WeaponName"              "<span class=\"possessive-player-name\">{s:possessive_player_name}</span><br><span class=\"weapon-name\">{s:weapon_name}</span><br><font color=\"{s:rarity_color}\"><span class=\"weapon-kit-name\">{s:weapon_kit_name}</span><br><span class=\"weapon-name-custom\">\"{s:weapon_name_custom}\"</span></font>"

Another error:

 System.InvalidOperationException: Attempted to begin new object while in state InObjectAfterValue at line 33184, column 178.
         at ValveKeyValue.Deserialization.KeyValues1.KV1TextReader.BeginNewObject() in /_/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TextReader.cs:line 155
         at ValveKeyValue.Deserialization.KeyValues1.KV1TextReader.ReadObject() in /_/ValveKeyValue/ValveKeyValue/Deserialization/KeyValues1/KV1TextReader.cs:line 58
         at ValveKeyValue.KVSerializer.Deserialize(Stream stream, KVSerializerOptions options) in /_/ValveKeyValue/ValveKeyValue/KVSerializer.cs:line 41

xPaw commented 3 days ago

OK, there is no bug here; you need to enable escape sequences:

var options = new KVSerializerOptions
{
    HasEscapeSequences = true,
};

For csgo_english specifically:

var options = new KVSerializerOptions
{
    HasEscapeSequences = true,
    EnableValveNullByteBugBehavior = true
};

@yaakov-h when escape sequences are off, should we detect that the input contains one and suggest enabling them if something ends up throwing?

JeremyEspresso commented 3 days ago

Yup, working correctly with those settings enabled. Thank you.
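
For anyone landing here later, this is roughly how the pieces from this thread fit together. Treat it as a minimal sketch rather than official usage: it feeds the small KV repro from above through an in-memory stream (my choice, to avoid the download) and passes the options to the Deserialize(Stream, KVSerializerOptions) overload seen in the stack trace. It assumes C# 11 for the raw string literal and that KVObject exposes a Name property.

```csharp
using System;
using System.IO;
using System.Text;
using ValveKeyValue;

// The minimal KV1 repro from this thread. In a raw string literal the
// backslashes are kept verbatim, so the parser sees \" exactly as in the file.
const string text = """
"lang"
{
    "Tokens"
    {
        "leaderboard_region_abbr_Asia"    "<font color=\"#fc8200\">AS</font>"
    }
}
""";

var options = new KVSerializerOptions
{
    // Without this, \" ends the string early and the following '#'
    // is read as the start of an inclusion directive.
    HasEscapeSequences = true,
};

var serializer = KVSerializer.Create(KVSerializationFormat.KeyValues1Text);
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(text));

// Throws if the text cannot be parsed with the given options.
var root = serializer.Deserialize(stream, options);
Console.WriteLine(root.Name);
```

For the full csgo_english file, EnableValveNullByteBugBehavior = true goes into the same options object, as suggested above.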

yaakov-h commented 3 days ago

@xPaw not a bad idea
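
Until something like that exists in the library, the same hint can be approximated on the caller's side. The sketch below is only an illustration of the idea: KvDiagnostics and DeserializeWithHint are hypothetical names, not part of ValveKeyValue, and the check for a literal \" in the input is a crude heuristic I picked, not how the library would necessarily detect it.

```csharp
using System;
using System.IO;
using System.Text;
using ValveKeyValue;

static class KvDiagnostics
{
    // Hypothetical helper (not a ValveKeyValue API): parse KV1 text and, if
    // parsing throws while escape sequences are disabled and the input
    // contains \", rethrow with a hint to enable them.
    public static KVObject DeserializeWithHint(string text, KVSerializerOptions options)
    {
        var serializer = KVSerializer.Create(KVSerializationFormat.KeyValues1Text);
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(text));

        try
        {
            return serializer.Deserialize(stream, options);
        }
        // Both failures in this thread (KeyValueException and
        // InvalidOperationException) are caught by this broad filter.
        catch (Exception ex) when (!options.HasEscapeSequences && text.Contains("\\\""))
        {
            throw new InvalidDataException(
                "Parsing failed and the input contains \\\". " +
                "Try setting KVSerializerOptions.HasEscapeSequences = true.",
                ex);
        }
    }
}
```

The original exception is kept as the inner exception, so the line and column information from the parser is still available.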