CosmicHorrorDev / vdf-rs

VDF parsing and (de)serialization for Rust
Apache License 2.0
30 stars 3 forks source link

Stack overflow parsing string with many escaped chars #54

Closed bananapizzuh closed 5 days ago

bananapizzuh commented 3 weeks ago

I am creating an application that needs to find all games owned by a steam account. I have looked around, and concluded that <steamdir>/userdata/<user_id>/config/localconfig.vdf is the best way to do that. I converted the file to json online (https://rgp.io/vdf-parser/), and had no issue parsing it using serde_json. When running with rust_backtrace enabled, there is not a backtrace with the error. My platform is windows, using msvc.

thread 'main' has overflowed its stack
error: process didn't exit successfully: `target\debug\bin.exe` (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)
use std::collections::HashMap;
use std::fs::File;
use std::io::BufReader;
use std::path::Path;

use serde::Deserialize;
use serde::Serialize;

#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Root {
    #[serde(rename = "UserLocalConfigStore")]
    pub user_local_config_store: UserLocalConfigStore,
}

#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct UserLocalConfigStore {
    #[serde(rename = "Software")]
    pub software: Software,
}

#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Software {
    #[serde(rename = "Valve")]
    pub valve: Valve,
}

#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Valve {
    #[serde(rename = "Steam")]
    pub steam: Steam,
}

#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Steam {
    pub apps: HashMap<String, App>,
    #[serde(rename = "LastPlayedTimesSyncTime")]
    pub last_played_times_sync_time: String,
    #[serde(rename = "PlayerLevel")]
    pub player_level: String,
}

#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct App {
    #[serde(rename = "LastPlayed")]
    pub last_played: Option<String>,
    #[serde(rename = "BadgeData")]
    pub badge_data: Option<String>,
    pub playtime: Option<String>,
}

pub fn list_games() {
    let path = Path::new("localconfig.vdf");
    let file = File::open(&path).unwrap();
    let reader = BufReader::new(file);
    let v: Root = keyvalues_serde::from_reader(reader).unwrap();
}
CosmicHorrorDev commented 3 weeks ago

Are you sure the stackoverflow is coming from keyvalues-serde? I was able to run your example program as expected on my local config with minor tweak to deserialize as UserLocalConfigStore instead of Root since the config store is the outermost "object". Here's the diff of the changes

57c57
< pub fn list_games() {
---
> pub fn main() {
61c61,62
<     let v: Root = keyvalues_serde::from_reader(reader).unwrap();
---
>     let v: UserLocalConfigStore = keyvalues_serde::from_reader(reader).unwrap();
>     dbg!(v);

Which ran as expected although there could be something unique about your config

❯ cargo r
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/repro`
[src/main.rs:62:5] v = UserLocalConfigStore {
    software: Software {
        valve: Valve {
            steam: Steam {
                apps: {
                    "1274570": App {
                        last_played: Some(
                            "1653872005",
                        ),
                        badge_data: None,
                        playtime: None,
                    },
                    "1576000": App {
                        last_played: Some(
                            "1717473346",
                        ),
                        badge_data: Some(
                            "02000000080a",
                        ),
                        playtime: None,
                    },
                    "677120": App {
                        last_played: Some(
                            "1653523035",
                        ),
                        badge_data: None,
(lots of elided output)

I'm not very familiar on debugging specifics for windows, but if you run your program while a debugger is attached then you can look at the callstack when it hits the stackoverflow. The usual culprit is unbounded recursion which looks like the same cycle of functions getting called in the callstack

bananapizzuh commented 3 weeks ago

I stepped through it using a debugger in vscode, and it ends up at the find_raw_avx2 function inside of the memchr crate, which I think the pest crate uses. It comes from line 180 in text/parse.rs in the keyvalues-parse side of this project.

I just tested it inside wsl, and it worked perfectly. It must be something with windows.

CosmicHorrorDev commented 3 weeks ago

It working for you in WSL, but not windows rules out that it's something specific in your config causing issues at least. It could be either platform dependent code, or the different default thread sizes for the main thread as I believe windows uses 1 MiB while Linux uses 2 MiB

You could try running with extra stack space to see if that's enough for it to succeed. It looks like the default for new threads is 2 MiB, so something like

fn larger_stack() {
    std::thread::spawn(list_games).join().unwrap()
}

and if that alone doesn't work you can try setting a higher value with the RUST_MIN_STACK env var (or with std::thread::Builder::stack_size)

I can try reproing on my windows install sometime this weekend as well

bananapizzuh commented 2 weeks ago

Running it in the main thread with 64 mb as the RUST_MIN_STACK env var still causes a stack overflow. When running the function as a thread, I have to set the stack size to 8 mb either using std::thread::Builder::stack_size or the env var for it to work. If it helps, my localconfig.vdf file is 268 kb.

CosmicHorrorDev commented 2 weeks ago

Running it in the main thread with 64 mb as the RUST_MIN_STACK env var still causes a stack overflow

Sorry I wasn't super clear before, but that's expected since it doesn't have any impact on the main thread

From: https://doc.rust-lang.org/std/thread/index.html#stack-size

Note that the stack size of the main thread is not determined by Rust.

I'll try getting a more minimal repro on windows to see if we can track things down further. Also incidentally I'm planning on switching the parser from using pest to winnow at some point in the future which could help with this

CosmicHorrorDev commented 2 weeks ago

Good(ish) news! I'm able to repro on my windows install, and I can repro by building with cross and running through wine on linux :partying_face:

$ cross build --target x86_64-pc-windows-gnu
...
$ wine target/x86_64-pc-windows-gnu/debug/repro.exe
0120:err:virtual:virtual_setup_exception stack overflow 4224 bytes addr 0x6ffffff964c3 stack 0x100ff80 (0x1010000-0x1011000-0x1210000)

I also have a tiny localconfig.vdf that doesn't hit a stackoverflow and a pretty small (~110 KiB) one that does, so next up will be minimizing that down and then we'll go from there

CosmicHorrorDev commented 2 weeks ago

Minimized things a good bit

[package]
name = "repro"
version = "0.1.0"
edition = "2021"

[dependencies]
keyvalues-serde = "0.2.1"
serde = { version = "1.0.209", features = ["derive"] }
use serde::Deserialize;

static CONFIG: &str = r###"
"UserLocalConfigStore"
{
    "WebStorage"
    {
        "LocalizedTagNames_english"     "{\"tags\":[{\"tagid\":8666,\"name\":\"Runner\"},{\"tagid\":8945,\"name\":\"Resource Management\"},{\"tagid\":9130,\"name\":\"Hentai\"},{\"tagid\":9157,\"name\":\"Underwater\"},{\"tagid\":9204,\"name\":\"Immersive Sim\"},{\"tagid\":9271,\"name\":\"Trading Card Game\"},{\"tagid\":9541,\"name\":\"Demons\"},{\"tagid\":9551,\"name\":\"Dating Sim\"},{\"tagid\":9564,\"name\":\"Hunting\"},{\"tagid\":9592,\"name\":\"Dynamic Narration\"},{\"tagid\":9803,\"name\":\"Snow\"},{\"tagid\":9994,\"name\":\"Experience\"},{\"tagid\":10235,\"name\":\"Life Sim\"},{\"tagid\":10383,\"name\":\"Transportation\"},{\"tagid\":10397,\"name\":\"Memes\"},{\"tagid\":10437,\"name\":\"Trivia\"},{\"tagid\":10679,\"name\":\"Time Travel\"},{\"tagid\":10695,\"name\":\"Party-Based RPG\"},{\"tagid\":10808,\"name\":\"Supernatural\"},{\"tagid\":10816,\"name\":\"Split Screen\"},{\"tagid\":11014,\"name\":\"Interactive Fiction\"},{\"tagid\":11095,\"name\":\"Boss Rush\"},{\"tagid\":11104,\"name\":\"Vehicular Combat\"},{\"tagid\":11123,\"name\":\"Mouse only\"},{\"tagid\":11333,\"name\":\"Villain Protagonist\"},{\"tagid\":11634,\"name\":\"Vikings\"},{\"tagid\":12057,\"name\":\"Tutorial\"},{\"tagid\":12095,\"name\":\"Sexual Content\"},{\"tagid\":12190,\"name\":\"Boxing\"},{\"tagid\":12286,\"name\":\"Warhammer 40K\"},{\"tagid\":12472,\"name\":\"Management\"},{\"tagid\":13070,\"name\":\"Solitaire\"},{\"tagid\":13190,\"name\":\"America\"},{\"tagid\":13276,\"name\":\"Tanks\"},{\"tagid\":13382,\"name\":\"Archery\"},{\"tagid\":13577,\"name\":\"Sailing\"},{\"tagid\":13782,\"name\":\"Experimental\"},{\"tagid\":13906,\"name\":\"Game Development\"},{\"tagid\":14139,\"name\":\"Turn-Based Tactics\"},{\"tagid\":14153,\"name\":\"Dungeons & Dragons\"},{\"tagid\":14720,\"name\":\"Nostalgia\"},{\"tagid\":14906,\"name\":\"Intentionally Awkward Controls\"},{\"tagid\":15045,\"name\":\"Flight\"},{\"tagid\":15172,\"name\":\"Conversation\"},{\"tagid\":15277,\"name\":\"Philosophical\"},{\"tagid\":15339,\"name\":\"Documentary\"},{\"tagid\":15564,\"name\":\"Fishing\"},{\"tagid\":15868,\"name\":\"Motocross\"},{\"tagid\":15954,\"name\":\"Silent Protagonist\"},{\"tagid\":16094,\"name\":\"Mythology\"},{\"tagid\":16250,\"name\":\"Gambling\"},{\"tagid\":16598,\"name\":\"Space Sim\"},{\"tagid\":16689,\"name\":\"Time Management\"},{\"tagid\":17015,\"name\":\"Werewolves\"},{\"tagid\":17305,\"name\":\"Strategy RPG\"},{\"tagid\":17337,\"name\":\"Lemmings\"},{\"tagid\":17389,\"name\":\"Tabletop\"},{\"tagid\":17770,\"name\":\"Asynchronous Multiplayer\"},{\"tagid\":17894,\"name\":\"Cats\"},{\"tagid\":17927,\"name\":\"Pool\"},{\"tagid\":18594,\"name\":\"FMV\"},{\"tagid\":19568,\"name\":\"Cycling\"},{\"tagid\":19780,\"name\":\"Submarine\"},{\"tagid\":19995,\"name\":\"Dark Comedy\"},{\"tagid\":21006,\"name\":\"Underground\"},{\"tagid\":21725,\"name\":\"Tactical RPG\"},{\"tagid\":21978,\"name\":\"VR\"},{\"tagid\":22602,\"name\":\"Agriculture\"},{\"tagid\":22955,\"name\":\"Mini Golf\"},{\"tagid\":24003,\"name\":\"Word Game\"},{\"tagid\":24904,\"name\":\"NSFW\"},{\"tagid\":25085,\"name\":\"Touch-Friendly\"},{\"tagid\":26921,\"name\":\"Political Sim\"},{\"tagid\":27758,\"name\":\"Voice Control\"},{\"tagid\":28444,\"name\":\"Snowboarding\"},{\"tagid\":29363,\"name\":\"3D Vision\"},{\"tagid\":29482,\"name\":\"Souls-like\"},{\"tagid\":29855,\"name\":\"Ambient\"},{\"tagid\":30358,\"name\":\"Nature\"},{\"tagid\":30927,\"name\":\"Fox\"},{\"tagid\":31275,\"name\":\"Text-Based\"},{\"tagid\":31579,\"name\":\"Otome\"},{\"tagid\":32322,\"name\":\"Deckbuilding\"},{\"tagid\":33572,\"name\":\"Mahjong\"},{\"tagid\":35079,\"name\":\"Job Simulator\"},{\"tagid\":37799,\"name\":\"Combat Flight Simulator\"},{\"tagid\":40500,\"name\":\"Sexual Themes\"},{\"tagid\":42089,\"name\":\"Jump Scare\"},{\"tagid\":42329,\"name\":\"Coding\"},{\"tagid\":42804,\"name\":\"Action Roguelike\"},{\"tagid\":44868,\"name\":\"LGBTQ+\"},{\"tagid\":47827,\"name\":\"Wrestling\"},{\"tagid\":49213,\"name\":\"Rugby\"},{\"tagid\":51306,\"name\":\"Foreign\"},{\"tagid\":56690,\"name\":\"On-Rails Shooter\"},{\"tagid\":61357,\"name\":\"Electronic Music\"},{\"tagid\":71389,\"name\":\"Spelling\"},{\"tagid\":87918,\"name\":\"Farming Sim\"},{\"tagid\":91114,\"name\":\"Shop Keeper\"},{\"tagid\":92092,\"name\":\"Jet\"},{\"tagid\":96359,\"name\":\"Skating\"},{\"tagid\":97376,\"name\":\"Cozy\"},{\"tagid\":102530,\"name\":\"Elf\"},{\"tagid\":117648,\"name\":\"8-bit Music\"},{\"tagid\":123332,\"name\":\"Bikes\"},{\"tagid\":129761,\"name\":\"ATV\"},{\"tagid\":143739,\"name\":\"Electronic\"},{\"tagid\":150626,\"name\":\"Gaming\"},{\"tagid\":158638,\"name\":\"Cricket\"},{\"tagid\":176733,\"name\":\"Tile-Matching\"},{\"tagid\":176981,\"name\":\"Battle Royale\"},{\"tagid\":180368,\"name\":\"Faith\"},{\"tagid\":189941,\"name\":\"Instrumental Music\"},{\"tagid\":198631,\"name\":\"Mystery Dungeon\"},{\"tagid\":198913,\"name\":\"Motorbike\"},{\"tagid\":220585,\"name\":\"Colony Sim\"},{\"tagid\":233824,\"name\":\"Feature Film\"},{\"tagid\":252854,\"name\":\"BMX\"},{\"tagid\":255534,\"name\":\"Automation\"},{\"tagid\":323922,\"name\":\"Musou\"},{\"tagid\":324176,\"name\":\"Hockey\"},{\"tagid\":337964,\"name\":\"Rock Music\"},{\"tagid\":348922,\"name\":\"Steam Machine\"},{\"tagid\":353880,\"name\":\"Looter Shooter\"},{\"tagid\":363767,\"name\":\"Snooker\"},{\"tagid\":379975,\"name\":\"Clicker\"},{\"tagid\":454187,\"name\":\"Traditional Roguelike\"},{\"tagid\":552282,\"name\":\"Wholesome\"},{\"tagid\":603297,\"name\":\"Hardware\"},{\"tagid\":615955,\"name\":\"Idler\"},{\"tagid\":620519,\"name\":\"Hero Shooter\"},{\"tagid\":745697,\"name\":\"Social Deduction\"},{\"tagid\":769306,\"name\":\"Escape Room\"},{\"tagid\":776177,\"name\":\"360 Video\"},{\"tagid\":791774,\"name\":\"Card Battler\"},{\"tagid\":847164,\"name\":\"Volleyball\"},{\"tagid\":856791,\"name\":\"Asymmetric VR\"},{\"tagid\":916648,\"name\":\"Creature Collector\"},{\"tagid\":922563,\"name\":\"Roguevania\"},{\"tagid\":1023537,\"name\":\"Boomer Shooter\"},{\"tagid\":1084988,\"name\":\"Auto Battler\"},{\"tagid\":1091588,\"name\":\"Roguelike Deckbuilder\"},{\"tagid\":1100686,\"name\":\"Outbreak Sim\"},{\"tagid\":1100687,\"name\":\"Automobile Sim\"},{\"tagid\":1100688,\"name\":\"Medical Sim\"},{\"tagid\":1100689,\"name\":\"Open World Survival Craft\"},{\"tagid\":1199779,\"name\":\"Extraction Shooter\"},{\"tagid\":1220528,\"name\":\"Hobby Sim\"},{\"tagid\":1254546,\"name\":\"Football (Soccer)\"},{\"tagid\":1254552,\"name\":\"Football (American)\"}]}"
    }
}
"###;

#[derive(Debug, Deserialize)]
pub struct UserLocalConfigStore {}

fn main() {
    keyvalues_serde::from_str::<UserLocalConfigStore>(CONFIG).unwrap();
}
$ cross build --target x86_64-pc-windows-gnu
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s
$ wine target/x86_64-pc-windows-gnu/debug/repro.exe
0120:err:virtual:virtual_setup_exception stack overflow 3456 bytes addr 0x6ffffff9650d stack 0xf70280 (0xf70000-0xf71000-0x1170000)
CosmicHorrorDev commented 2 weeks ago

Minimized some more

[package]
name = "repro"
version = "0.1.0"
edition = "2021"

[dependencies]
keyvalues-parser = "0.2.0"
static CONFIG: &str = r###"
"" "\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\""
"###;

fn main() {
    keyvalues_parser::Vdf::parse(CONFIG).unwrap();
}

Now onto minimizing things down to a more minimal pest grammar and then I'll report upstream to pest and see where things on


Also worth noting that this still seems to be strictly windows specific since trying the same repro on linux even with adding in a thousand more escaped chars still runs fine

CosmicHorrorDev commented 2 weeks ago

So digging into things more it looks like it's not windows specific, but just requires a lot more escaped sequences to hit the stack overflow on linux. With the following

use std::{
    iter::{once, repeat},
    thread,
};

fn main() {
    // Run in another thread so that we can change the stack size with `RUST_MIN_STACK`
    thread::spawn(overflows_stack).join().unwrap();
}

fn overflows_stack() {
    for i in 1..1_000 {
        let num_escaped = i * 100;
        println!("{num_escaped}");
        let boom = lots_of_escaped(num_escaped);
        keyvalues_parser::Vdf::parse(&boom).unwrap();
    }
}

// Generates strings e.g. `lots_of_escaped(2)` gives `"" "\"\""`
fn lots_of_escaped(num_escaped: usize) -> String {
    once("\"\" \"")
        .chain(repeat("\\\"").take(num_escaped))
        .chain(once("\""))
        .collect()
}

We can hit an overflow rather quickly still if we limit the stack size e.g. with 500 KB

❯ RUST_MIN_STACK=500000 cargo r
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/repro`
100
200
300

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
[1]    18844 IOT instruction (core dumped)  RUST_MIN_STACK=500000 cargo r

The Problem?

The backtrace from when it overflows helps provide more context. In short the use of the alternations to express the different potential escaped chars is... problematic. The eureka moment for me was from seeing this type (manually cleaned up)

use keyvalues_parser::text::parse::escaped::Rule;
use pest::parser_state::ParserState;

Result<
    Box<ParserState<Rule>>,
    Box<ParserState<Rule>>>::and_then<
        Box<ParserState<Rule>>,
        Box<ParserState<Rule>>,
        Box<ParserState<Rule>>,
        // ... some complex closure type
    >
>

This type relates to the following snippet of the pest grammar which represents escaped characters escape = @{ "\\" ~ ("\"" | "\\" | "n" | "r" | "t") } where the 5 Rules above represent the alternation between " | \ | n | r | t. What actually happens under the hood is that the type is used to keep track of which alternation succeeds (Ok(_) meaning the current arm succeeded while Err(_) represents all the remaining arms).

This handles ambiguity where the parser will descend into each arm of the alternation until one succeeds. If the arm fails later down the line then it will bail back to the alternation and will descend down the next arm that succeeds meaning that each escaped character causes another nested function call that can eventually exhaust the entire stack

It's also worth noting that we don't actually gain anything from this method as we don't have any ambiguity when parsing quoted strings, so this winds up being entirely unnecessary for us

The fix?

I mean the actual grammar we have is really simple

WHITESPACE = _{ " " | "\t" | "\r" | "\n" }
COMMENT = _{"//" ~ (!"\n" ~ ANY)* }

vdf = _{ SOI ~ base_macro* ~ pair ~ "\u{00}"? ~ EOI }

base_macro = { "#base" ~ (quoted_raw_string | unquoted_string ) }

quoted_raw_string = ${ "\"" ~ quoted_raw_inner ~ "\"" }
quoted_raw_inner = @{ (!"\"" ~ ANY)* }

pairs = _{ pair* }

pair = { key ~ value }

key = _{ quoted_string | unquoted_string }

value = _{ quoted_string | obj | unquoted_string }

obj = { "{" ~ pairs ~ "}" }

quoted_string = ${ "\"" ~ quoted_inner ~ "\"" }
quoted_inner = @{ (!("\"" | "\\") ~ ANY)* ~ (escape ~ quoted_inner)? }
escape = @{ "\\" ~ ("\"" | "\\" | "n" | "r" | "t") }

unquoted_string = @{ unquoted_char+ }
// The wiki page just states that an unquoted string ends with ", {, }, or any
// whitespace which I feel is likely missing several cases, but for now I will
// follow that information
unquoted_char = {
    !("\"" | "{" | "}" | WHITESPACE)  ~ ANY
}

I'm just going to switch to a handwritten parser for keyvalues-parser instead of relying on a library. It'll strip out most of our dependencies without really adding much complexity compared to the existing impl (and we can fuzz the rewrite against the old version to help sniff out any discrepancies)

CosmicHorrorDev commented 2 weeks ago

@bananapizzuh I went ahead and opened #55, so that you have a PR you can subscribe to to keep track of when this is resolved. As a workaround for now I would recommend deserializing in another thread that has a larger stack to avoid the overflow