abesto / clox-rs

Following along the second (C) part of https://craftinginterpreters.com/, but with Rust, because reasons.
https://abesto.github.io/clox-rs/
16 stars 0 forks source link

Tried to expand your implementation and ran into some issues. Maybe you can help? #1

Open JanEricNitschke opened 1 year ago

JanEricNitschke commented 1 year ago

Hi,

really nice work on this repo. I used it (basically copied) to work through the book in rust. After i was done i tried to expand on the amazing work you did and managed to get some stuff done. https://github.com/JanEricNitschke/CraftingInterpreters/tree/main/rust_version

However i have now run into an issue where i feel out of my depth (segmentation fault) and thought maybe you had the expertise (and time) to help.

Specifically i ran into issue when trying to run some of the other benchmark programs in the book repo. Specifically this one:

https://github.com/JanEricNitschke/CraftingInterpreters/blob/main/rust_version/benchmark/binary_trees.lox

Starting at a given depth of the trees i am getting non deterministic seg faults (happen in ~70% of the runs)

This happens regardless of the gc setting, even if i turn it off completely.

I tried to track down the issue but didnt really get anywhere.

I found out that is happens based on this change although i have absolutely no idea how that could be. With the version before that commit i do not get the seg fault in ~50 run. After that change i get it ~70% of the time. Interestingly enough the crash does not actually happen in any of those new lines...

So yeah....

If you do not have the time or interest to have a look at this thats totally fine. I just thought i'd try in case you know or have an idea what might be causing the issue. I am also not sure where else to really ask for help, so i thought why not.

Cheers.

abesto commented 1 year ago

Hah, glad you found this useful :)

I skimmed the code and can't see anything that stands out at a glance. You'll want to look at the core dump to figure out where exactly the error is happening, and go from there. There are only a very few things here that I know can lead to a segfault - they are helpfully marked unsafe (yay for Rust), so that's where I'd start looking (the heap implementation and the Pin trickery). But that's a wild guess - look at the core dump, and go from there, is the best I can give you :D

At a quick glance, the approach in https://jvns.ca/blog/2017/12/23/segfault-debugging/ looks about right for the debugging.

JanEricNitschke commented 1 year ago

Wow, did not expect such a fast reply, thanks!

Yeah, the only places where things can go wrong is in the unsafe region, but they do spread pretty far here, right? Everything that interacts with the heap in principle.

I did manage to nail it down to it always happening on the class.is_native check, although i am still unsure what there causes it. Or rather the class part itself. If i try to print the class before the check it already crashes there. And curiously enough it is always either the 3042 call of that line or not at all.

Specifically accessing class causes the segfault after this line: let instance_id: ValueId = self.heap.add_value(Instance::new(callee).into()); https://github.com/JanEricNitschke/CraftingInterpreters/blob/main/rust_version/src/vm.rs#L1080

Will look into it a bit more and thanks for the link. Now i just need to figure out how to get a core dump and run valgrind on windows :D

JanEricNitschke commented 1 year ago

I have kind off given up now and just did a workaround for now.

I tried to narrow it down further and it seems to be related to the methods of the class.

If i grab the name, is_native and methods before the problematic line, then i can still print the name and the is_native afterward. Its the methods that crash. But i can also grab the whole content (the two defined methods and their keys) before and still print them after. So the issue seems to be with the hashmap itself???

What i am doing now is that i just grab the is_native off the class before the problematic line and use it afterwards. https://github.com/JanEricNitschke/CraftingInterpreters/commit/9149576f4256a071381d29ee4a275d72b82d3c3b

              let maybe_initializer = class
                    .methods
                    .get(&self.heap.builtin_constants().init_string);
                println!("Before add value");
                let class_methods = &class.methods;
                println!("Class methods: {:?}", class_methods);
                let check_string = if let Entry::Occupied(entry) =
                    self.heap.strings_by_name_mut().entry("check".to_string())
                {
                    *entry.get()
                } else {
                    unreachable!("Should find it")
                };
                let other_method = class.methods.get(&check_string).unwrap();
                let init_method = class
                    .methods
                    .get(&self.heap.builtin_constants().init_string)
                    .unwrap();
                println!("check method: {:?}", other_method);
                println!("check_string: {:?}", check_string);
                println!("init method: {:?}", init_method);
                println!(
                    "init_string: {:?}",
                    self.heap.builtin_constants().init_string
                );
                let instance_id: ValueId = self.heap.add_value(Instance::new(callee).into()); // After this i can not print class anymore
                println!("After add value");
                println!("check method: {:?}", other_method);
                println!("check_string: {:?}", check_string);
                println!("init method: {:?}", init_method);
                println!(
                    "init_string: {:?}",
                    self.heap.builtin_constants().init_string
                );
                println!("Class methods orig: {:?}", class_methods);

// Call number 3041 of the whole method
Before add value
Class methods: {ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }, ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }}
check method: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }
check_string: ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }
init method: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }
init_string: ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }
After add value
check method: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }
check_string: ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }
init method: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }
init_string: ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }
Class methods orig: {ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }, ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }}
Class methods: {ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }, ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }}
// Call number 3042
Before add value
Class methods: {ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }, ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }}
check method: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }
check_string: ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }
init method: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }
init_string: ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }
After add value
check method: ArenaId { id: ValueKey(2113v1), arena: 0x15eb1ca64f0 }
check_string: ArenaId { id: StringKey(26v1), arena: 0x15eb1ca6498 }
init method: ArenaId { id: ValueKey(2112v1), arena: 0x15eb1ca64f0 }
init_string: ArenaId { id: StringKey(1v1), arena: 0x15eb1ca6498 }
// Here it shoud say "Class methods orig:..."

I also have another unrelated question if you do not mind:

i dont yet fully understand the separation of the strings and functions from the other values, focused on the strings at the moment in conjunction with the interning.

From what i get is that we have our lox runtime strings as Value::String which wrap a StringId which points into the heap to a rust string. For the interning at the moment you have the string_ids function in the compiler that deduplicates the StringIds.

But then there are still multiple Value::String's for the same StringId. Additionally i think not every place used the string_ids function. For example the Add opcode which caused multiple StringIds to exist for the same rust string. I tried to harmonize that by moving the strings_by_name field and string_ids function to the heap and also calling it in the native_functions and the add opcode. but i have now started getting issues with the StringIds getting swept away prematurely when grabbing them for native functions or the add opcode.

abesto commented 1 year ago

So the issue seems to be with the hashmap itself???

I'm fairly confident HashMap is fine, and something is wrong in either the original clox-rs code that you're now triggering, or something's up in your code :D If you want to the bottom of this, I'd say: try to build a minimal repro, and post on your favorite Rust forum. Maybe drop a link here for continuity / my interest.

But then there are still multiple Value::Strings for the same StringId.

That's perfectly fine! This is kind of the clox-rs way of maintaining multiple references to the same value. (I think. It's been a while.)

Additionally i think not every place used the string_ids function.

Yep, this is a performance trade-off if I remember correctly. Your choices are:

  1. Always look up if you're already storing a string when creating a new one. This minimizes memory cost, but your performance is gonna suck if user code is creating lots of strings in a hot loop.
  2. Only look up string IDs by value in some carefully chosen places, notably, not when user code generates new strings. This means we'll have suboptimal memory usage (i.e. we'll store some strings in multiple StringIds), but in exchange creating strings is cheap.

I vaguely remember this trade-off being discussed in the book too, maybe not in the case of strings? But I can't find it at a quick search.

abesto commented 1 year ago

but i have now started getting issues with the StringIds getting swept away prematurely when grabbing them for native functions or the add opcode.

That's a GC bug then - IIRC I used native functions as GC roots, and the mark phase ought to mark the strings containing the function names as used, but maybe I missed something that your code is triggering, or maybe you missed something in new code (like adding something as a GC root that should be a GC root, or not recursing into something in the mark phase).

abesto commented 1 year ago

... actually, here's one thing that might be happening with the HashMap / StringId segfaults: StringIds store raw memory locations, right? Now, HashMaps are allowed (and expected) to reorganize their stored values all the time. This would invalidate the memory location stored in a StringId.

Possibly this was fine in the original implementation because the HashMap is frozen and small, it's only ever populated in one go during parsing. Changing the hashmap all the time at runtime is likely to trigger this bug (and it being non-deterministic isn't a surprise). On test you could run is: using the original code, define a ton of functions / string constants. Like, hundreds or thousands. Then see if accessing them explodes.

Bottom line: the whole "store memory locations in *Ids" is wonky AF, and shouldn't really exist, honestly, but it's the best I could come up with at the time :)