andriyDev / recastnavigation-rs-sys

Raw Rust FFI bindings for recastnavigation.
MIT License
7 stars 2 forks source link

Heap Corruption calling dtFreeTileCache #9

Closed Sunderous closed 12 months ago

Sunderous commented 12 months ago

Hey, sorry I took a break from my project or I would've caught this back when we were looking at the last issue.

Context is:

What seems to happen, is at the end of my test if I try to call dtFreeTileCache on the cache pointer/reference it corrupts the heap and the program exits in failure.

If I don't free the cache, it terminates fine and everything is green.

I've tried different orders of freeing the cache (right now I'm mirroring the order you do things in your tile cache test in the library).

And changed my cache so that it's initialized the exact same way yours is.

So I'm going to try poking around a bit more to see if anything else is significantly different compared to your test, but I'm kind of at a loss.

EDIT: Pulled down your 1.0.2 code, set it to use the same toolchain, and your tile cache tests pass fine. So it must have something to do with how I'm passing around the pointers/references between the 2 functions or something.

andriyDev commented 12 months ago

Hmmm I have no idea why this could be the case. I wonder if my test has heap corruption that's just not getting picked up?

Any chance you can provide a minimum reproducible example?

Sunderous commented 12 months ago

Had to sleep on it a bit, but just realized that makes a lot of sense. I'll copy your test into my code, see if that works. Then I'll copy my stuff into a fresh project, and see if that's broken. If I can reproduce in a fresh project I'll throw it up on github and link it here.

Thanks!

Sunderous commented 12 months ago

Boiled it down as much as I could:

Repro Repo

So I need to use fastlz compress/decompress to load the nav mesh I'm working with, so I made both tests use those functions. Yours still passes, mine still corrupts the heap.

Other than that, the only significant difference is that I'm loading my mesh/tile cache from a saved file that I produced using the RecastDetour example application.

But in my mind that shouldn't really matter in this case, unless I'm misunderstanding some rust file stuff that's having this affect because it's all unsafe so I wouldn't see any warnings ahead of time.

I quickly tried it using your set_poly_flags function as that was one of the only remaining differences, still no change.

EDIT: As best I can tell, I might just be doing unsafe rust poorly here haha: image

If I never add any tiles to the cache, it let's me free it no problem before failing the rest of the test. So my best hunch right now is it has something to do with building the cache from that slice and using .as_mut_ptr(), probably does something weird when the slice goes out of scope at the end of the loop, but we're still pointing to it's memory. Probably need to brush up on lifetimes or try something else.

Sunderous commented 12 months ago

Alright, sorry for turning this into a rubber duck, but "solved" it. Still not 100% clear on the underlying mechanisms, but I can sort of see why this would be necessary given what I described above about the data going out of scope, and the cache possibly pointing to it if we're not careful.

image

Still might not be the optimal way, but if I use dtAlloc() to allocate fresh memory for the data, then copy the data slice into it, that does the trick. Which sort of makes sense, you want to allocate space that won't get cleaned up until we free the cache later.

So I guess for anyone else trying to load existing meshes/caches, look out for that haha.

Thanks again.

andriyDev commented 12 months ago

Thank you for sharing this!! It's interesting to see real use cases and the pitfalls here.

I think the problem here is DT_COMPRESSEDTILE_FREE_DATA. FWIR, that flag tells the tile cache to take ownership of the data pointer you give it. Since the original slice was something you owned, that was a double free. So adding a dtAlloc works because you give it something that the tile cache does own. So that's one solution.

If you know that your data bytes will stay loaded while you use your tile cache, I think you can replace that flag with 0 and pass the original slice again, and everything will also be safe. That avoids a copy but means you have to keep the original data loaded.

I never would have thought of this without you already coming up with a solution 😅

Sunderous commented 12 months ago

Thanks for the explanation. My understanding of the detour internals is basically non-existant, other than how to use them haha. So if that's what the flag does, then that totally makes sense. You're adding a second owner to the memory rust doesn't know about, it frees the data when the function/loop scope ends, and then you're left with garbage you can't use/free again.

If I'd actually had to use the cache, and not just the mesh in my test, I would've probably noticed earlier since the cache was likely non-functional at that point and would've barfed if I asked it to do something.

I'll have to wait and see how I structure my navigation module to know which approach to use. If I'm going to keep the data loaded in the module scope, I can do it the simpler way. But if I'm just going to have a function that takes a file and produces a cache/mesh, then I'll stick with the extra allocation.