Closed michaeleisel closed 1 month ago
testing looks good. i will say, profiling in general (with or without my change) seems off on real devices (not seeing my own functions in the profiling result), but maybe i'm doing something wrong
@noahsmartin i can't merge, can you do it
new
pointers which also avoids memory leaks)
As far as performance goes, though, it's largely bottlenecked both before and after this PR by FIRCLSReadMemory, which calls vm_read_overwrite. I imagine that's because it can't trust the address to be safe to dereference directly without causing a segfault. For my test case, a single sampling of the main thread currently takes about 0.1 ms, although I imagine that depends largely on stack depth. Replacing the vm_read_overwrite call with a memcpy brings that down to about 0.04 ms, at which point thread_get_state is the next-biggest culprit.
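For anyone following along, here's a rough sketch of the tradeoff described above. It is not the actual FIRCLSReadMemory code; `safe_read` and `unsafe_read` are hypothetical names made up for illustration. The idea is that vm_read_overwrite asks the kernel to do the copy and reports an error for an unmapped address, while memcpy is cheaper but will crash on a bad pointer. The non-Apple branch is only there so the sketch compiles elsewhere, and its NULL check is an assumption that does not provide the same safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__APPLE__)
#include <mach/mach.h>
#endif

// Hypothetical helper: copy len bytes from src into dest without trusting src.
// Returns false instead of faulting when src is not readable.
static bool safe_read(const void *src, void *dest, size_t len) {
  assert(dest != NULL);  // dest is our own buffer, so it is trusted
#if defined(__APPLE__)
  // The kernel validates the source range and performs the copy for us.
  vm_size_t bytes_read = 0;
  kern_return_t kr = vm_read_overwrite(mach_task_self(),
                                       (vm_address_t)(uintptr_t)src,
                                       (vm_size_t)len,
                                       (vm_address_t)(uintptr_t)dest,
                                       &bytes_read);
  return kr == KERN_SUCCESS && bytes_read == len;
#else
  // Assumption for non-Apple builds only: a NULL check stands in for the
  // kernel-side validation. It does NOT catch other invalid addresses.
  if (src == NULL) {
    return false;
  }
  memcpy(dest, src, len);
  return true;
#endif
}

// Hypothetical fast path: a plain memcpy. Cheaper, but it will segfault
// if src points at unmapped memory.
static bool unsafe_read(const void *src, void *dest, size_t len) {
  memcpy(dest, src, len);
  return true;
}
```

The safe version pays for a kernel round trip on every read, which would be consistent with the gap between the two timings above, though I haven't measured this sketch itself.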