Open encounter opened 3 months ago
Thinking about this change, I would like to merge it in pieces to better understand the different parts. Do you mind if i pick up parts of it (e.g. "remove snapshotting") separately?
Do you mind if i pick up parts of it (e.g. "remove snapshotting") separately?
Go for it! I'll rebase accordingly once I get time to work on it again.
112953833a9dd8e187d6a4690ba57b6e5be09889 removes all the snapshotting, including the signal handler and web UI for it
I copied your change to add Handler::Async, and one thing I notice about it is we end up with two Boxed futures per async call. The first basically holds the decoded stack args, and then the second calls the first and updates state based on its result. The previous code had only one boxed future because it bundled those two things together. I'm not sure there's a good way to resolve this.
https://github.com/evmar/retrowin32/pull/31 -- my attempt
If handle_shim_call was always async, one Boxed future could be avoided. It’s only used to conditionally return a Future. Maybe it would work to make that function return Option<impl Future>
instead?
edit: I realized that wouldn’t allow us to store it in a Vec, unless we converted it to a custom Future implementation to make it a concrete type.
Hm yeah, I had a similar thought. Unfortunately it's put in Vec stored per-cpu, and the cpu layer doesn't know about shims, hrmm.
In 5c92c7d93f1320b08405fff7a92adbd94f440b1b I was able to simplify the async shim handling. It pushes the responsibility for storing and polling the future up into the top-level event loop (cli or web, web updates in 68221580f023031128aac76a5ef8ced784374cf9). machine.call_shim
returns the BoxFuture<u32>
from the shim directly, instead of wrapping it in another future for updating the registers. The event loop stores this future, polls it, and calls finish_shim_call
when complete, which will then run the machine-specific register updates.
x86-emu and x86-unicorn only have to return StopReason::Blocked
when their EIP=MAGIC_ADDR, which tells the outer event loop to perform future polling. This simplifies their implementations quite a bit.
I haven't had a chance to look at this yet, but I wanted to note my pending builtins-as-dlls work lets us eliminate some of the post-shim code: https://github.com/evmar/retrowin32/pull/30/commits/60b256e4e50404f8e7be9f5fcd09420c437cdef2
basically the new flow is that the original binary does some call [SomeFn]
which shows up at
SomeFn:
syscall ; causes Rust impl to be invoked
ret 12 ; the number here is stack_consumed
The only thing the shim implementation code needs to manage is taking the return value of the Rust fn and putting it into eax.
I haven't quite figured it out yet but I am pretty sure that async fns will benefit similarly.
Now that I've typed this out, maybe the eax handling means this doesn't win anything, hrm.
New idea: make the futures Vec a Vec of Future<Option<u32>>
, where a present value means "put this in eax when done". Coupled with my other change that removes the other code that needs to run after an async block I think that is enough?
Sounds good to me. I think that's similar to the solution I cooked up in the above commit:
struct AsyncShimCall {
shim: &'static win32::shims::Shim,
future: BoxFuture<u32>,
}
let mut shim_calls = Vec::<AsyncShimCall>::new();
// ...
match stop_reason {
win32::StopReason::Blocked => {
// Poll the last future.
let shim_call = shim_calls.last_mut().unwrap();
match shim_call.future.as_mut().poll(&mut ctx) {
Poll::Ready(ret) => {
target.machine.finish_shim_call(shim_call.shim, ret);
Except now finish_shim_call
is only responsible for setting eax, instead of doing the eip and stack manipulation as well.
I am so so sorry this has taken me so long! I have merged pieces of it and I am working on the main debugger part, but it also managed to collide horribly with this dll change I've also been working on for a long time so it's been kind a three way train wreck between your change, my change, and also my personal life stuff.
This implements a working GDB stub for
x86-emu
andx86-unicorn
. One larger change is the new state machine incli/main.rs
. This code is now shared betweenx86-emu
andx86-unicorn
: it allows stopping, resuming and single-stepping the machine emulation, handling breakpoints and machine errors in a unified way.Another larger change is reworking the async shim handling. There's no longer any backend-specific code in
builtin.rs
. (Take a look at the simplified implementation!)x86-unicorn
was updated to use thex86-emu
approach for future handling (usingeip=MAGIC_ADDR
to poll futures instead of executing CPU), though I'd like to consider ideas to unify this code between the two as well, if possible.Other changes:
x86-unicorn
: Unmapped first 4k of memory to catch zero page accesses--trace-blocks
because I didn't feel like reimplementing it (I could, though, if desired)x86-emu
snapshot feature. I broke it while refactoring things to usePin<>
, and decided to rip it out and revisit later if it's still a desired feature. One big issue is that it's incompatible with any running futures.TODO:
x86-64
build--trace-points