[draft / demo] Add network-enabled trace logging / add compact binary tracelog format

devinacker / bsnes-plus

debug-oriented fork of bsnes


Hi! Thank you all for your work on BSNES-plus, it's an amazing tool.

I started working on tracelog modifications for BSNES to allow realtime streaming of trace data to outside apps. I originally thought this would be a throwaway hack, but decided to clean it up and post it here in case others find it useful.

I would consider this code demo-only. Before merging into upstream BSNES it needs some additional finishing, like making it play nicer with the UI and exposing some config options. If there's interest in my approach, I can refine this further for upstream submission.

I was working on disassembly research for a 4MB ROM, and BSNES's tracelog capabilities were extremely useful. However, even a short run would produce gigantic gigabyte-sized dumps, and would cause BSNES to run at ~2FPS. The workflow for doing all this manually with an external tool was tedious as well.

This PR does 2 main things:

1. When you click the 'Trace' checkbox in the debugger, BSNES opens a socket and listens on TCP port 27015. When a client connects, BSNES dumps tracelog data over the socket in near-realtime. This approach uses a couple of worker threads to create batches of instruction traces, compress them with zlib, and send them over the network.
2. Adds a new binary tracelog output format option. This binary format uses 8 bytes to describe each instruction trace (the text version is ~80 bytes and does lots of slow per-instruction text formatting).

When both options are combined, tracelogging [even with tracemask turned off] now runs in (basically) realtime and you can play through a game or run a movie while fully logging everything with an external tool.

I have been working on modifications to an existing external tool called Diztinguish, which is a GUI for automating some parts of the disassembly process by marking an input ROM as code vs data, and marking the code with metadata about the state of the X,M,DB,D flags to enable accurate disassembly. Once the code has been marked up, Diztinguish can export source .asm files with labels and markup that can be compiled back into a byte-identical version of the original ROM.

The changes in this PR allow Diztinguish to talk to BSNES-plus: https://github.com/Dotsarecool/DiztinGUIsh/pull/18

Here's a demo of what this entire system looks like running together. On the left is BSNES-plus, which is talking via socket to Diztinguish on the right. The areas showing up in yellow are 65C816 instructions being marked up in realtime as the game runs. This is a sped-up video of about a 1 minute run (full version is here: https://www.youtube.com/watch?v=NCZUESf82Rg&feature=youtu.be)

[Animated GIF: BSNES-plus (left) streaming trace data to Diztinguish (right)]

Below is a highlight of Diz marking up the ROM in realtime, pretty neat visual:

[Animated GIF: Diztinguish marking up the ROM in realtime]

Future work:

[ ] If interested in having this upstream, make the socket reader code play nicer with the GUI (currently I do some dumb things that block the entire GUI while waiting for a client to connect)
[ ] Port to something non-windows specific?
[ ] Is this architecturally the best way to do this? Should it be a BSNES plugin instead?
[ ] Investigate if socket is the best IPC or if something more direct might be better (like shared memory / memory mapped file / pipes / non-TCP sockets). Socket code was just what I had lying around.
[ ] Add support for sending different kinds of data down the stream, like dumping info about reads from ROM addresses (so we can mark graphics/data/etc in Diztinguish)

This PR adds back a few files to the embedded zlib source included in BSNES (needed for the compression functionality). They are unmodified from the original zlib v1.2.3 source.

Cool! Yea no rush.

I realized I forgot to document a couple of things with this. There are a couple of constraints that were made as performance tradeoffs, and some of them could probably be relaxed. My goal was to make this really fast while also being somewhat expandable (for instance I hope to hook the "usage" system into this).

Some alternative ideas instead of using sockets for external communication:

1) Use named pipes or shared memory. If I was starting over, I might have chosen this. I thought it might be cool to be able to use this over a real network, but I haven't tested if it's fast enough for that.
2) Use UNIX domain sockets (now in Win10). Basically the same thing, but avoids the TCP handshake stuff.
3) Scrap the network part of this and instead create the plumbing to have a plugin talk via DLL interface to launch an external tool (like the one I am working on, DiztinGUIsh).

Overall, it works pretty good though. I think dumping most of this into a plugin where it can be optionally compiled/enabled might be the way to go.

Here's some more info on the pipeline:

For every instruction executed, 8 bytes of trace info are generated:

- SNES address (3 bytes)
- number of bytes used by opcode+operands (1 byte)
- D register (2 bytes)
- DB register (1 byte)
- flags (1 byte)
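As a rough illustration of that 8-byte layout (not the actual code from this PR), packing and unpacking could look like the sketch below. The struct and function names are hypothetical, and little-endian byte order within the multi-byte fields is an assumption, since the byte order isn't stated here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical in-memory view of one abridged trace entry.
struct AbridgedTrace {
    uint32_t pc;      // 24-bit SNES address (bank + offset)
    uint8_t  length;  // bytes used by opcode + operands
    uint16_t d;       // D (direct page) register
    uint8_t  db;      // DB (data bank) register
    uint8_t  flags;   // processor flags
};

// Pack into 8 bytes in the field order listed above.
// Assumption: multi-byte fields are stored low byte first.
std::array<uint8_t, 8> packTrace(const AbridgedTrace& t) {
    assert(t.pc <= 0xFFFFFF);  // address must fit in 3 bytes
    return {
        static_cast<uint8_t>(t.pc & 0xFF),
        static_cast<uint8_t>((t.pc >> 8) & 0xFF),
        static_cast<uint8_t>((t.pc >> 16) & 0xFF),
        t.length,
        static_cast<uint8_t>(t.d & 0xFF),
        static_cast<uint8_t>((t.d >> 8) & 0xFF),
        t.db,
        t.flags,
    };
}

// Inverse of packTrace, as an external tool might do on the receiving side.
AbridgedTrace unpackTrace(const std::array<uint8_t, 8>& b) {
    AbridgedTrace t;
    t.pc = static_cast<uint32_t>(b[0]) |
           (static_cast<uint32_t>(b[1]) << 8) |
           (static_cast<uint32_t>(b[2]) << 16);
    t.length = b[3];
    t.d = static_cast<uint16_t>(b[4] | (b[5] << 8));
    t.db = b[6];
    t.flags = b[7];
    return t;
}
```

Writing the bytes out explicitly, rather than copying a struct with `memcpy`, keeps the wire size at exactly 8 bytes regardless of compiler padding.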

These 8 bytes are the smallest ("abridged") tracelog format, but you can enable the "full" tracelog (about 20 bytes), which also dumps the rest of the info: registers A, X, Y, S, the e flag, and 4 bytes for the actual instruction.

We prepend 2 header bytes to each entry: an ID (1 byte) and the size of the data that follows (1 byte).
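A minimal sketch of that framing, assuming the header is exactly ID then size and that the size counts only the payload bytes. The names `Record`, `frameRecord`, and `splitRecords` are made up for illustration, and the actual ID values are not given here, so the caller supplies one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Record {
    uint8_t id;                    // record type (actual ID values not specified here)
    std::vector<uint8_t> payload;  // e.g. the 8-byte abridged trace
};

// Prepend the 2-byte header (ID, then payload size) to a payload.
std::vector<uint8_t> frameRecord(uint8_t id, const std::vector<uint8_t>& payload) {
    assert(payload.size() <= 255);  // size must fit in the 1-byte length field
    std::vector<uint8_t> out;
    out.reserve(payload.size() + 2);
    out.push_back(id);
    out.push_back(static_cast<uint8_t>(payload.size()));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Walk a (decompressed) buffer and split it back into records.
// Stops at the first truncated record instead of reading out of bounds.
std::vector<Record> splitRecords(const std::vector<uint8_t>& stream) {
    std::vector<Record> records;
    std::size_t pos = 0;
    while (pos + 2 <= stream.size()) {
        const uint8_t id = stream[pos];
        const std::size_t size = stream[pos + 1];
        if (pos + 2 + size > stream.size()) break;
        const auto first = stream.begin() + static_cast<std::ptrdiff_t>(pos + 2);
        const auto last = first + static_cast<std::ptrdiff_t>(size);
        records.push_back({id, std::vector<uint8_t>(first, last)});
        pos += 2 + size;
    }
    return records;
}
```

Because every entry carries its own size, a reader can skip record types it doesn't understand, which leaves room for other kinds of data in the same stream later on.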

As a first pass, I tried just having the main thread throw those 10 bytes down the socket as they came in, but it was way too slow; even the non-blocking socket send call was too slow.

So I ended up profiling and iterating on the design, and came up with the following, which seems to be working great even if it's a bit more complex than I was hoping for.

1) Main thread takes a few thousand of these small 10-byte chunks and appends them to a fixed size buffer.
2) When that buffer is full, it's put on a queue for the compression thread, which uses zlib to compress the buffer (~75% reduction in size).
3) The compressed buffer is passed on to the sending thread.
4) The sending thread takes compressed buffers and sends them over the socket.
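The four steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `BufferQueue` and `TracePipeline` are hypothetical names, the compression and socket steps are passed in as callbacks (so zlib and Winsock are not shown), and a growable vector with a size threshold stands in for the fixed size buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Buffer = std::vector<uint8_t>;

// Minimal blocking queue. pop() returns false once closed and drained.
class BufferQueue {
public:
    void push(Buffer b) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(b)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    bool pop(Buffer& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Buffer> q_;
    bool closed_ = false;
};

class TracePipeline {
public:
    using Compress = std::function<Buffer(const Buffer&)>;
    using Send = std::function<void(const Buffer&)>;

    TracePipeline(std::size_t batchBytes, Compress compress, Send send)
        : batchBytes_(batchBytes),
          compressor_([this, compress] {  // step 2: compress full batches
              Buffer b;
              while (raw_.pop(b)) packed_.push(compress(b));
              packed_.close();
          }),
          sender_([this, send] {          // steps 3 and 4: send compressed batches
              Buffer b;
              while (packed_.pop(b)) send(b);
          }) {
        assert(batchBytes_ > 0);
    }
    ~TracePipeline() { finish(); }

    // Step 1: called on the emulation thread for every framed record.
    void append(const Buffer& record) {
        batch_.insert(batch_.end(), record.begin(), record.end());
        if (batch_.size() >= batchBytes_) flush();
    }

    // Flush the partial batch, then wait for both workers to drain.
    void finish() {
        if (finished_) return;
        finished_ = true;
        flush();
        raw_.close();
        compressor_.join();
        sender_.join();
    }

private:
    void flush() {
        if (batch_.empty()) return;
        raw_.push(std::move(batch_));
        batch_ = Buffer();
    }

    std::size_t batchBytes_;
    Buffer batch_;
    BufferQueue raw_, packed_;  // declared before the threads that use them
    std::thread compressor_, sender_;
    bool finished_ = false;
};
```

The point of the design is that the emulation thread only ever does a cheap buffer append and an occasional queue push; the expensive compression and the socket send each happen on their own thread.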

This works fine on localhost because Windows seems to handle arbitrary packet lengths fine (and we send about 25k in one packet). If it was sent on a real ethernet network, it probably would fragment instead and that could potentially destroy performance. I think (not sure though) we'd have to tune our compressed packet size to be around 1500 bytes to maintain performance, which might require some other tweaking.
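To put rough numbers on that tuning, here is a back-of-the-envelope helper (a hypothetical function, not part of this PR). It assumes 10-byte records and that the compressed output is about 25% of the raw size, which is only the approximate figure quoted above; real ratios vary with the trace content, and the usable TCP payload per Ethernet frame is somewhat below 1500 bytes once IP and TCP headers are subtracted.

```cpp
#include <cstddef>

// Number of records to batch so the compressed batch lands near a target size.
// keptPercent is the compressed size as a percentage of the raw size
// (25 corresponds to the ~75% reduction mentioned above).
constexpr std::size_t recordsPerBatch(std::size_t targetCompressedBytes,
                                      std::size_t recordBytes,
                                      std::size_t keptPercent) {
    return (targetCompressedBytes * 100 / keptPercent) / recordBytes;
}

// ~1500 compressed bytes  -> about 600 records (6000 raw bytes) per batch.
static_assert(recordsPerBatch(1500, 10, 25) == 600, "1500-byte target");
// ~25000 compressed bytes -> about 10000 records (100000 raw bytes) per batch.
static_assert(recordsPerBatch(25000, 10, 25) == 10000, "25k target");
```

Smaller batches also give zlib less context to work with, so the compression ratio would likely get worse as the batch shrinks; the 25% figure should be re-measured rather than assumed.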

Other note: the organization of the code is pretty sloppy, I shoved all this in a new file called w32_socket.h which is poorly named and contains classes that do 3 different things.

So basically, let me know if you think this is something you'd like to see cleaned up and integrated into BSNES-plus upstream, or, if this is better off as a fork or a separate plugin. I'm new to SNES-specific stuff, so if I'm doing anything silly there, let me know. Also interested if this idea of remote interfacing with the CPU guts should be thought out at a higher-level layer like libsnes or something, I'm kind of new and unfamiliar with the broader ecosystem.

Thanks!


[draft / demo] Add network-enabled trace logging / add compact binary tracelog format #268