Open lunarpapillo opened 2 years ago
My initial thought about A is that if the output capture uses compression, then "seek" and "write" becomes "seek" and "rewrite the block". That doesn't make this impossible, just not as simple as saving `ftell`, then after success `fseek`, `fwrite`, then `fseek` back to end.
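For reference, the simple uncompressed case described above could look roughly like the following. This is only a sketch: the `CallRecord` layout and the function names `write_pre_call` and `patch_post_call` are made up for illustration and are not the GFXR file format or API. It assumes fixed-size records and no compression, which is exactly the case where `ftell`/`fseek`/`fwrite` is enough.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk layout for one call record; NOT the GFXR format. */
typedef struct {
    uint32_t call_id;   /* which API call this record describes */
    uint32_t completed; /* 0 = written pre-call, 1 = patched post-call */
    int32_t  result;    /* return code, meaningful only when completed == 1 */
} CallRecord;

/* Stage 1: append the record before the call and remember where it lives.
 * Returns the record's file offset, or -1 on error. */
long write_pre_call(FILE *f, uint32_t call_id)
{
    CallRecord rec = { call_id, 0u, 0 };
    if (fseek(f, 0L, SEEK_END) != 0) return -1L;
    long pos = ftell(f);
    if (pos < 0) return -1L;
    if (fwrite(&rec, sizeof rec, 1, f) != 1) return -1L;
    /* Hand the bytes to the OS so they survive a crash of this process. */
    if (fflush(f) != 0) return -1L;
    return pos;
}

/* Stage 2: seek back, overwrite the record in place, return to the end.
 * Returns 0 on success, -1 on error. */
int patch_post_call(FILE *f, long pos, uint32_t call_id, int32_t result)
{
    CallRecord rec = { call_id, 1u, result };
    if (fseek(f, pos, SEEK_SET) != 0) return -1;
    if (fwrite(&rec, sizeof rec, 1, f) != 1) return -1;
    if (fseek(f, 0L, SEEK_END) != 0) return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

A record left with `completed == 0` at the end of the file would then mark the call that never returned. With compression enabled the record no longer has a fixed offset and size on disk, which is why the patch step turns into rewriting the whole block.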
I'd like something like C, where we cause the crash to continue in the encoder function somehow, encoding a failure result code, and then terminate operation as soon as the ApiCall block is encoded and written. I'm not sure how to do that without `setjmp`/`longjmp`, though; throwing appears undefined from a signal handler.
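To make the `setjmp`/`longjmp` idea concrete, here is a minimal POSIX-only sketch using `sigsetjmp`/`siglongjmp`. Everything named here (`encode_call`, `CALL_CRASHED`, `install_crash_handler`, the two demo calls) is hypothetical and not part of GFXR. The crash is simulated with `raise(SIGSEGV)`; whether process state is still usable after a real driver crash is not something this sketch can promise, and Windows would need a different mechanism.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

#define CALL_CRASHED (-1000) /* hypothetical failure code to encode */

static sigjmp_buf g_crash_env;            /* jump target inside encode_call() */
static volatile sig_atomic_t g_armed = 0; /* nonzero only while a call is in flight */

static void on_crash(int sig)
{
    if (g_armed) {
        g_armed = 0;
        siglongjmp(g_crash_env, 1); /* resume in encode_call() */
    }
    /* Not inside an intercepted call: restore the default action and re-raise. */
    signal(sig, SIG_DFL);
    raise(sig);
}

int install_crash_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_crash;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, NULL);
}

typedef int (*DriverCall)(void);

/* Runs the call; returns its result, or CALL_CRASHED if SIGSEGV fired. */
int encode_call(DriverCall call)
{
    volatile int result = CALL_CRASHED;
    if (sigsetjmp(g_crash_env, 1) == 0) { /* 1: also save the signal mask */
        g_armed = 1;
        result = call();
        g_armed = 0;
    }
    /* Both paths arrive here. A real encoder would write the ApiCall block
     * now and, on the crash path, terminate right after writing it. */
    return result;
}

/* Demo stand-ins for driver entry points. */
int healthy_call(void)    { return 0; }
int simulated_crash(void) { raise(SIGSEGV); return 0; }
```

With the handler installed, `encode_call(simulated_crash)` returns `CALL_CRASHED` instead of killing the process, so the failure code can be encoded before exiting.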
I think the fundamental issue here is that the intent of GFXR, as I understand it, is capturing a successful API stream and then replaying that stream for regression testing, bringup, maybe performance testing, and maybe debugging. This issue is somewhat outside the original scope, although I agree it would be nice to have something useful happen here if its impact is low enough.
I would just reiterate the value of GFXR as a debugging tool. We often ask for GFXR reproductions of VVL bugs or crashes, and I've sent traces to driver developers to reproduce driver bugs. Anything that allows the trace to be used as a test case is a plus from my POV.
I agree it would be a nice thing to have. I'm thinking it may be possible to save off some kind of "last-chance" call block at the beginning of each interception function, so that a segfault handler could write out that block using the compression type for the capture. But that would be a performance hit, so I would think we'd make it conditional on a capture environment variable option. Would that work?
To my knowledge, which is admittedly thin, GFXR hasn't ever saved off an API call block for a crashed command, so presumably the captures we get and the captures you send weren't crashes in drivers? Or were the circumstances diagnosable from the calls before the crash?
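A rough sketch of what such a "last-chance" block could look like on POSIX follows. All names are hypothetical, including the environment variable `CAPTURE_LAST_CHANCE_BLOCK`, which is not an existing GFXR option. The block is written uncompressed here: compressing inside a signal handler would need an async-signal-safe compressor, which this sketch does not attempt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define LAST_CHANCE_MAX 4096

static unsigned char g_block[LAST_CHANCE_MAX]; /* preallocated: no malloc in the handler */
static volatile sig_atomic_t g_block_len = 0;  /* 0 means "nothing to flush" */
static int g_capture_fd = -1;

/* Call at the top of each interception function with the encoded pre-call data. */
void last_chance_save(const void *data, size_t len)
{
    g_block_len = 0; /* invalidate while copying */
    if (len > LAST_CHANCE_MAX) return;
    memcpy(g_block, data, len);
    g_block_len = (sig_atomic_t)len;
}

/* Call after the driver returns normally. */
void last_chance_clear(void) { g_block_len = 0; }

/* Uses only write(), which is async-signal-safe. Returns 0 or -1. */
int last_chance_flush(int fd)
{
    size_t len = (size_t)g_block_len, off = 0;
    while (off < len) {
        ssize_t n = write(fd, g_block + off, len - off);
        if (n <= 0) return -1;
        off += (size_t)n;
    }
    return 0;
}

static void on_segv(int sig)
{
    (void)sig;
    last_chance_flush(g_capture_fd);
    _exit(1); /* terminate without running atexit handlers */
}

/* Returns 1 if enabled and installed, 0 if disabled, -1 on error. */
int last_chance_init(int capture_fd)
{
    const char *opt = getenv("CAPTURE_LAST_CHANCE_BLOCK"); /* hypothetical option */
    if (opt == NULL || strcmp(opt, "1") != 0) return 0;
    g_capture_fd = capture_fd;
    struct sigaction sa;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, NULL) == 0 ? 1 : -1;
}
```

The per-call cost is one `memcpy` of the encoded parameters, which is the performance hit mentioned above and the reason for gating it behind an option.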
@bradgrantham-lunarg said:
> I think the fundamental issue here is that the intent of GFXR, as I understand it, is capturing a successful API stream and then replaying that stream for regression testing, bringup, maybe performance testing, and maybe debugging. This issue is somewhat outside the original scope, although I agree it would be nice to have something useful happen here if its impact is low enough.
I think I agree that such use cases are outside of the original scope (and maybe architecture) of gfxreconstruct... but I also agree with @Tony-LunarG :
> I would just reiterate the value of GFXR as a debugging tool. We often ask for GFXR reproductions of VVL bugs or crashes, and I've sent traces to driver developers to reproduce driver bugs. Anything that allows the trace to be used as a test case is a plus from my POV.
If gfxreconstruct can capture and trim a crashing frame, it would become the easiest way to isolate driver and layer crashes into actionable data. It would be an invaluable tool for all low-level Vulkan developers, the one must-have accessory in the developer toolbox...
But I also agree that it's hard. As excited as I am by the possibility of raising the utility of this project for one subset of Vulkan users, I understand it may not be worth the ROI to develop, especially if major architectural changes were involved...
> My initial thought about A is that if the output capture uses compression, then "seek" and "write" becomes "seek" and "rewrite the block".
Hmmm... I'm no expert, but I thought compression blocks typically collected many commands, and compressed the whole block after the fact and wrote to disk... if that's true, altering the still-in-memory block wouldn't be all that difficult... is my naivete showing?
(Of course, reacting to a crash in a useful and cross-platform way is still difficult.)
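If compression really does work on a buffer of many not-yet-written commands (an assumption, not something confirmed in this thread), patching a pending record is a plain `memcpy`, and only records that were already compressed and flushed need the "rewrite the block" path. A small sketch with made-up names (`PendingBlock`, `pending_append`, `pending_patch`, `pending_flush`); the flush is a stand-in that just discards the buffer rather than calling a real compressor:

```c
#include <stddef.h>
#include <string.h>

#define PENDING_CAP 1024

typedef struct {
    unsigned char bytes[PENDING_CAP]; /* uncompressed records awaiting compression */
    size_t used;                      /* bytes currently buffered */
    size_t flushed_total;             /* bytes already compressed and written */
} PendingBlock;

/* Append a record; returns its offset in the uncompressed stream,
 * or (size_t)-1 if it does not fit in the pending buffer. */
size_t pending_append(PendingBlock *b, const void *rec, size_t len)
{
    if (len > PENDING_CAP - b->used) return (size_t)-1;
    size_t off = b->flushed_total + b->used;
    memcpy(b->bytes + b->used, rec, len);
    b->used += len;
    return off;
}

/* Patch a record in place if it is still in memory.
 * Returns 0 on success, -1 if it was already flushed or is out of range. */
int pending_patch(PendingBlock *b, size_t stream_off, const void *rec, size_t len)
{
    if (stream_off < b->flushed_total) return -1; /* already on disk */
    size_t local = stream_off - b->flushed_total;
    if (local > b->used || len > b->used - local) return -1;
    memcpy(b->bytes + local, rec, len);
    return 0;
}

/* Stand-in for "compress and write to disk": only drops the buffer. */
void pending_flush(PendingBlock *b)
{
    b->flushed_total += b->used;
    b->used = 0;
}
```

The catch is that a pre-call record only protects against a crash if it reaches the disk before the call is made, so the record that matters most is also the one most likely to have been flushed already.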
Developers would like to be able to capture an application (sometimes involving layers) that provokes a driver crash. The capture would be very useful in determining exactly what went wrong in a debugging situation (particularly on Android, which is difficult to debug otherwise).
Right now, gfxreconstruct only writes to the trace file post-call; this causes the offending command to be lost, as the driver will have crashed before the information could be saved.
Here are a few brainstormed alternatives for supporting this use case:
A. Two-stage capture writing
The capture layer writes the call data to the capture file before the call is processed, with some indicator that the call has not yet been passed on. After the call returns, the capture layer goes back to the file using `seek()` or equivalent and writes the post-call information into the proper location in the capture file. (If the file format were arranged so that all the pre-call information came first, this could be done without using `seek()`.)
`gfxrecon-replay` would have to behave correctly when used on a call that has only pre-call information written.
(`gfxrecon-replay`, `gfxrecon-toascii`, trimming)
B. Trap exceptions during capture
The capture layer could trap exceptions and other crashes, and output, in some format, information about the last call to be captured.
:heavy_minus_sign: difficult to detect and react to all crashes on all supported OSes
:heavy_minus_sign: crashing command does not go into the capture, so no way to collect this crash as a test case
C. As (B), but write the crashing command to the capture file after the crash