flibitijibibo / FNA-MGHistory

FNA - Accuracy-focused XNA4 reimplementation for open platforms
http://fna-xna.github.io/
246 stars 37 forks source link

Debug Effect XNB file works, release XNB file crashes with AccessViolationException #306

Closed elisee closed 9 years ago

elisee commented 9 years ago

Hi! I just caught up with all the FNA changes of the last few months. Great work getting XNB effect support merged, getting rid of SDL_mixer and all the rest :)

When applying the only pass of an Effect with effect.CurrentTechnique.Passes[0].Apply();, I'm hitting an AccessViolationException in OpenGLDevice.cs line 1550. This only happens when using the release version of my Effect XNB.

I didn't even know that XNB files differ when built in Debug vs Release mode, but the Debug one is 6 KB and has a lot more plain text than the Release one (2 KB). I've looked around for signs that FNA only supports Debug Effect XNBs but I've seen no mention of that in the code or wiki. At the very least, the EffectReader seems to parse the effect just fine.

Here's the .fx source as well as both the Debug and Release XNB files.

elisee commented 9 years ago

OK so as I was writing this bug report, I did some more tests. This looks like an heisenbug of sorts. Maybe some race condition with garbage collection, like we had elsewhere?

In a pretty minimal test project, it only crashes when running with the debugger and in Debug mode. When starting the generated EXE from explorer, it runs fine. And when running with the debugger in Release mode, it also runs fine.

But in the context of my actual (beefy?) app, it seems to crash every time (but only with the Release-mode XNB, though). The effect pass is only applied after another effect was active in this context so I'll try and see if doing that in my minimal project lets me reproduce the crash everytime. Will update with results.

flibitijibibo commented 9 years ago

While you're looking at that I can elaborate a bit more on the Effect binary formats:

Basically the only difference between Debug/Release is that Debug does not compress it down with LZX, so when you unpack the Release binary it should look exactly the same as the Debug bin.

As far as this specific bin, MEPTool parses it out just fine, so we're probably looking at something weird like you said.

flibitijibibo commented 9 years ago

Actually wow, I think we're looking at the Vorbis problem again, you're right on the GC in a way:

This might be a lot of work to try, but can you change effectChanges to be an IntPtr that's directly malloc'd to the sizeof the stateChanges struct? Then instead of passing stateChanges as a ref, we'll just pass an IntPtr and parse out the changes with an unsafe block. That, or we need to find a way to pin the stateChanges so it doesn't get moved around by the damned VM like it was the OggVorbis_File...

flibitijibibo commented 9 years ago

For clarity (or in case anyone else wants to check this out): These are the two options we're looking at:

  1. Try to pin the stateChanges structure so the runtime doesn't move memory around: https://github.com/flibitijibibo/FNA/commit/5961a678b11738730d6826f8a5eccd8c60700ca8
  2. Tell managed languages they're fired and switch to malloc: https://github.com/flibitijibibo/Vorbisfile-CS/commit/a08ca774a11def830fa6038a9bbf14cbdce62ded
elisee commented 9 years ago

OK, thanks for the pointers. I'll look at this either tonight if I find the time or this week-end.

flibitijibibo commented 9 years ago

Here's a patch to try option 1:

http://flibitijibibo.com/changesPin.diff

If this doesn't work then we get to play the malloc game. :(

elisee commented 9 years ago

Applied it and rebuilt all but it doesn't seem to make a difference. Getting the same AccessViolationException in the same spot. The malloc game it is.

flibitijibibo commented 9 years ago

Oh, very well...

In case someone manages to debug this at the native level, here are the two native functions you care about:

https://github.com/flibitijibibo/MojoShader/blob/master/mojoshader_opengl.c#L2708 https://github.com/flibitijibibo/MojoShader/blob/master/mojoshader_opengl.c#L2780

flibitijibibo commented 9 years ago

One more patchset, this one's a bit more complicated and affects MojoShader#:

http://www.flibitijibibo.com/changesMalloc1.diff http://www.flibitijibibo.com/changesMalloc2.diff

elisee commented 9 years ago

With the those two patches applied, I get the following:

So, huh... there are probably more objects that need pinning / to be malloc'ed?

flibitijibibo commented 9 years ago

I doubt it; the only things that get touched in those two native functions are the GLEffectData/EffectData, which are allocated by MojoShader, and a couple bits in the glContext, also allocated by MojoShader. The effectStateChanges structure was the only one handled by managed code.

flibitijibibo commented 9 years ago

To cut out a possible issue, here's a version of the effect built with preshaders disabled:

http://www.flibitijibibo.com/DefaultNOPS.fxb

Replace the XNB with this (keeping the fxb extension) and we'll try to load this one instead.

Aside from the stuff I mentioned earlier we also have a preshader VM that tries to run this stuff. It's run everything I've put through it up to this point, but maybe this is what's killing us.

elisee commented 9 years ago

No crashes at all in any of the configurations I tried earlier with this (My test triangle doesn't draw though).

flibitijibibo commented 9 years ago

Hm, so I think the preshader VM has trouble with the DOT and MUL instructions. From MEPTool:

PRESHADER:
SYMBOLS:
    * 0: "FogMaxDistance"
      register set float4
      register index 6
      register count 1
      symbol class scalar
      symbol type float
      rows 1
      columns 1
      elements 1
    * 1: "FogMinDistance"
      register set float4
      register index 5
      register count 1
      symbol class scalar
      symbol type float
      rows 1
      columns 1
      elements 1
    * 2: "InvWorld"
      register set float4
      register index 0
      register count 4
      symbol class column-major matrix
      symbol type float
      rows 4
      columns 4
      elements 1
    * 3: "LightDirection"
      register set float4
      register index 4
      register count 1
      symbol class vector
      symbol type float
      rows 1
      columns 3
      elements 1

dot r0.w, c4.xyz, c3.xyz
dot r0.xyz, c4.xyz, c0.xyz
dot r0.yzw, c4.xyz, c1.xyz
dot r0.zw, c4.xyz, c2.xyz
dot r1, r0, r0
rsq r0.w, r1.x
mul c12.xyz, r0.w, r0.xyz
neg r0.x, c5.x
add r1.x, r0.x, c6.x
rcp c13.x, r1.x

The problem (I think) is that we're not handling the use of registers that use more than one element (i.e. xyz rather than just x). Here's where MUL and DOT are implemented:

https://github.com/flibitijibibo/MojoShader/blob/master/mojoshader_effects.c#L151 https://github.com/flibitijibibo/MojoShader/blob/master/mojoshader_effects.c#L169

EDIT: Well, we are handling them, but maybe we're not doing it right... MUL dst elements are one example. Either way, not sure where the crash comes from.

As for the triangle, I may need a test case for that one to try locally.

elisee commented 9 years ago

Here's Game1.cs, no external content required except for the compiled effect file Default.xnb or .xfb.

flibitijibibo commented 9 years ago

Well, I can definitely confirm that the preshader VM is what's crashing... it claims to crash on an unhandled opcode. I think I'm going to focus on that first... for all I know it'll fix the triangle too.

elisee commented 9 years ago

Alright, cool. I tried to setup a MojoShader build with VS2010. Got CMake to generate the project, but build fails because... SDL2 is missing? Not sure if I need the whole SDL2 source tree or some header files are enough.

Anyway, maybe you've got everything you need to track down the problem locally now? But definitely let me know if getting a MojoShader stacktrace on my end would help.

flibitijibibo commented 9 years ago

Got the MojoShader crash, this shouldn't have worked before anyway. Turns out it's Ryan that's fired this time! :P https://github.com/flibitijibibo/MojoShader/commit/a39a6cfe05424c87ba98bb89cbbfe24d115a78c2

Triangle's still missing though. I'll rebuild MojoShader now, but yeah, that's one problem dealt with.

flibitijibibo commented 9 years ago

fnalibs.tar.bz2 has been updated. At this point I would now look at apitraces to see if the constant buffers are wrong or if something else has broken things between MGFX and now.

elisee commented 9 years ago

I reverted to stock FNA with the latest fnalibs.tar.bz2 and my original Default.xnb and I get the OpenGLDevice.cs line 1550 crash I initially reported. So I applied changesMalloc1.diff and changesMalloc2.diff again and the first Game1.Draw() call succeed with the following result:

image

Not exactly a triangle :D but better than nothing. On the second Game1.Draw() call though, effect.CurrentTechnique.Passes[0].Apply(); now triggers a crash at the following location:

https://github.com/flibitijibibo/FNA/blob/9d6f3f655c4393135c549ecb1d71b83fdb60043b/src/Graphics/OpenGLDevice.cs#L1529

I guess this warrants a fix before looking at apitraces?

flibitijibibo commented 9 years ago

How does the latest fnalibs.tar.bz2 fare with DefaultNOPS? Right now I'm looking at the preshader DOT results and it's a lot of NaN, so there might still be a problem with the preshader VM right now (which is probably why it's not quite a triangle...)

The CommitChanges crash is a whole different mystery though, though it may still be a problem with the preshader ops.

elisee commented 9 years ago

Right, should have mentioned that: with DefaultNOPS.fxb, it doesn't crash but draws no triangle. I'll look into getting an apitrace with it. No idea how complicated/easy it is to setup.

elisee commented 9 years ago

OK, so incredibly easy then :). Nifty tool! https://www.dropbox.com/s/xhpzdzw0au3q7ff/SimpleGameDefaultNOPS.trace?dl=0

EDIT: Would love to help more but I have no idea what to look for.

flibitijibibo commented 9 years ago

Did one more update to MojoShader to fix possible write errors (which may have crashed CommitChanges):

https://github.com/flibitijibibo/MojoShader/commit/0f461066e772e0bb185606e9ca347a9581078381

Since I remember now: To build the current MojoShader branch you need the SDL headers somewhere - the sources don't matter, we're just using SDL_vsnprintf to prevent locale issues, and I've bundled that in the MojoShader branch. If you've got the headers somewhere, just throw them in SDL2/ at the repo root folder and it should work without fuss.

Looking at both of our traces I'm also kind of baffled at the vertex input... check out the glVertexAttribPointer calls. For some reason the last texcoord is missing?!

The uniforms from the preshader calc are still wrong I think; that one MUL operation that does mul c12.xyz, r0.w, r0.xyz should at worst give us all 0s, but we're getting NaN instead. The c13 result is right though; that actually should be inf, funnily enough.

elisee commented 9 years ago

Since I remember now: To build the current MojoShader branch you need the SDL headers somewhere

Ah, well, that's what I tried (downloaded the SDL 2.0.3 dev headers from libsdl.org) but I ended up with 408 build errors:

image

Maybe some compiler options to tweak?

flibitijibibo commented 9 years ago

Hm, well, I've got an even better idea, Mr. @elisee...

Without the preshader, what happens when you add this before applying:

effect.Parameters[ "ColorMod" ].SetValue( new Vector4(1.0f, 1.0f, 1.0f, 1.0f) );

Because after that this works for me with upstream MojoShader and the preshader. Is the triangle being set to 0,0,0,0 color why the triangle isn't showing up? :P

elisee commented 9 years ago

Erm. Woops haha. It does work with a proper value for ColorMod :D.

XNA renders the triangle just fine without this line though, so would that mean XNA is somehow initializing ColorMod to a non-zero value? Maybe there's a behavior to match or maybe just lucking out on uninitialized memory?

EDIT: Actually, setting it to effect.Parameters[ "ColorMod" ].SetValue( new Vector4(0f, 0f, 0f, 0f) ); also fixes it with FNA, so it looks like it's an initialization thing that XNA does and FNA (MojoShader?) doesn't. Scratch that, didn't change the startup project

flibitijibibo commented 9 years ago

I would think it's uninitialized memory, since the float4 value in the effect is never initialized, so in the effect binary itself the default values are just a bunch of 0s. That is, unless the weird blob of data before the effect in the XNA version does something behind our back. This blob here:

https://github.com/flibitijibibo/MojoShader/blob/master/mojoshader_effects.c#L809

A hex editor and the Debug version of the binary will be able to say if anything's interesting in there. In the meantime, I'll upload this new version of MojoShader...

elisee commented 9 years ago

OK... this is embarrassing :). Turns out in the XNA version, I had a locally modified version of the effect with a hard-coded return value for the pixel shader (I probably did that precisely because I forgot to set ColorMod and wasn't paying attention -_-). Sorry for confusing things! No observable behavior differences between FNA and XNA after fixing that.

So I guess I'll try the updated version of MojoShader with the original XNB when you've uploaded it and if it doesn't crash and looks like a triangle, we're all set!

flibitijibibo commented 9 years ago

fnalibs updated one more time. Try this with stock FNA and the malloc'd FNA and see if it holds up with the original... I doubt it will, but I've honestly lost track of all the bugs we had today.

elisee commented 9 years ago

Tested the simple triangle game with and without malloc'd FNA: no crashes, everything looks like it should. Now trying out the updated libs with CraftStudio to see if it holds up.

EDIT: Yup, works great with new MojoShader.dll and stock FNA. Looks like the two MojoShader fixes are enough! :tada:

flibitijibibo commented 9 years ago

Hawt. I'll keep those patches up just in case, but those were two really subtle MojoShader bugs... glad they got fixed!

elisee commented 9 years ago

Thought I'd mention this: I just tested Spherunner, one of the biggest CraftStudio games. It uses big voxel maps with chunk geometry generated in a thread. It used to crash with AccessViolationExceptions in FNA 3 months ago, now it works great. Awesome job with everything, FNA is the bomb :)