OSVR / OSVR-Unity

Package for authoring OSVR experiences with Unity.
Apache License 2.0
99 stars 38 forks source link

Made ClientKit a persistent singleton to avoid multiple instances whe… #52

Closed DuFF14 closed 9 years ago

DuFF14 commented 9 years ago

…n switching to a scene with a ClientKit.

DuFF14 commented 9 years ago

Yes. Pull request opened.

rpavlik commented 9 years ago

Did you test this in both the editor and compiled builds, including a multi-scene project?

DuFF14 commented 9 years ago

Actually, when I tested this last night I didn't have an HMD with me. Switching between scenes in the editor (server running without HMD) didn't cause any issues. I was looking for whether or not this ensures one instance of ClientKit, which it does.

Now with an HMD attached, this bug is still occuring where on the third scene change we get a null reference exception and latency:

NullReferenceException: Object reference not set to an instance of an object OSVR.Unity.InterfaceCallbacks.PoseCb (IntPtr userdata, OSVR.ClientKit.TimeValue& timestamp, OSVR.ClientKit.PoseReport& report) (at Assets/OSVRUnity/src/InterfaceCallbacks.cs:209) OSVR.ClientKit.ClientContext.update () OSVR.Unity.ClientKit.FixedUpdate () (at Assets/OSVRUnity/src/ClientKit.cs:118)

This error occurs with or without the changes in this pull request. This change does prevent multiple instances of ClientKit objects accumulating when each scene loads. One time it did crash the Unity editor, but most of the time it still runs with high latency. Here is that crash log: https://gist.github.com/DuFF14/3f3864f7b2c33b713529

DuFF14 commented 9 years ago

So I still think this can be merged, but it doesn't fix the bug when switching between scenes.

rpavlik commented 9 years ago

I'm unsure why it would matter whether there is an hmd attached, but I'm further concerned to make sure we don't introduce regressions. I'd rather merge in the workaround sent to Radial-G (that fixes the crash) while this is polished.

DuFF14 commented 9 years ago

When the Radial-G team ran into this bug initially, the workaround at the time was to make ClientKit not persist between scenes, and create a new ClientKit in a each scene. We thought that this error was related to https://github.com/OSVR/OSVR-Core/issues/130 and https://github.com/OSVR/Managed-OSVR/issues/9, but apparently there is more to it. I am fine with holding off until we have more information on this -- do whatever works to fix the crash, but I do think this is an improvement over the existing singleton implementation which does not limit the ClientKit to one instance.

JeroMiya commented 9 years ago

Is it common to call GetComponent<T> within an update step in Unity, or should we be lazy loading those into a backing field and holding onto them? I see three calls to GetComponent<DisplayInterface>() in VRHead.update. That might help a little with the GC pressure, but I suspect something else is going on.

If you are already using the version of Managed-OSVR with the SafeHandles implementation, you will need to call Dispose on your OSVR.ClientKit.Interface instances from any unity objects that manage the interface's lifetime. For example, I see that OSVR.Unity.PositionInterface just sets its iface instance to null in OnDestroy. It needs to call iface.Dispose first. Otherwise the unmanaged interface instance will stay around until the finalizer runs on the SafeClientInterfaceHandle, which for Unity's runtime might be never, or when the game closes, etc... SafeClientInterfaceHandler.Dispose results in osvrClientFreeInterface being called, so if you're not calling Dispose, the interface isn't getting freed and the callbacks will continue (resulting in a memory leak and increasing lag from the accumulated callbacks).

@rpavlik I think that comment still applies, but I think what he meant to say was to explicitly free up unmanaged resources (either with Dispose calls or other means) within the OnDestroy call, and not wait for finalizers to run.

rpavlik commented 9 years ago

I don't know how common GetComponent<T> is - though IIRC Unity 5 phased out a few pre-made cached variables for game objects in favor of just using get component (e.g. for camera).

We're still using the old integrated version of the Managed-OSVR: haven't gotten separate NuGet packages for the binaries made (for use with nuget native, squirrel updater, and other systems - and to decouple the core and binding builds/versions), haven't set up the CI to build said packages and the managed package, and haven't pulled in the nuget package into the Unity build system (though that part shouldn't be hard, swap a command-line copy for a command-line nuget invocation)

JeroMiya commented 9 years ago

Prior to the SafeHandle implementation, you would have needed to ensure that osvrClientFreeInterface was called, passing the IntPtr of the native ClientInterface. Just took a quick look at the code on my phone browser (sorry if I miss something) and couldn't see any references to osvrClientFreeInterface. As a temporary fix prior to the SafeHandles update, maybe an update to OSVR.Unity.InterfaceCallbacks.OnDestroy to free the native interface?

DuFF14 commented 9 years ago

I agree that GetComponents shouldn't be in Update loops. I added a commit to store a DisplayInterface rather than call GetComponent in updates. There may be more instances of that in the project that I can change in another branch.

I built the latest version of Managed-OSVR so I could use the SafeHandle implementation. Added a call to iface.Dispose() in InterfaceCallbacks Stop() which gets called by OnDestroy(). https://github.com/OSVR/OSVR-Unity/blob/testSingleton/OSVR-Unity/Assets/OSVRUnity/src/InterfaceCallbacks.cs#L74 And added a call to the Stop function from InterfaceGameObject: https://github.com/OSVR/OSVR-Unity/blob/testSingleton/OSVR-Unity/Assets/OSVRUnity/src/InterfaceGameObject.cs#L142

This still results in null ref exceptions (here: https://github.com/OSVR/OSVR-Unity/blob/testSingleton/OSVR-Unity/Assets/OSVRUnity/src/InterfaceCallbacks.cs#L216) after the second scene loads. Am I missing anything here? Pushed a testSingleton branch if anyone can try it. Keys 1 and 2 switch scenes (that code is arbitrarily in VRHead::Update).

Radial-G team suggested adding UnregisterCallback functions for each RegisterCallback. Is the SafeHandle implementation meant to take care of that?

JeroMiya commented 9 years ago

I'm able to reproduce the null reference exception by setting the callbacks to null and waiting for a garbage collection. Calling osvrClientFreeInterface doesn't seem to help. I thought it was a bug in the native code:

https://github.com/OSVR/OSVR-Core/issues/130

Here is an updated version of the code that reproduces the null reference exception:

    class AnalogCallback
    {
        static void analogTrigger_StateChanged(IntPtr /*void*/ userdata, ref TimeValue timestamp, ref AnalogReport report)
        {
            Console.WriteLine("Got report: channel is {0}", report.state);
        }

        static void Main(string[] args)
        {
            using (OSVR.ClientKit.ClientContext context = new OSVR.ClientKit.ClientContext("com.osvr.exampleclients.managed.AnalogCallback"))
            {
                var iface = context.getInterface("/controller/left/trigger");
                var callback = new OSVR.ClientKit.AnalogCallback(analogTrigger_StateChanged);
                iface.registerCallback(callback, IntPtr.Zero);
                iface.Dispose();
                callback = null;
                GC.Collect();
                while(true)
                {
                    context.update();
                }
            }
        }
    }

The null reference exception happens while in the native code, in the osvrClientUpdate method.

Now, note how the following does NOT crash:

    class AnalogCallback
    {
        static void analogTrigger_StateChanged(IntPtr /*void*/ userdata, ref TimeValue timestamp, ref AnalogReport report)
        {
            Console.WriteLine("Got report: channel is {0}", report.state);
        }

        static void Main(string[] args)
        {
            using (OSVR.ClientKit.ClientContext context = new OSVR.ClientKit.ClientContext("com.osvr.exampleclients.managed.AnalogCallback"))
            {
                var iface = context.getInterface("/controller/left/trigger");
                var callback = new OSVR.ClientKit.AnalogCallback(analogTrigger_StateChanged);
                iface.registerCallback(callback, IntPtr.Zero);
                iface.Dispose();
                GC.Collect();
                while(true)
                {
                    context.update();
                }
            }
        }
    }

The difference is that I am no longer setting callback to null. However, also notice how the callback is still getting called (when you press the left trigger on a hydra a bit). It shouldn't be, because the iface.Dispose call is calling osvrClientFreeInterface. I verified this with a breakpoint, but you can also see it in the trace output:

[OSVR] ClientKit assembly directory: C:\Users\Jeremy\JeroMiya\Managed-OSVR\build\bin\net20\Debug
[OSVR] ClientKit DLL directory: C:\Users\Jeremy\JeroMiya\Managed-OSVR\build\bin\net20\Debug\x64
[OSVR] In Interface.Dispose()
[OSVR] In Interface.Dispose(True)
[OSVR] Interface shutdown

My theory was that osvrClientFreeInterface was somehow not actually unregistering the callbacks for the interface or otherwise not freeing it. Then, the callbacks end up getting called, but the managed callback has already been garbage collected. That seems consistent with the behavior I'm seeing. However, @rpavlik also noted that he is able to get the osvrClientFreeInterface code to work properly in unmanaged code. Is something not right in the C wrappers, specifically osvrClientFreeInterface?

rpavlik commented 9 years ago

So I became a bit uncomfortable with the assertions I had made without automated tests. I now have tests in https://github.com/OSVR/OSVR-Core/pull/159 for context creation/destruction - fine with multiple context, whether they fully overlap, partially overlap, or don't overlap at all in lifetime. Will add similar tests for interfaces and callbacks.

rpavlik commented 9 years ago

OK, please verify this with the latest master branch, now that we're using external Managed-OSVR with safe handles, etc. You may want to rebase this branch or merge master into it.

Is there some way we can set up an automated test of the two-scene crash?

DuFF14 commented 9 years ago

Merged with master. Not sure what you mean by automated, but here is the branch I've been using to test: https://github.com/OSVR/OSVR-Unity/tree/testSingleton

Keys 1, 2, 3 switch between scenes. I was just able to get this working for a few scene changes by stepping through with the debugger. Still tracking this down.

rpavlik commented 9 years ago

I mean something that we can run from the command line during the build process to verify that the bug isn't there. (presumably triggered from an editor script) A quick google with little further investigation pulled up some old posts, not sure if they're the latest recommended practices. (Helpful to search for 'unity3d whatever' instead of just 'unity whatever' since Unity is an awfully popular name for stuff, including software)

DuFF14 commented 9 years ago

Will look into automated tests. One thing I've discovered is that switching between scenes is fine if the only tracker in each scene is VRDisplayTracked. Switching from scenes with controllers such as minigame.unity and OSVRDemo.unity are causing errors. In the test branch, I have 4 demo scenes now. OSVRDemo3 is identical to OSVRDemo2 with a different app id. OSVRDemo4 is identical to OSVRDemo, without LeftHand and RightHand. Switching between scenes OSVRDemo2,3,4 works without errors. Switching from OSVRDemo to any of the other scenes throws null reference exceptions in the new scene.

edit: Note that I don't have a Hydra attached, just an HMD.

rpavlik commented 9 years ago

Note that in this case, I'm looking for a regression test (make a minimal test case that reproduces a bug in an automated way, script it to make sure we can detect if it returns)

DuFF14 commented 9 years ago

Ok, merged in the most recent improvements from @JeroMiya. I can no longer reproduce the bug that was occurring when switching scenes :+1: This is again ready for a review/merge. Have tested that only one ClientKit persists between scenes.

rpavlik commented 9 years ago

Do you have an automated test for the scene switching? Should waiting for such a test block a merge? (Or rephrased, does this fix a bug so bad that it's worth doing the merge even though there aren't automated tests for it yet?)

DuFF14 commented 9 years ago

Will work on an automated test today, but I don't think it should block a merge because this pull request wasn't affecting the scene switching bug as I originally thought, which was fixed in https://github.com/OSVR/OSVR-Unity/pull/57 and https://github.com/OSVR/Managed-OSVR/pull/10 by properly disposing interfaces.

rpavlik commented 9 years ago

OK, have split out the test into #62 - will merge this.

rpavlik commented 9 years ago

(Does this need a readme update or a changes entry?)

DuFF14 commented 9 years ago

Perhaps a note (probably in Changes) that a bug was fixed where multiple ClientKits would accumulate when switching between scenes or reloading a scene with a ClientKit object. Now there will only be one ClientKit instance that persists between scenes, as originally intended.

JeroMiya commented 9 years ago

Do your automated tests excercise the callback apis? Note that OrientationInterface, PositionInterface, and PoseInterface now use the state apis. The button and analog sample scripts use the callback. You may also need to trigger a GC.Collect explicitly after a scene change to reproduce the callback issues I was experiencing a while back.

rpavlik commented 9 years ago

We don't yet have automated tests - that's a separate issue at the moment. Would be ideal to have some, of course.