Unity-Technologies / arfoundation-samples

Example content for Unity projects based on AR Foundation
Other
2.98k stars 1.11k forks source link

AR Foundation iOS massive slowdown when using AsyncOperation #1113

Open VanIseghemThomas opened 8 months ago

VanIseghemThomas commented 8 months ago

Unity bug report case number IN-58003

Describe the bug When disposing a AsyncOperation on Unity 2022.3.11f, this causes a massive slowdown.

image

image

To Reproduce Steps to reproduce the behavior:

Call the following as a coroutine

private IEnumerator ProcessCameraImage(XRCpuImage image, Action callback = null){
          // Create the async conversion request
          request = image.ConvertAsync(new XRCpuImage.ConversionParams{
              inputRect = new RectInt(0, 0, image.width, image.height),
              outputDimensions = new Vector2Int(image.width, image.height),
              // Color image format
              outputFormat = TextureFormat.RGB24,
              // Flip across the Y axis
              transformation = XRCpuImage.Transformation.MirrorY
          });

          while (!request.status.IsDone())
              yield return null;

          // Check status to see if it completed successfully.
          if (request.status != XRCpuImage.AsyncConversionStatus.Ready){
              // Something went wrong
              Debug.LogErrorFormat("Request failed with status {0}", request.status);
              // Dispose even if there is an error.
              request.Dispose();
              yield break;
          }

          rawData = request.GetData<byte>();

          // Create a texture if necessary
          if (cameraTexture == null){
              cameraTexture = new Texture2D(
                  request.conversionParams.outputDimensions.x,
                  request.conversionParams.outputDimensions.y,
                  request.conversionParams.outputFormat,
                  false);

          }

            // Copy the image data into the texture
            cameraTexture.LoadRawTextureData(rawData);
            cameraTexture.Apply();

            // Even though the same variable is used to store the disposables, they will internally still leak memory
            // So we need to dispose them manually
            request.Dispose();
            rawData.Dispose();
            callback?.Invoke();
        }

This is stripped down code of what I'm actually doing, so might have some issues. The XRCpuImage is grabbed somewhere else and passed to this function.

Expected behavior Dispose should not cause such a drastic slowdown. This was not a problem when building in 2022.3.8f1 on Windows 11

Actual behavior A clear and concise description of what actually happened.

Smartphone (please complete the following information):

EDIT: The problems does not seem to be the Dispose call but the Convert operation. Not sure why the profiler is reporting something else here. See the thread for more information

VanIseghemThomas commented 8 months ago

Update: confirmed this is not editor version related. Same behaviour is present when downgrading the project to 2022.3.8f1. So it seems there is a difference in how the Windows editor and MacOS (M2) editor to build the project.

VanIseghemThomas commented 8 months ago

Update 2: Also confirmed that the MacOS XCode project is different from the Windows one since it does not contain any warnings where this is the case for Windows. So there is in fact a difference between the two.

andyb-unity commented 8 months ago

Hi @VanIseghemThomas, we are aware of other issues when building AR Foundation apps for iOS from a Windows Editor. I recommend that you build for iOS from macOS machines for now. Thanks for filing this bug though-- we'll see how QA prioritizes this issue as it makes its way through the process.

VanIseghemThomas commented 8 months ago

Hi @VanIseghemThomas, we are aware of other issues when building AR Foundation apps for iOS from a Windows Editor. I recommend that you build for iOS from macOS machines for now. Thanks for filing this bug though-- we'll see how QA prioritizes this issue as it makes its way through the process.

Hi @andyb-unity, thanks for reaching out. The problem is actually the opposite. We have no problems with the project that is built on Windows funny enough. It's the project built on MacOS that causes the above mentioned issue.

andyb-unity commented 8 months ago

Oh I see. This is the first I've heard of anything like this. Thanks for the detailed report.

VanIseghemThomas commented 8 months ago

No problem, thanks for the quick response! Maybe it's worth mentioning I'm also using the Apple Silicon version of the Unity editor. Haven't tried running under Rosetta. Will see if this makes a difference when I get into the office on monday.

VanIseghemThomas commented 8 months ago

Update: Same issue building under Rosetta.

VanIseghemThomas commented 8 months ago

Found the native call under com.unity.arkit@5.1.0/Runtime/Camera/ARKitCpuImageApi.cs on line 220.

[DllImport("__Internal", EntryPoint = "UnityARKit_CpuImage_DisposeAsyncRequest")]
public static extern void DisposeAsyncRequest(int requestHandle);

Unfortunately I cannot figure out where the external call is made to. I figure it's to libUnityARKit where the source is not available. So kind of stuck here for now.

For now I'll try and see if I can offload this Dispose call to another thread or something. Not sure what the best way of working around this would be yet.

VanIseghemThomas commented 7 months ago

Bump, any updates on this?

VanIseghemThomas commented 6 months ago

Update: revisited this and tried to fix this in our project, seems the profiler isn't telling the real story here.

To debug this I went and did the old Debug.Log with stopwatches on all my operations. Narrowed it down to the ConvertAsync call itself using the one that accepts a callback as parameter to get rid of the ugly while loop hack like documented.

When deploying the application to the iPad and doing nothing but running ConvertAsync on the frames coming in, the processing time as reported by System.Diagnostics.Stopwatch starts at about 70ms. Then when running the application for a while, this only increases to around 170ms after 10 minutes. This is very odd behaviour. Also yes, there are no memory leaks.

I'll be trying the regular Convert method now as a workaround and see how it affects performance. If this is still broken, my solution will be to ditch XRCpuImage as a whole and just process the render texture used to display the camera feed.

VanIseghemThomas commented 6 months ago

Solved by using the Convert method instead of ConvertAsync. Takes about 15ms which is makes me conclude that ConvertAsync is in fact broken.

andyb-unity commented 6 months ago

I'm not initially concerned about the async overload taking longer-- that's the purpose of async, to schedule the work onto a background thread instead of blocking the main thread. I do find it concerning that it takes longer over time.

If you file a new bug with this updated info, I'll see if we can bypass initial QA review and get it assigned to us. Apologies that QA still hasn't responded to your October bug report.

tdmowrer commented 6 months ago

Although I no longer work at Unity, I originally implemented the conversion functions and may be able to help explain.

Dispose only frees memory and other resources, and it should be virtually instantaneous. If I'm reading this thread correctly, you initially thought that it was taking significant time, but you were either timing the wrong thing, or there is some other issue with the profiler. Is that right?

When deploying the application to the iPad and doing nothing but running ConvertAsync on the frames coming in, the processing time as reported by System.Diagnostics.Stopwatch starts at about 70ms.

As Andy says, ConvertAsync uses a single background thread, while Convert uses all available hardware threads. The timings you reported (Convert=15ms, ConvertAsync=70ms) is a ~4.5x difference, which is about what I would expect on an iPad. At the time I implemented it (cira 2018), the image sizes were small enough that ConvertAsync would usually take less than 1 frame to complete, so you could start the async conversion at the beginning of a frame and it would be done by the end.

Today, the camera images that both ARCore and ARKit provide are much larger, and the single-threaded conversion can take much longer than a frame. If you need to process every image, that's not good enough. If the synchronous Convert works for you, then I would just stick with that. If that takes up too much of your frame budget, I think Convert might actually be thread-safe, so you could try spinning up a C# thread and call it from there, e.g., Task.Run. Unity would need to confirm that though.

If you really want to get your hands dirty, you can write your own conversion routine using Unity's Job System and [Burst] compile it. That would give you multithreaded async plus the benefits of Burst, which could be a win. Here is the documentation which describes how to interpret the raw image data.

Then when running the application for a while, this only increases to around 170ms after 10 minutes. This is very odd behaviour.

This is likely due to thermal throttling. Running ARKit for 10 minutes will cause the device to heat up and iOS will start throttling the CPU cores. So I think this is the expected behavior.

Hope that helps!

VanIseghemThomas commented 5 months ago

Dispose only frees memory and other resources, and it should be virtually instantaneous. If I'm reading this thread correctly, you initially thought that it was taking significant time, but you were either timing the wrong thing, or there is some other issue with the profiler. Is that right?

Initially I was in fact under the impression that the Dispose call for the AsyncConversion was the problem (see my initial screenshots). Not sure why the profiler reported that. I would also assume that it shouldn't take that long, so I'm not under the impression that that is in fact the underlying problem here.

As Andy says, ConvertAsync uses a single background thread, while Convert uses all available hardware threads. The timings you reported (Convert=15ms, ConvertAsync=70ms) is a ~4.5x difference, which is about what I would expect on an iPad. At the time I implemented it (cira 2018), the image sizes were small enough that ConvertAsync would usually take less than 1 frame to complete, so you could start the async conversion at the beginning of a frame and it would be done by the end.

Ok, given what @andyb-unity said above this makes sense. But I'm still not sure why the exact same version built on a Windows machine has much better performance. So I'm still under the impression that something is not running like it's supposed to. Any idea why that could be the case?

Today, the camera images that both ARCore and ARKit provide are much larger, and the single-threaded conversion can take much longer than a frame. If you need to process every image, that's not good enough. If the synchronous Convert works for you, then I would just stick with that. If that takes up too much of your frame budget, I think Convert might actually be thread-safe, so you could try spinning up a C# thread and call it from there, e.g., Task.Run. Unity would need to confirm that though.

This was actually the next thing I am in the process of trying 😄. Initially I only did this for the actual processing for the converted image assuming this was the processing intensive part. To my surprise it was not.

If you really want to get your hands dirty, you can write your own conversion routine using Unity's Job System and [Burst] compile it. That would give you multithreaded async plus the benefits of Burst, which could be a win. Here is the documentation which describes how to interpret the raw image data.

Looked into this, but the documentation does not recommend having long running jobs, that's why I went for creating a thread. Do not have experience with jobs, so it just might be possible I've got to do some homework on those.

This is likely due to thermal throttling. Running ARKit for 10 minutes will cause the device to heat up and iOS will start throttling the CPU cores. So I think this is the expected behavior.

This makes a lot of sense. However I was surprised to see such a drastic slowdown.

Hope that helps!

This is very helpful and gives a lot more insight, thanks!

VanIseghemThomas commented 5 months ago

I'm not initially concerned about the async overload taking longer-- that's the purpose of async, to schedule the work onto a background thread instead of blocking the main thread. I do find it concerning that it takes longer over time.

Not sure if I missed this, but it might be a good idea to include this in the documentation.

If you file a new bug with this updated info, I'll see if we can bypass initial QA review and get it assigned to us. Apologies that QA still hasn't responded to your October bug report.

What would be the best way to report this. It essentially comes down to the fact that the AsyncConvert has poor performance when built om MacOS vs Windows machines. The difference between the Convert and AsyncConvert makes sense now, but there is still a very noticeable difference between the platforms. Best way to investigate would just be to build a simple application that fetches and converts the images and benchmark the 2 builds.

andyb-unity commented 5 months ago

What would be the best way to report this

https://unity.com/releases/editor/qa/bug-reporting. Send us the bug number you get from this process. Should be of the form IN-XXXXX.