Closed aryansaurav closed 2 years ago
Hi @aryansaurav,
`Device` set to GPU will try to run on the hardware GPU in the best way possible given the device's capabilities. On Barracuda 2.4.0, if the device does not support compute shaders, it will fall back to the PixelShader backend (a `Type`, as opposed to a `Device`). Could you specify the graphics API you are using (Vulkan/OpenGL) as well as the version of Barracuda, please?
Thanks! Florent
Thanks @FlorentGuinier for your reply. I am using Barracuda 2.4.0 and have tried both Vulkan and GLES (3.1/3.2). Neither works on Android if I set Device to GPU: the app crashes after 4-5 frames (seen in the Profiler). However, if I set Type to ComputePrecompiled using the other overload, it works, but quite slowly!
Further, on Windows everything works (NVIDIA RTX graphics card), but it is very slow if I set Type to any of the given options. If I just use CreateWorker() without any parameters, it's much faster. I still do not understand what Device and Type do as far as GPU usage is concerned. Is setting Type to ComputePrecompiled really meant to use the GPU? It runs too slowly!
Hi @aryansaurav
`Device` is a simple, user-friendly way to remap to a `Type`; here is the documentation:
https://docs.unity3d.com/Packages/com.unity.barracuda@2.4/manual/Worker.html
There should be no difference past this remapping! For example, on 2.4.0, on both Android and Windows, `GPU` and `Auto` should be equivalent to `ComputePrecompiled`:
```csharp
internal static WorkerFactory.Type GetBestTypeForDevice(WorkerFactory.Device device)
{
    switch (device)
    {
        case WorkerFactory.Device.Auto:
        case WorkerFactory.Device.GPU:
            return WorkerFactory.Type.ComputePrecompiled;
        default:
            return WorkerFactory.Type.CSharpBurst;
    }
}
```
However, we have the newer PixelShader backend to run on the GPU when compute shaders are not available; see https://github.com/Unity-Technologies/barracuda-release/blob/b1eac6c34b12b1ef5506fb7121a29eda2997efd1/Barracuda/Runtime/Core/Backends/BarracudaBackendsFactory.cs#L31
```csharp
internal static WorkerFactory.Type ValidateType(WorkerFactory.Type type)
{
    type = ResolveAutoType(type);
    Assert.AreNotEqual(type, WorkerFactory.Type.Auto);
    if (WorkerFactory.IsType(type, WorkerFactory.Device.GPU) && !ComputeShaderSingleton.Instance.supported)
    {
        type = WorkerFactory.Type.PixelShader;
    }
    return type;
}
```
I hope this clarifies both `Device` and `Type`. However, some of the behavior you have seen does indeed seem unexpected; maybe using only `Type` to better target the problem would help find the source?
Florent
Thanks for the details @FlorentGuinier. The code makes sense, but in practice there were some strange behaviors. If you would like to reproduce the issues I mentioned, you can take this package from GitHub (by another user): https://github.com/keijiro/NNCam
More specifically, the user initializes the worker in this file (line 16): https://github.com/keijiro/NNCam/blob/main/Assets/NNCam/SegmentationFilter.cs
Then, on Windows, I changed this line to initialize the worker either without any arguments (as it is), with a Type as input, or with a Device as input. You will have to modify it somewhat like this:

```csharp
// _worker = ModelLoader.Load(_resources.model).CreateWorker();
_model = ModelLoader.Load(_resources.model);
_worker = WorkerFactory.CreateWorker(WorkerFactory.Type.CSharpBurst, _model, true);
```
Issues on Windows: if no argument is provided to CreateWorker, it runs much faster than when Type is set to ComputePrecompiled or Device is set to GPU. But according to the explanations, these should be the same.
Issues on Android: if no argument is provided to CreateWorker, it simply does not work. If Type is set to Compute, it works, but slowly. If Device is set to GPU, it crashes. But according to your explanations, these three should be essentially the same.
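For reference, here are the three configurations described above, side by side (a minimal sketch against the Barracuda 2.4.0 API; `_model` is assumed to be a `Model` already loaded via `ModelLoader.Load`, and the exact overload signatures may vary slightly between versions):

```csharp
using Unity.Barracuda;

// Per the remapping code quoted earlier, all three of these should
// resolve to the same ComputePrecompiled backend on a compute-capable GPU.

// 1. No arguments: Device.Auto is remapped to the best available Type.
IWorker workerA = _model.CreateWorker();

// 2. Explicit device: remapped via GetBestTypeForDevice().
IWorker workerB = WorkerFactory.CreateWorker(_model, WorkerFactory.Device.GPU);

// 3. Explicit type: used directly (after ValidateType()).
IWorker workerC = WorkerFactory.CreateWorker(WorkerFactory.Type.ComputePrecompiled, _model);
```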
Please let me know if you need any additional information in reproducing this bug, or about the hardware specifications.
Hi @aryansaurav,
Thanks for reporting the issue,
I was able to reproduce the slowdown. The issue is that verbose mode slows down execution. You should set it to `false` instead of `true`:

```csharp
_worker = WorkerFactory.CreateWorker(WorkerFactory.Type.ComputePrecompiled, _model, false);
```
I wasn't able to reproduce the Android crash on my devices. Which Android device model do you use? Could you also try the `ComputePrecompiled` backend instead of `Compute`?
Thanks @Aurimasp, I actually just figured that out too: verbose mode leads to the slowdown. Thanks a lot anyway to you both.
But now there remains only the issue with the crash on Android. It does not happen when verbose is set to true (though it's very slow); if verbose is set to false, it crashes with Type set to ComputePrecompiled. Device info: Android 11.0 on Snapdragon 690 (OnePlus Nord N10 5G). I wonder if it has to do with the synchronous execution. I am calling it from a coroutine, but I understood today that this is not multi-threaded.
I am pasting my code with the coroutine below. I would appreciate any help.
```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.Barracuda;

namespace NNCam
{
    public class my_filtered_webcam : MonoBehaviour
    {
        #region Variable declarations
        WebCamTexture _webcam;
        bool _bcameranotfound = false;
        public ResourceSet _resources;
        public RenderTexture _webcamrendertexture;
        RenderTexture _processed_texture;
        bool _bimageupdated = false;
        IWorker _worker;
        Model _model;
        const int Width = 640 + 1;
        const int Height = 352 + 1;
        #endregion

        // Start is called before the first frame update
        void Start()
        {
            WebCamDevice[] devices = WebCamTexture.devices;
            if (_webcam != null && _webcam.isPlaying)
            {
                _webcam.Stop();
            }
            if (devices != null && devices.Length != 0)
            {
                if (devices.Length > 1)
                    _webcam = new WebCamTexture(devices[1].name);
                else
                    _webcam = new WebCamTexture(devices[0].name);
                _webcam.Play();
                _webcamrendertexture = new RenderTexture(_webcam.width, _webcam.height, 1);
                Graphics.Blit(_webcam, _webcamrendertexture);
                Debug.Log("Camera height, width: " + _webcam.height + ", " + _webcam.width);
            }
            else
            {
                Debug.Log("No camera found!");
                _bcameranotfound = true;
            }
            if (_resources != null)
            {
                _model = ModelLoader.Load(_resources.model);
                _worker = WorkerFactory.CreateWorker(WorkerFactory.Type.ComputePrecompiled, _model, false);
                if (_worker != null)
                {
                    Debug.Log("NN model loaded successfully");
                    StartCoroutine(ProcessImage());
                }
                else
                    Debug.Log("Issue loading NN model in my_filtered_webcam script!");
            }
        }

        // Update is called once per frame
        void Update()
        {
            if (!_webcam.didUpdateThisFrame)
            {
                //Debug.Log("Camera frame not updated!");
                return;
            }
            else
            {
                Graphics.Blit(_webcam, _webcamrendertexture);
            }
        }

        private void OnDestroy()
        {
            Destroy(_webcam);
            Destroy(_webcamrendertexture);
            Destroy(_processed_texture);
            _worker?.Dispose();
        }

        IEnumerator ProcessImage()
        {
            // Preprocessing for BodyPix
            while (true)
            {
                using (var _imagetensor = new Tensor(_webcamrendertexture, 3))
                {
                    _worker.Execute(_imagetensor);
                }
                var output = _worker.PeekOutput("float_segments");
                yield return new WaitForCompletion(output);
                _processed_texture = output.ToRenderTexture(0, 0, 1.0f / 32, 0.5f);
                _bimageupdated = true;
            }
        }

        void OnRenderImage(RenderTexture source, RenderTexture destination)
        {
            if (_processed_texture != null) // guard against blitting a null texture; was `if (true)`
            {
                Graphics.Blit(_processed_texture, destination);
                _bimageupdated = false;
            }
        }
    }
} // Namespace NNCam
```
I was not able to reproduce the crash on older Android devices with the provided script. We don't have a Snapdragon 690 device at the moment, but we will order one. Would it be possible to check the crash logs for the time being? You can retrieve logs with the 'adb logcat' tool.
Actually, I just fixed it: it doesn't crash anymore using asynchronous execution. Thanks a lot @Aurimasp and @FlorentGuinier for your help!
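For anyone hitting the same crash: the asynchronous approach can be sketched with Barracuda's scheduled-execution API (a minimal sketch based on the manual's scheduled-execution pattern; `_worker`, `_webcamrendertexture`, and `_processed_texture` follow the script posted earlier, and the yield interval of 5 layers is an arbitrary choice):

```csharp
IEnumerator ProcessImageScheduled()
{
    while (true)
    {
        using (var input = new Tensor(_webcamrendertexture, 3))
        {
            // Schedule the network layer by layer instead of running
            // one blocking Execute() call.
            IEnumerator schedule = _worker.StartManualSchedule(input);
            int step = 0;
            while (schedule.MoveNext())
            {
                // Hand control back to the engine every few layers so a
                // long network does not stall a single frame.
                if (++step % 5 == 0)
                    yield return null;
            }
        }
        var output = _worker.PeekOutput("float_segments");
        yield return new WaitForCompletion(output);
        _processed_texture = output.ToRenderTexture(0, 0, 1.0f / 32, 0.5f);
    }
}
```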
Hello,
CreateWorker() is overloaded with several definitions. Some of them take a Device (GPU/CPU) as input, while others take a Type (ComputePrecompiled/CSharpBurst, etc.), some of which are supposed to run on the GPU.
I tried both overloads with the aim of using the GPU for computations on camera images. The one with Device set to GPU did not work on Android (it crashes every time), although it works fine on Windows. The overload with Type set to ComputePrecompiled worked on Android, following the syntax given in the Barracuda tutorials.
My question is: what's the difference between the two? If the overload with an explicit Type is the right way to use Barracuda on the GPU, is there any specific use for the CreateWorker() overload with Device set to GPU?
If the particular application is relevant: I am trying to feed live camera video into neural networks using Barracuda.
Thanks!