getnamo / OpenCL-Unreal

OpenCL Plugin for the Unreal Engine
MIT License

add TArray<uint8> <=> Various data types auto conversion #2

Open getnamo opened 6 years ago

getnamo commented 6 years ago

In order to be able to pass customized input/output data to your cl kernels, we need a way to specify these inputs and bind this data in blueprints.

Consider being able to pass in textures, float arrays, int arrays, vector arrays, etc. The most likely setup is to use TArray<uint8> as the intermediary format and cast to and from it.

getnamo commented 6 years ago

Partial work done in https://github.com/getnamo/opencl-ue4/tree/conversion

LiviuChirca commented 5 years ago

Would these conversion functions be used to pass data to the kernel and retrieve it back in blueprint? I'm a little confused about how one would read a converted array of vectors in the kernel.

Edit:

Actually, I believe I got the point. I pass the byte array to the kernel as a uchar pointer, and I have to do the conversion in OpenCL as well whenever I read or modify an item. Is that right?

getnamo commented 5 years ago

Exactly. Inside the kernel I believe you can typecast the byte array pointer to whatever data format you need, do the work, and write the results back through the pointer that was passed in (e.g. __global char* data). Then on the blueprint side you'd recast it to the format needed to consume it there.

I think you can even declare the kernel argument as e.g. __global float* or whatever other format you actually want to work on, pass in the bytes as is, and it should work if the data has been correctly converted or aligned.
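To illustrate the host-side half of that round trip, here is a minimal sketch in standard C++ rather than Unreal types. Vec3 is a hypothetical stand-in for a tightly packed three-float vector such as FVector, std::vector<std::uint8_t> plays the role of TArray<uint8>, and ToBytes/FromBytes are illustrative names, not plugin functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a tightly packed 3-float vector such as FVector.
struct Vec3 { float X, Y, Z; };

// Copy the raw bytes of the vectors into a byte buffer (the TArray<uint8> role).
std::vector<std::uint8_t> ToBytes(const std::vector<Vec3>& Vectors)
{
    std::vector<std::uint8_t> Bytes(Vectors.size() * sizeof(Vec3));
    if (!Bytes.empty())
    {
        std::memcpy(Bytes.data(), Vectors.data(), Bytes.size());
    }
    return Bytes;
}

// Reinterpret a byte buffer as vectors again, e.g. after the kernel wrote its results.
std::vector<Vec3> FromBytes(const std::vector<std::uint8_t>& Bytes)
{
    // The buffer must hold a whole number of vectors.
    assert(Bytes.size() % sizeof(Vec3) == 0);
    std::vector<Vec3> Vectors(Bytes.size() / sizeof(Vec3));
    if (!Bytes.empty())
    {
        std::memcpy(Vectors.data(), Bytes.data(), Bytes.size());
    }
    return Vectors;
}
```

Using memcpy instead of a pointer cast on the host avoids alignment and aliasing questions there; whether the kernel can read the same bytes as its own vector type is a separate layout question.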

Keep in mind this is still speculative as I haven't actually tested this feature. This is also still considered an experimental plugin, as the main blueprint function builds, enqueues, reads and releases the kernel all in a single call. That isn't efficient if you want to call it e.g. each frame, but it serves as a proof of concept for now. Contributions welcome :)

LiviuChirca commented 5 years ago

So doing float3 result = *((float3 *)(&buffer)); on a byte array buffer made from an FVector array might work? I'm kind of new to OpenCL and found your work while trying to see how others handle dynamic parameter types.

getnamo commented 5 years ago

You can probably just do (void *)&buffer on the host side:

Ret = clSetKernelArg(Kernel, 0, sizeof(cl_mem), (void *)&buffer);

and on the kernel side just define your function to take a __global float3* as the arg.

As an aside, I just realized the main plugin function RunKernelOnDevices currently doesn't use its Args param. It needs an update to use the Args string for MemObj at https://github.com/getnamo/opencl-ue4/blob/master/Source/OpenCL/Private/OpenCLPlugin.cpp#L168 (and to use that string's length rather than the fixed 128).

LiviuChirca commented 5 years ago

Yeah, there are many things that should be changed within your plugin, but it's a good starting point.

getnamo commented 5 years ago

Also see http://www.informit.com/articles/article.aspx?p=1732873&seqNum=3 for alignment
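One alignment detail worth keeping in mind here: in OpenCL C a float3 has the same size and alignment as a float4 (16 bytes), while a three-float vector on the host is normally 12 bytes. A tightly packed vector array read as __global float3* can therefore drift out of step after the first element. The sketch below, in standard C++, pads each vector to four floats before upload and strips the padding afterwards. Vec3, PadToFloat4 and UnpadFromFloat4 are hypothetical names, and it assumes a 12-byte, three-float vector as in UE4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tightly packed 3-float vector (12 bytes).
struct Vec3 { float X, Y, Z; };

// Expand each vector into a 4-float slot so the layout matches float3/float4 on the device.
std::vector<float> PadToFloat4(const std::vector<Vec3>& Vectors)
{
    std::vector<float> Padded(Vectors.size() * 4, 0.0f);
    for (std::size_t i = 0; i < Vectors.size(); ++i)
    {
        Padded[i * 4 + 0] = Vectors[i].X;
        Padded[i * 4 + 1] = Vectors[i].Y;
        Padded[i * 4 + 2] = Vectors[i].Z;
        // Padded[i * 4 + 3] stays 0 and is ignored by the kernel.
    }
    return Padded;
}

// Drop the fourth float of every slot when reading results back.
std::vector<Vec3> UnpadFromFloat4(const std::vector<float>& Padded)
{
    // The buffer must hold a whole number of 4-float slots.
    assert(Padded.size() % 4 == 0);
    std::vector<Vec3> Vectors(Padded.size() / 4);
    for (std::size_t i = 0; i < Vectors.size(); ++i)
    {
        Vectors[i] = Vec3{Padded[i * 4 + 0], Padded[i * 4 + 1], Padded[i * 4 + 2]};
    }
    return Vectors;
}
```

An alternative that avoids the extra copy is to keep the kernel argument as __global float* and read or write elements with vload3 and vstore3, which address packed three-float data.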

LiviuChirca commented 5 years ago

yes, thx buddy.

LiviuChirca commented 5 years ago

Hi,

Just want to confirm that this approach works. I tested it with an FVector parameter passed to the kernel: I multiplied it by 2, returned it, and got the expected results.

LiviuChirca commented 5 years ago

Example of this implemented: https://www.youtube.com/watch?v=rzwC5tykXHg

getnamo commented 5 years ago

Great, thanks for confirming; that demo is looking pretty snazzy!

The only issue I see with changing it from an actor component to a generic object is that you lose default options and easy event callbacks. For OpenCL to be useful at runtime, all the calls will have to be asynchronous on a background thread, with completion callbacks arriving back on the game thread. Then, without being able to get the results synchronously in the same line, I find it messier to link events on manually created UObjects than on actor components, which have + signs, but that's an opinion. That said, it might be cleaner to just add it as a latent function, e.g. mixing some concepts from https://www.casualdistractiongames.com/single-post/2016/05/15/Creating-Latent-Functions-for-Blueprints-in-C, but I've never done this approach myself.

LiviuChirca commented 5 years ago

I'm creating the context and command queue on the main CPU thread and holding that information in the OpenCL object, and then I execute the kernel from a separate background thread. This is the plan. I haven't tested it yet, but this is what I'm aiming for. So there won't be any lock. I still have to figure out some things with the clEnqueueNDRangeKernel function for scalability.

getnamo commented 5 years ago

Right but what I'm saying is that the user will need a way to get the results and there are a few approaches for this. In blueprint you can't use a regular function as the results won't be ready by the end of the function call. This means you'll need to use some kind of event, triggered when the results are ready.

My usual basic approach is just to use a component and call some function on it -> the C++ side does the work on a background thread, then uses taskgraph to run a function on the game thread -> trigger an event, e.g. OnResult. Pro: it's easy to use in blueprint. Con: events are annoying to bind in blueprint if you don't use an actor component, and you can't easily distinguish multiple calls without some IDs. (This is currently the structure of the opencl plugin https://github.com/getnamo/opencl-ue4/#how-to-use, but it doesn't use background threading atm.)

The second option is the latent function approach. Pro: cleanest semantics, it's similar to a normal function call. Con: it would restrict calling a kernel function to the top-level event graph only. Example: VaRest's Apply URL uses this approach (https://github.com/ufna/VaRest/blob/develop/Source/VaRestPlugin/Classes/VaRestRequestJSON.h#L204).

Another possible approach is to use reflection to let the user get a callback on a custom function they create in their blueprint (example of this used in the socket.io plugin: https://github.com/getnamo/socketio-client-ue4#emit-with-callback). Pro: most flexibility. Con: the user needs to name a function, and writing reflection to bind correctly to an unknown blueprint function is pretty complicated and messy to do (https://github.com/getnamo/socketio-client-ue4/blob/master/Source/SocketIOClient/Private/SocketIOClientComponent.cpp#L187).
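The first option (background work, then an event on the game thread, with IDs to tell calls apart) can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: GameThreadQueue is a hypothetical stand-in for the role taskgraph plays in UE4, RunKernelAsync and OnResultFn are made-up names, and the "kernel" is a placeholder loop that doubles each value.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the game thread's task queue.
class GameThreadQueue
{
public:
    // Safe to call from any thread.
    void Push(std::function<void()> Task)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        Tasks.push(std::move(Task));
    }

    // Called from the game thread, e.g. once per tick. Returns how many tasks ran.
    int Drain()
    {
        std::queue<std::function<void()>> Local;
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            std::swap(Local, Tasks);
        }
        int Count = 0;
        while (!Local.empty())
        {
            Local.front()();
            Local.pop();
            ++Count;
        }
        return Count;
    }

private:
    std::mutex Mutex;
    std::queue<std::function<void()>> Tasks;
};

using OnResultFn = std::function<void(int CallId, std::vector<float> Result)>;

// Starts the work on a background thread and returns right away.
// The CallId lets the receiver distinguish overlapping calls.
std::thread RunKernelAsync(int CallId, std::vector<float> Input,
                           GameThreadQueue& Queue, OnResultFn OnResult)
{
    return std::thread([CallId, Input = std::move(Input), &Queue, OnResult]() mutable
    {
        // Placeholder for "enqueue kernel, clFinish, read back": doubles each value.
        for (float& Value : Input)
        {
            Value *= 2.0f;
        }
        // Hand the result to the game thread instead of touching game state here.
        Queue.Push([CallId, Result = std::move(Input), OnResult]() mutable
        {
            OnResult(CallId, std::move(Result));
        });
    });
}
```

The caller keeps the returned std::thread and joins it at shutdown, while OnResult only fires when the game thread drains its queue, so the blocking part never runs on the game thread.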

LiviuChirca commented 5 years ago

clFinish does exactly that. https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clFinish.html

It blocks until completion, so even if you set up a callback event, it's still blocked by this. The function only returns once clFinish returns, with either success or failure. It is also a sync point.

Or am I wrong?

Also, if you're passing buffer data for read/write, that data needs to stay available until the kernel completes. For example:

Parameters[i].Object = clCreateBuffer(Device.Context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, Parameters[i].Data.Num(), Parameters[i].Data.GetData(), &RetBuffer);

cl_int RetRead = clEnqueueReadBuffer(Device.CommandQueue, Parameters[i].Object, CL_TRUE, 0, Parameters[i].Data.Num(), Parameters[i].Data.GetData(), 0, NULL, NULL);

If Parameters[i].Data is gone or has changed by then, it's going to be a problem.
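One way to guard against that, sketched here in standard C++ rather than against the plugin's actual types, is to give the background task shared ownership of the data so it cannot disappear while the kernel is in flight. ByteBuffer, FakeKernelRoundTrip and RunWithOwnedData are hypothetical names, and the "kernel" is a placeholder that doubles each byte in place.

```cpp
#include <cstdint>
#include <future>
#include <memory>
#include <utility>
#include <vector>

using ByteBuffer = std::vector<std::uint8_t>;

// Hypothetical stand-in for "create buffer, run kernel, read back into the same memory".
void FakeKernelRoundTrip(ByteBuffer& Data)
{
    for (std::uint8_t& Byte : Data)
    {
        Byte = static_cast<std::uint8_t>(Byte * 2);
    }
}

// Moves the input into shared storage that the background task keeps alive,
// so the caller's copy can go out of scope or change without affecting the run.
std::future<std::shared_ptr<ByteBuffer>> RunWithOwnedData(ByteBuffer Input)
{
    auto Owned = std::make_shared<ByteBuffer>(std::move(Input));
    return std::async(std::launch::async, [Owned]
    {
        FakeKernelRoundTrip(*Owned);
        return Owned;
    });
}
```

The trade-off is one extra copy or move of the input per call in exchange for not having to reason about who else might touch the buffer mid-run.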

I understand what you're saying, and it's very good information to work with. You might also be right that an actor component provides more possibilities than a simple UObject. I'll let you know what I come across along the way and provide code fixes if needed.

getnamo commented 5 years ago

Yep clFinish should be a blocking call for whichever thread calls it, but I'm still talking about the blueprint side of things which the user of opencl plugin will interact with.

You could in theory block the game thread and then use a regular function, but this is bad design. You should never block the game thread. If you're not blocking the game thread, then you will need some sort of event system to get the information back. The 3 options I mentioned above are the different ways I can think of for signaling updates in blueprint for asynchronous actions.

Either way, looking forward to your progress 👍

LiviuChirca commented 5 years ago

Yeah, no way I'd block the game thread, just the background thread the kernel is executed from. Then you keep a safe data buffer and make sure nothing is tampering with it.

I'm thinking of combining this with https://www.unrealengine.com/marketplace/multi-task, which is a free plugin I've made that allows multi-threading in blueprints.

getnamo commented 5 years ago

Yep, you can look at something like https://github.com/getnamo/socketio-client-ue4/blob/master/Source/CoreUtility/Private/CoreUtilityBPLibrary.cpp#L92 to see how I've done some threading work back and forth between a background thread and game thread for minimal blocking.

While that function needed a lot of back and forth for ue4 specific module reasons, I think kernel execution will only need e.g.

#include "Async/Async.h"

void RunKernel(const TArray<uint8>& InBytes)
{
    // Run this on a background thread; InBytes is captured by copy so it outlives the caller
    Async(EAsyncExecution::Thread, [InBytes]
    {
        // execute kernel

        // ...

        // clFinish

        // Callback on game thread
        AsyncTask(ENamedThreads::GameThread, [/*whatever data you need to copy to game thread*/]
        {
            // signal that you're done e.g. OnResult
        });
    });
}

LiviuChirca commented 5 years ago

yes, that would be it.