andywiecko / BurstTriangulator

2d Delaunay triangulation with mesh refinement for Unity with Burst compiler
https://andywiecko.github.io/BurstTriangulator/
MIT License

Setting up and triangulating from a running job. #135

Closed LennartJohansen closed 1 month ago

LennartJohansen commented 3 months ago

Hi.

First, I would like to thank you for the work you have done on the Burst triangulator.
Good to have a way to benefit from the burst compiler and jobs when triangulating polygons in Unity.

One thing that would be really useful is a workflow where you can set up and run a triangulation of a polygon inside a job.

As an example, I have a process where I generate building meshes from polygon outlines. To keep the overhead of scheduling jobs down, each building is processed on a thread in an IJobParallelFor job. A single job could have 1000s of buildings.

Setting up triangulator objects on the main thread and combining dependencies to use multiple CPU cores works, but in many cases it requires completing earlier jobs first, and scheduling 1000s of triangulation jobs takes time.

I modified an earlier version of your project to support this. I see you have done quite a bit of refactoring since then but the same principle should work.

If you change Triangulator, TriangulationSettings, RefinementThresholds and InputData to custom NativeCollections, it will be possible to create, run and dispose of the triangulator from any job using Allocator.Temp memory.

Normal main-thread use would only need some small changes to manage the allocation and disposal of settings and input. You could then schedule or run from the main thread, and run from any job.
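To illustrate the kind of job this request is about, here is a minimal sketch of an IJobParallelFor that handles one building per index and allocates its scratch data with Allocator.Temp inside Execute. The job name, the `Ranges` layout and the `PointCounts` output are assumptions made for illustration only; the triangulation call itself is left as a comment because no job-friendly triangulator API exists in the version discussed here.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical job: one building outline per parallel index.
[BurstCompile]
public struct ProcessBuildingsJob : IJobParallelFor
{
    // All outline points of all buildings, stored back to back.
    [ReadOnly] public NativeArray<float2> Points;

    // Assumed layout: x = first point index, y = point count of building `index`.
    [ReadOnly] public NativeArray<int2> Ranges;

    // One result slot per building, so each index writes only to its own element.
    public NativeArray<int> PointCounts;

    public void Execute(int index)
    {
        int2 range = Ranges[index];

        // Allocator.Temp is allowed inside a job; the memory is local to this Execute call.
        var outline = new NativeList<float2>(range.y, Allocator.Temp);
        for (int i = 0; i < range.y; i++)
        {
            outline.Add(Points[range.x + i]);
        }

        // A triangulator that can be created and run inside a job would be used here.
        // This sketch only records how many points were gathered.
        PointCounts[index] = outline.Length;

        outline.Dispose();
    }
}
```

Scheduling this once, for example with `job.Schedule(Ranges.Length, 16)`, replaces 1000s of individually scheduled triangulation jobs with a single parallel job.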

andywiecko commented 3 months ago

Hi @LennartJohansen, many thanks for the contribution! I plan to add a NativeTriangulator struct with a low-level API, which could be allocated and run inside a job. You can follow this issue on the project board: https://github.com/users/andywiecko/projects/1?pane=issue&itemId=61431743

Next week I'm going to push development further. First, I plan to fix the issue with float/double precision by introducing Triangulator<T>; then I will focus on NativeTriangulator.

I want to preserve the current API of Triangulator, because many users do not need low-level usage.

Contributions are always welcome! 🙂

Best, Andrzej

LennartJohansen commented 3 months ago

@andywiecko

A NativeTriangulator would be great.

Did you profile performance of float vs double in the triangulation? Would be interesting to know. I usually expect a bigger gain than it turns out to be. Unless the compiler manages to vectorize the code most of the float and double math takes the same time on the CPU.

I have a couple of other thoughts.

A NativeSlice input would be useful: when processing a lot of data, you might not have separate NativeArrays for all the point data.

Another thing I have been doing for performance when processing a lot of data is pooling the temporary lists needed in the jobs.
Allocator.Temp memory is the fastest to allocate, but allocating memory and copying data when lists expand still takes up quite a bit of the processing time.

You create a NativeCollection that holds an array of size JobsUtility.MaxJobThreadCount and declares the field [NativeSetThreadIndex] internal readonly int m_ThreadIndex;

The NativeSetThreadIndex attribute makes Unity set the thread index on every copy of the native pool struct held by an instance of the job struct.

Within an IJobParallelFor job, each thread can then request a pooled object with pre-allocated lists/capacity. The pool uses m_ThreadIndex to get the right pool instance.

The performance benefits should be significant when processing a lot of data in parallel. The relative gain is probably larger on less complex polygons.
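The pooling idea above could look roughly like the following sketch. It is a simplified variant, not the custom NativeCollection described: the [NativeSetThreadIndex] field sits directly on the job struct, and the pool is a plain NativeArray of UnsafeList<float2>. The names `PooledScratchJob`, `ScratchPool` and `ScratchPerThread` are hypothetical, and the initial capacity of 256 is an arbitrary assumption.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Jobs.LowLevel.Unsafe;
using Unity.Mathematics;

// Hypothetical job that reuses one scratch list per worker thread.
[BurstCompile]
public struct PooledScratchJob : IJobParallelFor
{
    // One pre-allocated list per possible worker thread. The restriction is
    // disabled because the element accessed depends on the thread, not on `index`.
    [NativeDisableParallelForRestriction]
    public NativeArray<UnsafeList<float2>> ScratchPerThread;

    // Unity writes the index of the executing worker thread into this field.
    [NativeSetThreadIndex] int m_ThreadIndex;

    public void Execute(int index)
    {
        // Copy the list header out of the pool; the backing memory is shared.
        UnsafeList<float2> scratch = ScratchPerThread[m_ThreadIndex];
        scratch.Clear(); // resets the length but keeps the allocated capacity

        // ... fill and use `scratch` for the polygon at `index` ...

        // Store the header back in case the list grew and reallocated.
        ScratchPerThread[m_ThreadIndex] = scratch;
    }
}

public static class ScratchPool
{
    // Allocates one list per possible worker thread.
    public static NativeArray<UnsafeList<float2>> Create(Allocator allocator)
    {
        var pool = new NativeArray<UnsafeList<float2>>(JobsUtility.MaxJobThreadCount, allocator);
        for (int i = 0; i < pool.Length; i++)
        {
            pool[i] = new UnsafeList<float2>(256, allocator);
        }
        return pool;
    }

    // Frees every list, then the array itself.
    public static void Dispose(NativeArray<UnsafeList<float2>> pool)
    {
        for (int i = 0; i < pool.Length; i++)
        {
            pool[i].Dispose();
        }
        pool.Dispose();
    }
}
```

Because each worker thread only ever touches its own slot, no locking is needed, and the lists keep their grown capacity from one polygon to the next.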

For the BurstTriangulator the challenge is probably to find an elegant way to implement it. When doing a single triangulation from the main thread you would not benefit from this.

One way could be to expose a "NativeTriangulatorPool" on the NativeTriangulator. If that is assigned (its .IsCreated is true), the internal jobs get their temp lists from the pool. If not, they create and dispose of them as normal.

When batching data you could then create a NativeTriangulatorPool(Allocator.TempJob) and pass it to the parallel job doing the triangulation. You assign this to the NativeTriangulator in an overloaded constructor or as an exposed field before triangulating.

andywiecko commented 3 months ago

Hi @LennartJohansen, Thanks for the interesting ideas and comments!

> Did you profile performance of float vs double in the triangulation? Would be interesting to know. I usually expect a bigger gain than it turns out to be. Unless the compiler manages to vectorize the code most of the float and double math takes the same time on the CPU.

I made a benchmark about two months ago; however, there were some other changes in the project, so this might be different in the final result. float2 seems to be slightly faster.

I will publish the result with the next release with Triangulator<T>.

> A NativeSlice input would be useful: when processing a lot of data, you might not have separate NativeArrays for all the point data.

Are there any benefits regarding the use of NativeSlice? I thought that a subarray view was sufficient: https://docs.unity3d.com/ScriptReference/Unity.Collections.NativeArray_1.GetSubArray.html
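For reference, a subarray view can be taken as in the sketch below. `OutlineViews`, `Outline`, `start` and `count` are hypothetical names; only NativeArray<T>.GetSubArray is an actual Unity API here.

```csharp
using Unity.Collections;
using Unity.Mathematics;

public static class OutlineViews
{
    // Returns a view over `count` points beginning at `start`.
    // No data is copied; the view aliases the memory of `allPoints`.
    public static NativeArray<float2> Outline(NativeArray<float2> allPoints, int start, int count)
    {
        return allPoints.GetSubArray(start, count);
    }
}
```

The returned array does not own its memory, so it should not be disposed separately from `allPoints`.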

> You create a NativeCollection that holds an array of size JobsUtility.MaxJobThreadCount and declares the field [NativeSetThreadIndex] internal readonly int m_ThreadIndex;
>
> The NativeSetThreadIndex attribute makes Unity set the thread index on every copy of the native pool struct held by an instance of the job struct.
>
> Within an IJobParallelFor job, each thread can then request a pooled object with pre-allocated lists/capacity. The pool uses m_ThreadIndex to get the right pool instance.
>
> The performance benefits should be significant when processing a lot of data in parallel. The relative gain is probably larger on less complex polygons.
>
> For the BurstTriangulator the challenge is probably to find an elegant way to implement it. When doing a single triangulation from the main thread you would not benefit from this.
>
> One way could be to expose a "NativeTriangulatorPool" on the NativeTriangulator. If that is assigned (its .IsCreated is true), the internal jobs get their temp lists from the pool. If not, they create and dispose of them as normal.
>
> When batching data you could then create a NativeTriangulatorPool(Allocator.TempJob) and pass it to the parallel job doing the triangulation. You assign this to the NativeTriangulator in an overloaded constructor or as an exposed field before triangulating.

Regarding parallelism, I think that it is out of the scope of this project. Currently, Triangulator schedules a single job which implements an ordinary IJob. I think that the user should be responsible for managing parallelism. It will be easy to do and customize once NativeTriangulator is completed, but I'm open to suggestions and feature requests.

Best, Andrzej

andywiecko commented 1 month ago

Hi @LennartJohansen, the new release, v3.1, is now published and contains the feature you requested. You can find the full release notes here. It is already available on OpenUPM.

You can learn more about setting up a triangulation from a job here.

If you have any additional questions or suggestions, let me know.

Best, Andrzej