CitiesSkylinesMods / TMPE

Cities: Skylines Traffic Manager: President Edition
https://steamcommunity.com/sharedfiles/filedetails/?id=1637663252
MIT License
574 stars 85 forks source link

`TaskList<T>` #1552

Closed originalfoo closed 2 years ago

originalfoo commented 2 years ago

TaskList<T> is specifically designed for working with persistent overlay tasks which occasionally encounter bulk additions/deletions/updates due to camera move or tool state change.

For example, network label overlay (text over segments, nodes, etc.):

  • Camera cache would generate list of segment/node ids in-view, resulting in render tasks being created for each of those.
  • While camera is still, those tasks just keep getting rendered from frame to frame with very little overhead (cached bounds boxes, positions, etc).
  • Then the camera moves - some of the old items are still in view, but lots removed and lots of new ones...
  • Deltas (not present in the code linked above) from the camera cache list which items are retained from previous camera position...
    • Retained items will need repositioning on-screen
    • All other tasks will need removing - stuff no longer in cam view
    • Subsequently, a bunch of new tasks are added for the new stuff that just moved in to cam view

It's similar to a FastList, but with without this spam:

FastList<T>.RemoveAt(int) ```csharp public void RemoveAt(int index) { if (m_buffer != null && index < m_size) { m_size--; for (int i = index; i < m_size; i++) { m_buffer[i] = m_buffer[i + 1]; } m_buffer[m_size] = default(T); } } ```

Instead, it moves the last used item to the index of deleted item as the order of items doesn't really matter. It also has customisable blocksize to reduce number of array extensions.

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
TaskList_SimulatedCamMoves 227.6 μs 0.45 μs 0.02 μs - - - -
TaskList_Optimised_SimulatedCamMoves 188.6 μs 6.61 μs 0.36 μs - - - -
FastList_SimulatedCamMoves 1,526.6 μs 36.16 μs 1.98 μs - - - -
originalfoo commented 2 years ago

I've attempted to add perf tests but they throw error when run. Running same code (effectively) on dotnetfiddle runs fine.

krzychu124 commented 2 years ago

I've attempted to add perf tests but they throw error when run. Running same code (effectively) on dotnetfiddle runs fine.

There is an attribute to setup things before running single test and another one for cleanup

krzychu124 commented 2 years ago

Test result without manual test(broken) - ton of garbage, but maybe it's just first stage or my PC is slow (it's slow 😄 )? :/ image

Anyways, for proper testing or performance comparison I suggest adding tiny class could do similar things but with the RemoveAt() approach. Honestly it depends on where the object is (it starts moving from index and goes to the end).

There is one important question: do we need retain index on remove or add?? If we don't care about index retainability (assuming we always use Remove - it's searching list to find item) you can rewrite RemoveAt() to move last item at the index of removing item - size will be still correct and that change will simply get rid of whole position shift. That "solution" will make sense only if we don't use index anywhere to access items and when removing we use Remove(T item) that is traversing the internal array to find it.

originalfoo commented 2 years ago

ton of garbage,

I suspect that's because I tried putting the initialise/deltas stuff in to the benchmark methods - those were supposed to be outside but I wasn't sure how the benchmark environment works. Additionally I don't need the randomisation stuff (leftovers from earlier approach).

There is one important question: do we need retain index on remove or add?? If we don't care about index retainability (assuming we always use Remove - it's searching list to find item) you can rewrite RemoveAt() to move last item at the index of removing item - size will be still correct and that change will simply get rid of whole position shift. That "solution" will make sense only if we don't use index anywhere to access items and when removing we use Remove(T item) that is traversing the internal array to find it.

Excellent point! That will allow the entire gaps array thing to be ditched making the whole thing much simpler.

originalfoo commented 2 years ago

Replacing couple of fastlists with normal list (in the init deltas method) got the tests working - also moved the init stuff back in to the [GlobalSetup] method and that ditched all the garbage.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 5 2600X, 1 CPU, 12 logical and 6 physical cores
  [Host] : .NET Framework 4.8 (4.8.4470.0), X86 LegacyJIT DEBUG

Job=ShortRun  Runtime=Mono x64  Toolchain=InProcessEmitToolchain  
IterationCount=3  LaunchCount=1  WarmupCount=3  
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
SparseTaskList_SimulatedCamMoves 1.618 ms 0.0081 ms 0.0004 ms - - - -
SparseTaskList_Manual_SimulatedCamMoves 1.282 ms 0.0066 ms 0.0004 ms - - - -
FastList_SimulatedCamMoves 1.638 ms 0.0032 ms 0.0002 ms - - - -

Next step is ditch the gaps array and just move last task item in to deleted slot and test again.

krzychu124 commented 2 years ago

X86 LegacyJIT DEBUG

Run tests in Release, Release TEST or Benchmark(not sure if that one still works) profile 😉

originalfoo commented 2 years ago

In RELEASE (still with the gaps stuff):

EDIT: pasted wrong file lol

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 5 2600X, 1 CPU, 12 logical and 6 physical cores
  [Host] : .NET Framework 4.8 (4.8.4470.0), X86 LegacyJIT

Job=ShortRun  Runtime=Mono x64  Toolchain=InProcessEmitToolchain  
IterationCount=3  LaunchCount=1  WarmupCount=3  
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
SparseTaskList_SimulatedCamMoves 471.0 μs 18.20 μs 1.00 μs - - - -
SparseTaskList_Manual_SimulatedCamMoves 413.4 μs 96.99 μs 5.32 μs - - - -
FastList_SimulatedCamMoves 1,522.0 μs 13.60 μs 0.75 μs - - - -
originalfoo commented 2 years ago

^ updated comment above

originalfoo commented 2 years ago

Remove gaps array, ~twice as fast :)

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 5 2600X, 1 CPU, 12 logical and 6 physical cores
  [Host] : .NET Framework 4.8 (4.8.4470.0), X86 LegacyJIT

Job=ShortRun  Runtime=Mono x64  Toolchain=InProcessEmitToolchain  
IterationCount=3  LaunchCount=1  WarmupCount=3  
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
TaskList_SimulatedCamMoves 227.6 μs 0.45 μs 0.02 μs - - - -
TaskList_Optimised_SimulatedCamMoves 188.6 μs 6.61 μs 0.36 μs - - - -
FastList_SimulatedCamMoves 1,526.6 μs 36.16 μs 1.98 μs - - - -
krzychu124 commented 2 years ago

@aubergine10 quite weird, on my quite old laptop it runs faster😂 - notice what happened when I implemented the same RemoveAt as extension method 😉

image

Question: why tests performs ~70-100x slower if I comment out line that removes the item (RemoveAt) and why >1000x faster when comment out the same line and also loop for adding??

Edit: commented out list.RemoveAt() of that test method image

originalfoo commented 2 years ago

commented out list.RemoveAt() of that test method

That will just keep growing Tasks array every time new items added will it not?

krzychu124 commented 2 years ago

commented out list.RemoveAt() of that test method

That will just keep growing Tasks array every time new items added will it not?

How fast it grows? Pow2?

originalfoo commented 2 years ago

TaskList : By BLOCK_SIZE each time, which can be customised via ctor.

FastList grows by pow2 when small but is limited to 32 elements per extension after that IIRC.

kvakvs commented 2 years ago

Lifehack: You don't need to reallocate too much, if your list is good size from the beginning. We're not running on a 640kb DOS machine anymore, allocating a couple megabytes of a cache array once might be faster than reallocating 64 elements multiple times.

originalfoo commented 2 years ago

Yup, I imagine 512 will be good for things like network labels, and maybe 128 or 256 for speed limits, etc. Ideal value will depend on the type of overlay.

originalfoo commented 2 years ago

Thinking it might be better to ditch this and just use a FastRemoveAt extension for FastList instead.

For example, here's what happend when I removed Clear() from the iteration method = list just keeps growing in size...

        [IterationCleanup]
        public void ClearLists() {
            //TestData.testTaskList.Clear();
            //TestData.testFastList.Clear();
        }
Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
TaskList_01_Add6kItems_To_UnpreparedEmptyList 28,072.0 μs 94,767.5 μs 5,194.52 μs 28,719.70 μs 20000.0000 20000.0000 20000.0000 67358896 B
FastList_01_Add6kItems_To_UnpreparedEmptyList 133.4 μs 2,216.9 μs 121.51 μs 71.00 μs - - - -
TaskList_02_Add6kItems_To_PreparedEmptyList 227.7 μs 2,477.1 μs 135.78 μs 183.20 μs - - - 384016 B
FastList_02_Add6kItems_To_PreparedEmptyList 154.0 μs 2,532.8 μs 138.83 μs 75.50 μs - - - -
TaskList_03_ManuallyAdd6kItems_To_PreparedEmptyList 227.5 μs 2,542.7 μs 139.38 μs 183.20 μs - - - 384016 B
FastList_03_ManuallyAdd6kItems_To_PreparedEmptyList 177.7 μs 1,602.2 μs 87.82 μs 141.70 μs - - - 384016 B
TaskList_04_AltManuallyAdd6kItems_To_PreparedEmptyList 151.2 μs 690.8 μs 37.86 μs 154.55 μs - - - 384016 B

It's probably fixable, but given that a single extension to FastList could deliver the primary goal of significantly faster item deletion there doesn't seem to be any tangible benefit (but lots of potential pitfalls) for using custom TaskList.

krzychu124 commented 2 years ago

It's probably fixable, but given that a single extension to FastList could deliver the primary goal of significantly faster item deletion there doesn't seem to be any tangible benefit (but lots of potential pitfalls) for using custom TaskList.

yeah, I think we should focus on delivering solution and then measure how bad actually is in action (we can profile it with Unity so we will see memory and time) and then try to figure out something better if necessary. I think we will need to move on to custom shader (clean up the dust from my experiments) for clickable overlays (3d worlds signs/icons) so we may save a ton of time on rendering. Good thing is now after you've created multiple benchmarks we see that FastList is not that slow and with a little restriction based on our specific use case it can be even faster than custom solution.

krzychu124 commented 2 years ago

General conclusion after multiple tests and benchmark runs is that proposed feature does not provide enough performance boost. Closing for now.