Closed originalfoo closed 2 years ago
I've attempted to add perf tests but they throw error when run. Running same code (effectively) on dotnetfiddle runs fine.
I've attempted to add perf tests but they throw error when run. Running same code (effectively) on dotnetfiddle runs fine.
There is an attribute to setup things before running single test and another one for cleanup
Test result without manual test(broken) - ton of garbage, but maybe it's just first stage or my PC is slow (it's slow 😄 )? :/
Anyways, for proper testing or performance comparison I suggest adding tiny class could do similar things but with the RemoveAt()
approach. Honestly it depends on where the object is (it starts moving from index and goes to the end).
There is one important question: do we need retain index on remove or add??
If we don't care about index retainability (assuming we always use Remove - it's searching list to find item) you can rewrite RemoveAt()
to move last item at the index of removing item - size will be still correct and that change will simply get rid of whole position shift. That "solution" will make sense only if we don't use index anywhere to access items and when removing we use Remove(T item)
that is traversing the internal array to find it.
ton of garbage,
I suspect that's because I tried putting the initialise/deltas stuff in to the benchmark methods - those were supposed to be outside but I wasn't sure how the benchmark environment works. Additionally I don't need the randomisation stuff (leftovers from earlier approach).
There is one important question: do we need retain index on remove or add?? If we don't care about index retainability (assuming we always use Remove - it's searching list to find item) you can rewrite
RemoveAt()
to move last item at the index of removing item - size will be still correct and that change will simply get rid of whole position shift. That "solution" will make sense only if we don't use index anywhere to access items and when removing we useRemove(T item)
that is traversing the internal array to find it.
Excellent point! That will allow the entire gaps
array thing to be ditched making the whole thing much simpler.
Replacing couple of fastlists with normal list (in the init deltas method) got the tests working - also moved the init stuff back in to the [GlobalSetup]
method and that ditched all the garbage.
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 5 2600X, 1 CPU, 12 logical and 6 physical cores
[Host] : .NET Framework 4.8 (4.8.4470.0), X86 LegacyJIT DEBUG
Job=ShortRun Runtime=Mono x64 Toolchain=InProcessEmitToolchain
IterationCount=3 LaunchCount=1 WarmupCount=3
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
SparseTaskList_SimulatedCamMoves | 1.618 ms | 0.0081 ms | 0.0004 ms | - | - | - | - |
SparseTaskList_Manual_SimulatedCamMoves | 1.282 ms | 0.0066 ms | 0.0004 ms | - | - | - | - |
FastList_SimulatedCamMoves | 1.638 ms | 0.0032 ms | 0.0002 ms | - | - | - | - |
Next step is ditch the gaps
array and just move last task item in to deleted slot and test again.
X86 LegacyJIT DEBUG
Run tests in Release, Release TEST or Benchmark(not sure if that one still works) profile 😉
In RELEASE
(still with the gaps stuff):
EDIT: pasted wrong file lol
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 5 2600X, 1 CPU, 12 logical and 6 physical cores
[Host] : .NET Framework 4.8 (4.8.4470.0), X86 LegacyJIT
Job=ShortRun Runtime=Mono x64 Toolchain=InProcessEmitToolchain
IterationCount=3 LaunchCount=1 WarmupCount=3
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
SparseTaskList_SimulatedCamMoves | 471.0 μs | 18.20 μs | 1.00 μs | - | - | - | - |
SparseTaskList_Manual_SimulatedCamMoves | 413.4 μs | 96.99 μs | 5.32 μs | - | - | - | - |
FastList_SimulatedCamMoves | 1,522.0 μs | 13.60 μs | 0.75 μs | - | - | - | - |
^ updated comment above
Remove gaps array, ~twice as fast :)
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19043
AMD Ryzen 5 2600X, 1 CPU, 12 logical and 6 physical cores
[Host] : .NET Framework 4.8 (4.8.4470.0), X86 LegacyJIT
Job=ShortRun Runtime=Mono x64 Toolchain=InProcessEmitToolchain
IterationCount=3 LaunchCount=1 WarmupCount=3
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
TaskList_SimulatedCamMoves | 227.6 μs | 0.45 μs | 0.02 μs | - | - | - | - |
TaskList_Optimised_SimulatedCamMoves | 188.6 μs | 6.61 μs | 0.36 μs | - | - | - | - |
FastList_SimulatedCamMoves | 1,526.6 μs | 36.16 μs | 1.98 μs | - | - | - | - |
@aubergine10 quite weird, on my quite old laptop it runs faster😂 - notice what happened when I implemented the same RemoveAt
as extension method 😉
Question: why tests performs ~70-100x slower if I comment out line that removes the item (RemoveAt
) and why >1000x faster when comment out the same line and also loop for adding??
Edit: commented out list.RemoveAt()
of that test method
commented out list.RemoveAt() of that test method
That will just keep growing Tasks
array every time new items added will it not?
commented out list.RemoveAt() of that test method
That will just keep growing
Tasks
array every time new items added will it not?
How fast it grows? Pow2?
TaskList
: By BLOCK_SIZE
each time, which can be customised via ctor.
FastList
grows by pow2 when small but is limited to 32 elements per extension after that IIRC.
Lifehack: You don't need to reallocate too much, if your list is good size from the beginning. We're not running on a 640kb DOS machine anymore, allocating a couple megabytes of a cache array once might be faster than reallocating 64 elements multiple times.
Yup, I imagine 512 will be good for things like network labels, and maybe 128 or 256 for speed limits, etc. Ideal value will depend on the type of overlay.
Thinking it might be better to ditch this and just use a FastRemoveAt
extension for FastList
instead.
For example, here's what happend when I removed Clear()
from the iteration method = list just keeps growing in size...
[IterationCleanup]
public void ClearLists() {
//TestData.testTaskList.Clear();
//TestData.testFastList.Clear();
}
Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|
TaskList_01_Add6kItems_To_UnpreparedEmptyList | 28,072.0 μs | 94,767.5 μs | 5,194.52 μs | 28,719.70 μs | 20000.0000 | 20000.0000 | 20000.0000 | 67358896 B |
FastList_01_Add6kItems_To_UnpreparedEmptyList | 133.4 μs | 2,216.9 μs | 121.51 μs | 71.00 μs | - | - | - | - |
TaskList_02_Add6kItems_To_PreparedEmptyList | 227.7 μs | 2,477.1 μs | 135.78 μs | 183.20 μs | - | - | - | 384016 B |
FastList_02_Add6kItems_To_PreparedEmptyList | 154.0 μs | 2,532.8 μs | 138.83 μs | 75.50 μs | - | - | - | - |
TaskList_03_ManuallyAdd6kItems_To_PreparedEmptyList | 227.5 μs | 2,542.7 μs | 139.38 μs | 183.20 μs | - | - | - | 384016 B |
FastList_03_ManuallyAdd6kItems_To_PreparedEmptyList | 177.7 μs | 1,602.2 μs | 87.82 μs | 141.70 μs | - | - | - | 384016 B |
TaskList_04_AltManuallyAdd6kItems_To_PreparedEmptyList | 151.2 μs | 690.8 μs | 37.86 μs | 154.55 μs | - | - | - | 384016 B |
It's probably fixable, but given that a single extension to FastList
could deliver the primary goal of significantly faster item deletion there doesn't seem to be any tangible benefit (but lots of potential pitfalls) for using custom TaskList
.
It's probably fixable, but given that a single extension to FastList could deliver the primary goal of significantly faster item deletion there doesn't seem to be any tangible benefit (but lots of potential pitfalls) for using custom TaskList.
yeah, I think we should focus on delivering solution and then measure how bad actually is in action (we can profile it with Unity so we will see memory and time) and then try to figure out something better if necessary. I think we will need to move on to custom shader (clean up the dust from my experiments) for clickable overlays (3d worlds signs/icons) so we may save a ton of time on rendering. Good thing is now after you've created multiple benchmarks we see that FastList is not that slow and with a little restriction based on our specific use case it can be even faster than custom solution.
General conclusion after multiple tests and benchmark runs is that proposed feature does not provide enough performance boost. Closing for now.
TaskList<T>
is specifically designed for working with persistent overlay tasks which occasionally encounter bulk additions/deletions/updates due to camera move or tool state change.It's similar to a
FastList
, but with without this spam:
```csharp public void RemoveAt(int index) { if (m_buffer != null && index < m_size) { m_size--; for (int i = index; i < m_size; i++) { m_buffer[i] = m_buffer[i + 1]; } m_buffer[m_size] = default(T); } } ```FastList<T>.RemoveAt(int)
Instead, it moves the last used item to the index of deleted item as the order of items doesn't really matter. It also has customisable blocksize to reduce number of array extensions.