Closed luithefirst closed 3 years ago
Hey, I thought a lot about that, but I thought it might confusing for users when they're confronted with a CountingHashSet, so i took the easy way out...
We could certainly come up with a more efficient way of maintaining the HashSet representation (this might be doable with some internal hackery) I'll take a look at that.
Also the ASetReader
uses content : aval<HashSet<'T>>
of the set, so this affects a lot of use-cases.
How about changing the CountingHashSet
so that it publically represents a set and only exposes the count and counting functionality internally?
That might be a little tricky but i'll think about something along these lines. My main problem here is that having another kind of hashset might be irritating and make the API a bit tedious.
I noticed that the ASetReader
is part of the reference implementation. So like you pointed out, the issue only affects accessing the aval
content where the CountingHashSet is mapped to a HashSet. I guess in most use-cases the final output is consumed by a reader which avoids this.
I also came to the conclusion that it is better to separate the state (HashSet
) and the reference counting (CountingHashSet
), but I'm not sure how this could be implemented best. Using a tuple as a history state appears to be awkward. Also integrating the HashSet directly into the CountingHashSet next to the HashMap does not seem to be very elegant. It would also mean that there is additional overhead because the HashSet state is not needed every time. Alternatively, we could also use a special Reader to perform the mapping, which also seems to be the least intrusive change.
My current solution is in https://github.com/fsprojects/FSharp.Data.Adaptive/commit/a3a09ea7f4ce97df8710fcfcf5c19a29059d0475
Basically both variants are maintained in parallel now
A naive implementation for me would also have a Set for the state and a Map for the reference counting.
AdaptiveHashSetImpl still uses: let content = history |> AVal.map CountingHashSet.toHashSet
If you run the CollectionUpdate
benchmark, the CSet_Map_GetValue
performance is still bad, so we should also change it there. Unfortunately, it looks like this would be a bigger change as all the Readers work directly on the history? For me, this suggests including the Set in the history state, but when I reviewed this, it appeared awkward.
Hey, I'm aware that this is suboptimal, however I think there is no real fix for it (except exposing the CountingHashSet
)
Tbh. I think ASet.toAVal
and ASet.force
are not really the most important operations, since aval<HashSet<'T>>
is a different kind of datastructure.
So after all, I think we will just need to accept that for the sake of simplicity/uniformity.
I've found 15+21 references in my application. When I implemented the system, I assumed an incremental computation complexity behind the updates. At the moment, I cannot tell how performance-critical this is, but we definitely should not ignore this and at least keep the issue around.
Earlier we came to the conclusion that exposing the CountingHashSet
would not be elegant. When I look at this from the point of the ChangeableIndexList
, there we also have a special collection IndexList
and not a list or list clone. However, it allows operations based on Index
there, so this might not be a fair comparison. Still, it is an option.
How do you see changing the History state to hold the HashSet and the CountingHashSet? This might still be a feasible option.
For ASet.toAVal
it might be possible to use a Reader in between instead of Content
?
How do you see changing the History state to hold the HashSet and the CountingHashSet?
That's more or less impossible to do, since union
, difference
, etc. need ref-counting internally
For ASet.toAVal it might be possible to use a Reader in between instead of Content?
You can do so right now by using something like AVal.custom (fun t -> reader.GetOperations(t) |> ignore; reader.State)
Which precisely gives you the CountingHashSet, but I don't think we should expose that directly via ASet.force
or ASet.toAVal
which (in my opinion) should be as sound as possible (even when this means accepting performance implictations)
but we definitely should not ignore this and at least keep the issue around.
I'm not a huge fan of this since this way I always have to browse through many many issues and after some time no-one remembers what things were about
I experimented a little (by using a tuple for ASet-State CountingHashSet<'T> * HashSet<'T>
) and your benchmark looks like:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT DEBUG
Job-LAAHOK : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
InvocationCount=1 UnrollFactor=1
| Method | Count | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------- |------- |---------:|---------:|---------:|------:|------:|------:|----------:|
| CSet_Map_GetValue | 1 | 334.0 us | 6.53 us | 11.77 us | - | - | - | 242.95 KB |
| CList_Map_GetValue | 1 | 775.2 us | 14.91 us | 15.31 us | - | - | - | 302.27 KB |
| CSet_Map_GetValue | 1000 | 347.0 us | 5.64 us | 13.30 us | - | - | - | 604.04 KB |
| CList_Map_GetValue | 1000 | 725.9 us | 10.39 us | 19.77 us | - | - | - | 596.88 KB |
| CSet_Map_GetValue | 10000 | 402.9 us | 7.78 us | 12.12 us | - | - | - | 730.15 KB |
| CList_Map_GetValue | 10000 | 806.7 us | 15.86 us | 17.63 us | - | - | - | 710.64 KB |
| CSet_Map_GetValue | 100000 | 403.6 us | 4.28 us | 3.57 us | - | - | - | 800.04 KB |
| CList_Map_GetValue | 100000 | 918.0 us | 15.41 us | 16.49 us | - | - | - | 882.94 KB |
While this certainly improves this use-case it introduces overhead in all aset-combinators for maintaining the HashSet
variant...
It's good to know that it would be possible to fix the performance with custom readers if needed.
I also had concerns about the overhead with such an implementation. The benchmark results show reasonable performance. Ideally, we should also measure the overhead in other scenarios where the Set state is not needed in the end and compare it to the old implementation. If this shows unfavorable characteristics we should revert it, but also should not give up that easily as it would leave a performance pitfall.
fixed in v110
A
CSet
/ASet
internally use aCountingHashSet
, but exposeHashSet
externally. The HashSet representation is built from scratch after every change when the state queried.clist
andcset
in an identical use-case. The benchmark performs and add/append to the collection with a certain size (Count) and then queries the state that is build by a map:One possibility to avoid that would be to directly expose the
CountingHashSet
when accessing thecset
/aset
state. This would also fix #79. I've not yet evaluate the implications of that and also not thought about other alternatives, further discussion would be needed.