# Akade.IndexedSet
A convenient data structure supporting efficient in-memory indexing and querying, including range queries and fuzzy string matching. In a nutshell, it allows you to write LINQ-like queries without enumerating through the entire list. If you are currently completely enumerating through your data, expect huge speedups and much better scalability!
A sample showing different queries as you might want do for a report:
// typically, you would query this from the db
var data = new Purchase[] {
new(Id: 1, ProductId: 1, Amount: 1, UnitPrice: 5),
new(Id: 2, ProductId: 1, Amount: 2, UnitPrice: 5),
new(Id: 6, ProductId: 4, Amount: 3, UnitPrice: 12),
new(Id: 7, ProductId: 4, Amount: 8, UnitPrice: 10) // discounted price
};
IndexedSet<int, Purchase> set = data.ToIndexedSet(x => x.Id)
.WithIndex(x => x.ProductId)
.WithRangeIndex(x => x.Amount)
.WithRangeIndex(x => x.UnitPrice)
.WithRangeIndex(x => x.Amount * x.UnitPrice)
.WithIndex(x => (x.ProductId, x.UnitPrice))
.Build();
// efficient queries on configured indices
// in contrast to standard LINQ, they do not enumerate the entire list!
_ = set.Where(x => x.ProductId, 4);
_ = set.Range(x => x.Amount, 1, 3, inclusiveStart: true, inclusiveEnd: true);
_ = set.GreaterThanOrEqual(x => x.UnitPrice, 10);
_ = set.MaxBy(x => x.Amount * x.UnitPrice);
_ = set.Where(x => (x.ProductId, x.UnitPrice), (4, 10));
Below, you find runtime complexities. Benchmarks can be found here
Query | Unique-Index | NonUnique-Index | Range-Index |
---|---|---|---|
Single | ⚠ O(1) | ⚠ O(1) | ⚠ O(log n) |
Where | ✔ O(1) | ✔ O(m) | ✔ O(log n + m) |
Range | ❌ | ❌ | ✔ O(log n + m) |
< / <= | ❌ | ❌ | ✔ O(log n + m) |
> / >= | ❌ | ❌ | ✔ O(log n + m) |
OrderBy | ❌ | ❌ | ✔ O(m) |
Max/Min | ❌ | ❌ | ✔ O(1) |
Query | Prefix-Index | FullText-Index |
---|---|---|
StartWith | ✔ O(w+r) | ✔ O(w+r) |
Contains | ❌ | ✔ O(w+r) |
Fuzzy StartWith | ✔ O(w+D+r) | ✔ O(w+D+r) |
Fuzzy Contains | ❌ | ✔ O(w+D+r) |
ℹ FullText indices use a lot more memory than prefix indices and are more expensive to construct. Only use FullText indices if you really require it.
Dictionary-based, O(1), access on keys:
IndexedSet<int, Data> set = IndexedSetBuilder<Data>.Create(a => a.PrimaryKey)
.WithUniqueIndex(x => x.SecondaryKey)
.Build();
_ = set.Add(new(PrimaryKey: 1, SecondaryKey: 5));
// fast access via primary key
Data data = set[1];
// fast access via secondary key
data = set.Single(x => x.SecondaryKey, 5);
ℹ Entities do not require a primary key.
IndexedSet<TPrimaryKey, TData>
inherits fromIndexedSet<TData>
but provides convenient access to the automatically added unique index:set[primaryKey]
instead ofset.Single(x => x.PrimaryKey, primaryKey)
.
Dictionary-based, O(1), access on keys (single value) with multiple values (multiple keys):
IndexedSet<int, Data> set = new Data[] { new(PrimaryKey: 1, SecondaryKey: 5), new(PrimaryKey: 2, SecondaryKey: 5) }
.ToIndexedSet(x => x.PrimaryKey)
.WithIndex(x => x.SecondaryKey)
.Build();
// fast access via secondary key
IEnumerable<Data> data = set.Where(x => x.SecondaryKey, 5);
Binary-heap based O(log(n)) access for range based, smaller than (or equals) or bigger than (or equals) and orderby queries. Also useful to do paging sorted on exactly one index.
IndexedSet<Data> set = IndexedSetBuilder.Create(new Data[] { new(1, SecondaryKey: 3), new(2, SecondaryKey: 4) })
.WithRangeIndex(x => x.SecondaryKey)
.Build();
// fast access via range query
IEnumerable<Data> data = set.Range(x => x.SecondaryKey, 1, 5);
// fast max & min key value or elements
int maxKey = set.Max(x => x.SecondaryKey);
data = set.MaxBy(x => x.SecondaryKey);
// fast larger or smaller than
data = set.LessThan(x => x.SecondaryKey, 4);
// fast ordering & paging
data = set.OrderBy(x => x.SecondaryKey, skip: 10).Take(10); // second page of 10 elements
Prefix- & Suffix-Trie based indices for efficient StartWith & String-Contains queries including support for fuzzy matching.
IndexedSet<Type> data = typeof(object).Assembly.GetTypes()
.ToIndexedSet()
.WithPrefixIndex(x => x.Name)
.WithFullTextIndex(x => x.FullName)
.Build();
// fast prefix or contains queries via indices
_ = data.StartsWith(x => x.Name, "Int");
_ = data.Contains(x => x.FullName, "Int");
// fuzzy searching is supported by prefix and full text indices
// the following will also match "String"
_ = data.FuzzyStartsWith(x => x.Name, "Strang", 1);
_ = data.FuzzyContains(x => x.FullName, "Strang", 1);
There are overloads for all indices that allow to use multiple keys.
You can have a unique index where each element can have multiple keys:
IndexedSet<int, Data> set = IndexedSetBuilder<Data>.Create(a => a.PrimaryKey)
.WithUniqueIndex(x => x.AlternativeKeys) // Where AlternativeKeys returns an IEnumerable<int>
.Build();
_ = set.Add(new(PrimaryKey: 1, AlternativeKeys: new[] { 3, 4 }));
set.Single(x => x.AlternativeKeys, 3); // returns above element
The same applies for all other index types, for example for non-unique indices:
IndexedSet<int, GraphNode> set = IndexedSetBuilder<GraphNode>.Create(a => a.Id)
.WithIndex(x => x.ConnectsTo) // Where ConnectsTo returns an IEnumerable<int>
.Build();
// 1 2
// |\ /
// | 3
// \|
// 4
_ = set.Add(new(Id: 1, ConnectsTo: new[] { 3, 4 }));
_ = set.Add(new(Id: 2, ConnectsTo: new[] { 3 }));
_ = set.Add(new(Id: 3, ConnectsTo: new[] { 1, 2, 3 }));
_ = set.Add(new(Id: 4, ConnectsTo: new[] { 1, 3 }));
// For readability, it is recommended to write the name for the parameter contains
IEnumerable<GraphNode> nodesThatConnectTo1 = set.Where(x => x.ConnectsTo, contains: 1); // returns nodes 3 & 4
IEnumerable<GraphNode> nodesThatConnectTo3 = set.Where(x => x.ConnectsTo, contains: 1); // returns nodes 1 & 2 & 3
// Non-optimized Where(x => x.Contains(...)) query:
nodesThatConnectTo1 = set.FullScan().Where(x => x.ConnectsTo.Contains(1)); // returns nodes 3 & 4, but enumerates through the entire set
:information_source: For range queries, this introduces a small overhead as the results are filtered to be distinct: i.e.
O(log n + m log m)
instead ofO(log n + m)
.:information_source: Multi-key string indices are marked experimental. Read more at Experimental Features
The data structure also allows to use computed or compound keys:
var data = new RangeData[] { new(Start: 2, End: 10) };
IndexedSet<RangeData> set = data.ToIndexedSet()
.WithIndex(x => (x.Start, x.End))
.WithIndex(x => x.End - x.Start)
.WithIndex(ComputedKey.SomeStaticMethod)
.Build();
// fast access via indices
IEnumerable<RangeData> result = set.Where(x => (x.Start, x.End), (2, 10));
result = set.Where(x => x.End - x.Start, 8);
result = set.Where(ComputedKey.SomeStaticMethod, 42);
ℹ For more samples, take a look at the unit tests.
The "normal" indexedset is not thread-safe, however, a ReaderWriterLock-based implementation is available.
Just call BuildConcurrent()
instead of Build()
:
ConcurrentIndexedSet<RangeData> set = data.ToIndexedSet()
.WithIndex(x => (x.Start, x.End))
.BuildConcurrent();
⚠ The concurrent implementation needs to materialize all query results.
OrderBy
andOrderByDescending
take an additionalcount
parameter to avoid unnecessary materialization. You can judge the overhead hereNo reflection and no expressions - convention-based index naming
We are using the CallerArgumentExpression-Feature of .Net 6/C# 10 to provide convention-based naming of the indices:
set.Where(x => (x.Prop1, x.Prop2), (1, 2))
tries to use an index named "x => (x.Prop1, x.Prop2)"
set.Where(ComputedKeys.NumberOfDays, 5)
tries to use an index named "ComputedKeys.NumberOfDays"
:information_source: The following naming conventions are recommended:
- Use x as parameter name in any lambdas that determines an index name.
- Do not use parentheses in any lambda that determines an index name.
- Do not use block bodied in any lambda that determines an index name.
- For complex indices, use a static method. C# Analyzers are shipped with the package to spot incorrect index names.
Reasons
Use "named" indices by using static methods:
record Data(int PrimaryKey, int SecondaryKey);
IndexedSet<int, Data> set = IndexedSetBuilder<Data>.Create(x => x.PrimaryKey)
.WithUniqueIndex(DataIndices.UniqueIndex)
.WithRangeIndex(x => x.SecondaryKey)
.Build();
_ = set.Add(new(1, 4));
// querying unique index:
Data data = set.Single(DataIndices.UniqueIndex, 4); // Uses the unique index
Data data2 = set.Single(x => x.SecondaryKey, 4); // Uses the range index
IEnumerable<Data> inRange = set.Range(x => x.SecondaryKey, 1, 10); // Uses the range index
ℹ We recommend using the lambda syntax for "simple" properties and static methods for more complicated ones. It's easy to read, resembles "normal" LINQ-Queries and all the magic strings are compiler generated.
The implementation requires any keys of any type to never change the value while the instance is within the set. You can manually remove, update and add an object. However, there are some helper methods for that - which is especially useful for the concurrent variant as it provides thread-safe serialized access.
// updating a mutable property
_ = set.Update(dataElement, e => e.MutableProperty = 7);
// updating an immutable property
_ = set.Update(dataElement, e => e with { SecondaryKey = 12 });
// be careful: the dataElement still refers to the "old" record after the update method
_ = set.Update(dataElement, e => e with { SecondaryKey = 12 });
// updating in an concurrent set
concurrentSet.Update(set =>
{
// serialized access to the inner IndexedSet, where you can safely use above update methods
// in an multi-threaded environment
});
Remember that you can index whatever you want, including computed properties. This also applies for fuzzy matching:
IndexedSet<Data> set = IndexedSetBuilder<Data>.Create(x => x.PrimaryKey)
.WithFullTextIndex(x => x.Text.ToLowerInvariant())
.Build();
IEnumerable<Data> matches = set.FuzzyContains(x => x.Text.ToLowerInvariant(), "Search", maxDistance: 2);
Potential features (not ordered):
.ToIndexedSet().WithIndex(x => ...).[...].Build()
AsMemory()
...If you have any suggestion or found a bug / unexpected behavior, open an issue! I will also review PRs and integrate them if they fit the project.