dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.04k stars 4.68k forks source link

[API Proposal]: SortedSet inserted or appended indicator #92038

Open rizamarhaban opened 1 year ago

rizamarhaban commented 1 year ago

Background and motivation

When we add items to SortedSet<T> collection, we would like to know whether the item has been inserted due to ordering/sorting or it was just appended at the end. A simple boolean return when adding to the collection will tell whether the item has been appended or inserted.

bool AppendOrInsert(T item);

Return true if it was appended, or else false if it was inserted due to sorting/ordering the collection. The other method also returns the index position of the added/inserted item.

bool AppendOrInsert(T item, out int index) { ... }

With the indicator, we be able to know the added item condition rather than traversing to check the item position. Especially for the concurrent sorted set.

API Proposal

namespace System.Collections.Generic;

public class SortedSet<T> : ISet<T>, ICollection<T>, ICollection, IReadOnlyCollection<T>, IReadOnlySet<T>, ISerializable, IDeserializationCallback
{
...
    public bool AppendOrInsert(T item) { ... }
...
    public bool AppendOrInsert(T item, out int index) { ... }
...
}

API Usage

// Usage
var c = new SortedSet<int>();
var isAppended = c.AddOrInsert(42);

// Getting the values out
if(isAppended)
    Console.WriteLine("Item was appended");
else
    Console.WriteLine("Item was inserted and sorted");

// indicator with index position
var isAppended = c.AddOrInsert(42, out int index);
if(isAppended )
    Console.WriteLine("Item was appended on Index: ", index);
else
    Console.WriteLine("Item was inserted and sorted on Index: ", index");

Alternative Designs

No response

Risks

No response

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation When we add items to `SortedSet` collection, we would like to know whether the item has been inserted due to ordering/sorting or it was just appended at the end. A simple boolean return when adding to the collection will tell whether the item has been appended or inserted. ```csharp bool AppendOrInsert(T item); ``` Return `true` if it was appended, or else `false` if it was inserted due to sorting/ordering the collection. The other method also returns the index position of the added/inserted item. ```csharp bool AppendOrInsert(T item, out int index) { ... } ``` With the indicator, we be able to know the added item condition rather than traversing to check the item position. Especially for the concurrent sorted set. ### API Proposal ```csharp namespace System.Collections.Generic; public class SortedSet : ISet, ICollection, ICollection, IReadOnlyCollection, IReadOnlySet, ISerializable, IDeserializationCallback { ... public bool AppendOrInsert(T item) { ... } ... public bool AppendOrInsert(T item, out int index) { ... } ... } ``` ### API Usage ```csharp // Usage var c = new SortedSet(); var isAppended = c.AddOrInsert(42); // Getting the values out if(isAppended) Console.WriteLine("Item was appended"); else Console.WriteLine("Item was inserted and sorted"); // indicator with index position var isAppended = c.AddOrInsert(42, out int index); if(isAppended ) Console.WriteLine("Item was appended on Index: ", index); else Console.WriteLine("Item was inserted and sorted on Index: ", index"); ``` ### Alternative Designs _No response_ ### Risks _No response_
Author: rizamarhaban
Assignees: -
Labels: `api-suggestion`, `area-System.Collections`, `untriaged`
Milestone: -
krwq commented 1 year ago

Can you explain your scenario more? How would that help you knowing if item was appended or inserted?

ghost commented 1 year ago

This issue has been marked needs-author-action and may be missing some important information.

rizamarhaban commented 1 year ago

@krwq technically I can check whether the item was sorted while being inserted or is just appended at the end. Doing this we may need to traverse to check where is the item being insert located. If not at the end of the list, then it must be inserted somewhere in the list, then we need to check in what index. Therefore, I was thinking instead of traversing to look for the index and to know whether the item was sorted when being insert, it would be helpful, and I believe can be optimized if the SortedSet able to indicate the insertion or appended in some index.

Example, given a sorted set list [1,3,4,7,9,13,17], and I want to add some item, let's say 8. When we add the item to the sorted set list, we know that 8 will automatically sort first and will be inserted at index 4. If, let's say we want to add item 23 as example, then we know it will be appended at the end of the list, thus will get the [list count - 1] as the index number.

I'm expecting this will be built-in, and I believe can be optimized in the API. Especially this requires a IComparable type as well. I hope that answer it.

tarekgh commented 1 year ago

@rizamarhaban what exactly the scenario you need to do that or use such API with? Do you have a real app scenario that needs that? I am asking because this request does not look like a mainstream case, and I expect very few users will need to use such an API. Usually, we expose APIs for mainstream usages, or something is blocking which need the API.

rizamarhaban commented 1 year ago

@tarekgh probably it's not mainstream. I build and use it mostly for research software and instead of using another framework, I determine to use .NET because is super-fast. The app is simple, a distributed parallel discrete event simulation (PDES) which uses tens of thousands of core processors, can be millions core if needed. The sorted set list is a must and must be optimized to expecting millions if not billions of events per second. I'm also expecting this to be worked when using AOT compilation as well later on .NET 8 if possible. I'm advocating .NET will also be able to support for use in a research software, which some of the functions may not mainstream.

danmoseley commented 1 year ago

@rizamarhaban It's great that you're using .NET for that, but SortedSet is meant to be a good fit for regular scenarios. I suspect you may do better with an implementation you can tune for your purposes. (You can even copy the implementation of SortedSet.)

rizamarhaban commented 1 year ago

@danmoseley it would be great if there is some function to help indicate a sorted insertion test internally from the SortedSet. I cannot share the code and it is quite long and complex. Currently we traverse the list checking because of just want to know what the index of the inserted item. For one logical processor is fine, but once it is distributed and parallel with tens of thousands logical processor, it shows the execution is slow. I'm thinking when we add an item, why we can just get the index as well after the sorting without have to traverse though the list.

I hope .NET can help and support for this and other research software. It is part of promoting the .NET as an option for building HPC software as well. I hope this will be a consideration and as the software KPI is the speed and number of events, I really hope there is a solution for this. Technically like other language for research software, really expecting .NET optimized the sorting engine.

krwq commented 12 months ago

Is it possible you need different data structure instead? Also in many long running scenarios you can use just List, sort and binary search and sort to do operations quickly in batches. For online you'd need to build some sort of tree which meets your need yourself. It would help knowing your scenario as it would be much easier to help you find optimal solution. As @danmoseley mentioned we prefer to add APIs only if they have wider usage (especially on things like System.Collections) and this is likely a rather niche usage.

Note that our code is on MIT license, if you need to quickly unblock yourself creating a copy of the class might be easiest solution: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.cs

rizamarhaban commented 12 months ago

@krwq I guess we should never use queue in a discrete event simulation executive. The queue in DES is just a model and not real queue, similar to the service processing. Kind of digital twins but is just virtual queue. If there is a schedule coming in within an hour, we don't want to wait for real/actual one hour, instead we wanted to have zero waiting time if can. Therefore, the virtual clock timestamp needs to be sorted as fast as we can.

Here is the thing if not familiar with Parallel Discrete Event Simulation optimistic synchronization. When scheduled messages came, we need to process it asap, these messages must be sorted by timestamp on arrival as quickly as possible. However, you know in parallel processing, sometimes there are stragglers events that just finish processing while do loop is running. In this case, we need to roll back the already executed event handler and send messages to other logical processes to cancel the event. There is a theory around this, and it is called Time Warp. Now, when we retrieve such messages, we want to know whether the event was a straggler one or not. We can do this by traversing the SortedSet or just put a state telling the system that the event was straggler events. Done, that's what we did now.

Lastly, I don't want to reinvent the wheel while there is already SortedSet available, all we need, and probably other programmers needed is telling whether the added item is sorted or just appended and what was the index. Because I know this will be faster. That's all. I really hope this will be the consideration and supported. Why I don't do PR on this, because I know the .NET Team has more capable and advance than me. And can optimized this thing up to the machine code level if possible. Therefore, I only rely on the team for now.

Here is my final input for this issue:

  1. Optimized the sorting process on .NET as fast as the programming way can. Especially when using comparable interface, because that's also affect the performance. Memory wise, we don't care as we target it for HPC software.
  2. Enable detection on whether the added item to the SortedSet is being sorted or appended and was inserted at an index number.
  3. In our application we use SortedSet<T>, Stack<T> and HashSet<T> list and lots of for-loop. If these all can be meticulously optimized up to the machine code, that will be good for HPC software.
  4. I'm promoting and advocating .NET in the academic/education ecosystem. High performance computing is mostly the only option for researchers.

I hope these sums up what is the requirement. I understand this is not "mainstream" which most programmer is using time-based processing and not discrete time processing. Thank you for the attention and responses.

danmoseley commented 12 months ago

@rizamarhaban Note also that any API added today would not be in a stable release for 13 months. This is one reason we are suggesting you find another way for your project.

rizamarhaban commented 12 months ago

@danmoseley I appreciated and yes, we already and still trying all ways to hit better performance. Thanks to all the .NET team.

krwq commented 12 months ago

@rizamarhaban wouldn't priority queue work for your case? You'd set priority to be timestamp (or unix timestamp whichever makes it easier) - if you need to reverse order then simply negate priority or write comparer and simply do something like:

// if this is multithreaded the peek, timestamp check and dequeue will need to be locked, once you get element and dequeue you can release the lock
if (queue.TryPeek(out element, out timestamp))
{
  if (timestamp < DateTimeOffset.Now.AddHours(1)) // next task is within the next hour (this could be simulated time)
  {
    queue.Dequeue(); // resultshould be reference equal to element
  }
}

Sorry in advance if I misunderstood your scenario, I don't know much about DES etc...

rizamarhaban commented 11 months ago

@krwq There are no queues in DES executive and should never use lock as well. I think from my experiment in the past 4 years, the better way is just use SortedSet for the future event list. We add items and it will be sorted automatically. Cheaper and easier. We just want to know whether the item that we add in the sorted set is sorted or just appended and in which index has been inserted fast. That's why better adding two methods in the SortedSet and returned the information rather than traversing the element after adding the items. We already have the system and won't change it for now as we have another 5 years to finish the simulation project. This will run on 60K core processor, then later we will try above that core processor. Thanks for the idea.