dotnet / csharplang

The official repo for the design of the C# programming language
11.41k stars 1.02k forks source link

Generic Mathematics in C# #2377

Closed ZacharyPatten closed 5 years ago

ZacharyPatten commented 5 years ago

C# is the best language out there, but people are coding C# all wrong... Functional programming is being ignored.

Why am I posting here? Because I want this information to be seen by the community and the developers of the C# language in case they aren't aware of these techniques. Several of the techniques in my code would be amazing if they were added to C# with customized syntax and keywords.

I have an open source project where I'm trying to add a lot of features that are missing in C# (such as generic mathematics) or being miss handled by most major libraries (such as generic data structures).

https://github.com/ZacharyPatten/Towel

You can break type safeness in C# using generic types and runtime compilation, and you can store the runtime compilation in a delegate for almost no overhead whatsoever. This allows for fast generic mathematics (something most people think inst possible). Last time I looked at the source code for F#, it was using object casting under the hood to perform generic mathematics, which is much worse than runtime compilation. Most common libraries that do "generic math" use interfaces, and I think that is all wrong.

The fact that you can break type safeness like this makes me think that C# could add a keyword or maybe some syntax sugar to the language to make it a little easier on the eyes for generic mathematics. :)

C#'s data structures in the System.Collections.Generic are great... But it is missing so many common data structures such as a heap. All the examples of generic data structures I see on GitHub are bad though... Most of them use interfaces. Instead of using interfaces, you should make generic data structures with delegates. A heap for example only needs a compare delegate in order to work (see the code in my project). If you use delegates rather than interfaces, you can use any generic type in the structure (even primitives), and you can have multiple data structures of the same type that use different sorting, because you can just pass in a different delegate.

The IEnumerable type is great EXCEPT when using recursive data structures. You cannot use IEnumerable with recursive data structures efficiently because it doesn't allow for recursion. What is the solution to this? In my code in Towel, I use delegates instead of IEnumerable methods on my recursive data structures. I called my delegates "Step" and "Stepper" as if you are stepping through a collection. In my opinion, it would be really cool if they added a new keyword to C# "stepper" that is like "foreach" but uses delegates similar to how I use them in my Towel project. Also, "foreach" does not currently allow for additional parameters like many of my "stepper" functions do. My stepper functions take in stuff like "minimum" and "maximum" that allow for proper traversal of the data structure unlike how "foreach" traverses everything.

The functional programming techniques I'm mentioning here are something that Java cannot do because Java does not have delegates.

I hope anyone who reads this finds it interesting and likes my code. I'm just trying to help and get C# on the proper path. I hope you agree.

CyrusNajmabadi commented 5 years ago

Do you have a suggestion for a change you would like to see in the language? If so, it woudl be good to open a proposal.

CyrusNajmabadi commented 5 years ago

My stepper functions take in stuff like "minimum" and "maximum" that allow for proper traversal of the data structure unlike how "foreach" traverses everything.

I'm not certain why that's a good thing. A general approach to something like this would be instead to do:

foreach (var v in myCollection.GetRange(min: x, max: y)) or foreach (var v in myCollection.FromMin(x).ToMax(y))

It doesn't seem like something foreach needs to be aware of itself.

HaloFour commented 5 years ago

Check out the following discussions:

https://github.com/dotnet/csharplang/issues/1711 https://github.com/dotnet/csharplang/issues/339 https://github.com/dotnet/csharplang/issues/164 https://github.com/dotnet/csharplang/issues/110

I think they seek to address some of the generic issues you've mentioned, particularly around mathematics.

As for common data structures, you might want to take that up with the CoreFX team who owns the underlying library. This repository is specific to language features.

ZacharyPatten commented 5 years ago

@CyrusNajmabadi The foreach pattern you just showed will not work effieicently on recursive data structures. That is the point I'm trying to make. If you look at the IEnumerable functions on my AVL and Red Black trees it will hopefully make sense. IEnumerable requires you to store a stack for enumerations. My stepper implementation does not.

@HaloFour I don't need a solution for generic mathematics. I already found one. I'm trying to share my pattern the the community and the C# developers because I don't think they know you can do it. If they knew you can do it... Why would they be using object casting in F#? They need to see my code.

HaloFour commented 5 years ago

So, what exactly is the purpose of this issue? To advertise your library?

CyrusNajmabadi commented 5 years ago

@CyrusNajmabadi The foreach pattern you just showed will not work effieicently on recursive data structures.

Why not? You can easily implement an IEnumerator method that works on recursive data and is perfectly efficient (i.e. O(n)). Roslyn itself is full of them (since enumerating over a section of a tree is something tha tis commonly done).

ZacharyPatten commented 5 years ago

@HaloFour to a degree yes. I don't think that Microsoft knows you can break type safeness using runtime compilation like this and store in a delegate. again if they didn't know then why are they doing object casting in f-sharp for their generic math functions.

@CyrusNajmabadi give me one example of someone using ienumerable on a recursive data structure without storing a stack in the program's heap and I will eat my shirt :P I do not think it is possible. Again look at my AVL tree and red black tree. they have to store a stack in the ienumerable methods

CyrusNajmabadi commented 5 years ago

@CyrusNajmabadi give me one example of someone using ienumerable on a recursive data structure without storing a stack in the program's heap and I will eat my shirt :P I do not think it is possible.

You said "The foreach pattern you just showed will not work effieicently on recursive data structures."

There is nothing inefficient about keeping track of your location with a stack. We do this in roslyn all over the place in an allocation-free manner (i.e. by pooling stacks). It's simple, effective, and fast.

Your desire to not use this approach isn't a motivation in and of itself.

Again look at my AVL tree and red black tree. they have to store a stack in the ienumerable methods

Yup... and? :) Your stepper approach uses a 'delegate' Does that make it inefficient? In both cases there is a tiny allocation, and then a fast linear walk over the data. I'm not seeing the problem. Both in O(n)` terms and in just absolute terms things should be very efficient here.

CyrusNajmabadi commented 5 years ago

@CyrusNajmabadi give me one example of someone using ienumerable on a recursive data structure without storing a stack in the program's heap and I will eat my shirt :

Seems simple enough. You'd just need this:

public IEnumerator<T> GetEnumerator()
{
      for (var current = this.GetMinNode(root); current != null; current = this.GetNextNode(current))
           yield return current.value;
}

private Node<T> GetNextNode(Node<T> current)
{
    if (current.right != null)
         return GetMinNode(current.right);

    var parent = current.parent;
    while (parent != null && current == parent.right)
    {
        current = parent;
        parent = parent.parent;
    }  

    return parent; 
}

Seems pretty simple. It's been ages since i did binary/avl trees, so i might have my logic slightly wrong. But it's pretty trivial to do an in-order walk of things without needing intermediary storage.

CyrusNajmabadi commented 5 years ago

C#'s data structures in the System.Collections.Generic are great... But it is missing so many common data structures such as a heap. All the examples of generic data structures I see on GitHub are bad though... Most of them use interfaces. Instead of using interfaces, you should make generic data structures with delegates. A heap for example only needs a compare delegate in order to work (see the code in my project). If you use delegates rather than interfaces, you can use any generic type in the structure (even primitives), and you can have multiple data structures of the same type that use different sorting, because you can just pass in a different delegate.

This is not a csharplang issue. The collections in System.Collections.Generic are unrelated to the language. Please pass any feedback along to dotnet/corefx which owns these APIs and will consider your proposals.

CyrusNajmabadi commented 5 years ago

@HaloFour to a degree yes. I don't think that Microsoft knows you can break type safeness using runtime compilation like this and store in a delegate. again if they didn't know then why are they doing object casting in f-sharp for their generic math functions.

That sounds like a question to raise with the FSharp team. You can find them over at https://github.com/fsharp/fsharp. Implementation questions like that are out of scope for this repo. This is just the repo for the C# langauge itself. It is not for the C# compiler. It is not for the .Net runtime or jit, or vm (go to dotnet/coreclr for that). It is not for the libraries that ship as part of .Net (go to dotnet/corefx for that).

Thanks!

HaloFour commented 5 years ago

@ZacharyPatten

F# likely implemented their functionality before LINQ expressions were added to the BCL. F# does a lot of things that are weird.

The approach you've taken with using those expressions compiled to delegates is interesting, I've used it myself for some specific tasks but not anything so general purpose. The one thing I don't like about it is that it offers no compile-time type safety. If, e.g., whatever you're adding doesn't have an add operator it will throw at runtime.

ZacharyPatten commented 5 years ago

@CyrusNajmabadi In regards to the runtime complexity, yes both implementations (delegate vs IEnumerable+stack) appear to have the same runtime complexity... however, from my testing the delegate implementation is so much faster. Unless I'm making some drastic errors in my testing and theory, I don't think the IEnumerable+stack version would ever compare. And there is likely hidden algorithm complexity in the under layers of managing that memory on the programs heap.

@HaloFour @CyrusNajmabadi Thanks for pointing out the F# repo, I should probably show them this. However, since generic mathematics is possible in C# (cause my code is doing it) I personally think they should add it into the language (so people don't have to pull in my Towel project or write their own Linq compilation). As for the run time errors, I agree that compile time errors would be better, but they could fix that too by slightly changing the language so anytime it sees a generic math operation, it ensures the type has a mathematical operator. This would obviously be a language addition/change (and I think this is the correct place to discuss these kinds of topics).

CyrusNajmabadi commented 5 years ago

however, from my testing the delegate implementation is so much faster.

You're welcome ot ask the BCL to include 'internal iterators' in their collections as well as the 'external iterators' that are currently there. This is not a C# issue.

Unless I'm making some drastic errors in my testing and theory, I don't think the IEnumerable+stack version would ever compare.

  1. as demonstrated, you shouldn't need stack.
  2. I have no idea how you tested. Furthermore, you may be associating different costs at teh wrong location. For example, if you're not pooling your stack, or you're using some stack impl that has too high a cost with it, then that will unfairly count against things.

And there is likely hidden algorithm complexity in the under layers of managing that memory on the programs heap.

I don't know what this means. It's like me saying "there's likely hidden complexity in managing the memory for the delegate".

CyrusNajmabadi commented 5 years ago

@HaloFour @CyrusNajmabadi Thanks for pointing out the F# repo, I should probably show them this. However, since generic mathematics is possible in C# (cause my code is doing it) I personally think they should add it into the language

As stated above: Do you have a suggestion for a change you would like to see in the language? If so, it woudl be good to open a proposal.

Nothing will happen without a proposal. It would be good if you also mentioend how your proposal differs from teh others that Halo linked to.

ZacharyPatten commented 5 years ago

@CyrusNajmabadi give me one example of someone using ienumerable on a recursive data structure without storing a stack in the program's heap and I will eat my shirt :

Seems simple enough. You'd just need this:

public IEnumerator<T> GetEnumerator()
{
      for (var current = this.GetMinNode(root); current != null; current = this.GetNextNode(current))
           yield return current.value;
}

private Node<T> GetNextNode(Node<T> current)
{
    if (current.right != null)
         return GetMinNode(current.right);

    var parent = current.parent;
    while (parent != null && current == parent.right)
    {
        current = parent;
        parent = parent.parent;
    }  

    return parent; 
}

Seems pretty simple. It's been ages since i did binary/avl trees, so i might have my logic slightly wrong. But it's pretty trivial to do an in-order walk of things without needing intermediary storage.

I don't think that code works. I may be blind... but I don't see it working. I will have to try it when I am at my computer.

Also, it requires a parent pointer on the nodes, which isn't always desired.

ZacharyPatten commented 5 years ago

@HaloFour @CyrusNajmabadi Thanks for pointing out the F# repo, I should probably show them this. However, since generic mathematics is possible in C# (cause my code is doing it) I personally think they should add it into the language

As stated above: Do you have a suggestion for a change you would like to see in the language? If so, it woudl be good to open a proposal.

Nothing will happen without a proposal. It would be good if you also mentioend how your proposal differs from teh others that Halo linked to.

Yes... I intended this to be in the proposal sections :P... I'm still new to github... :D

ZacharyPatten commented 5 years ago

@HaloFour @CyrusNajmabadi Thanks for pointing out the F# repo, I should probably show them this. However, since generic mathematics is possible in C# (cause my code is doing it) I personally think they should add it into the language

As stated above: Do you have a suggestion for a change you would like to see in the language? If so, it woudl be good to open a proposal. Nothing will happen without a proposal. It would be good if you also mentioend how your proposal differs from teh others that Halo linked to.

Yes... I intended this to be in the proposal sections :P... I'm still new to github... :D

ZacharyPatten commented 5 years ago

@CyrusNajmabadi Should I go ahead and close this issue and instead make a "proposal"?

CyrusNajmabadi commented 5 years ago

I don't think that code works. I may be blind... but I don't see it working. I will have to try it when I am at my computer.

Could you explain what the issue is? If you have a right side, then there is a value that is greater than you down that sid,e so you go down that side. Otherwise, if you don't, you walk up until you find yourself on hte left side of your parent. Then your parent is the next node in the ordered sequence of your tree.

Also, it requires a parent pointer on the nodes, which isn't always desired.

I was simply responding to your claim, which appears to be incorrect. It is pretty easy to iterate over a binary tree without the need for secondary storage.

and I will eat my shirt :)

Pics, please :)

ZacharyPatten commented 5 years ago

I don't think that code works. I may be blind... but I don't see it working. I will have to try it when I am at my computer.

Could you explain what the issue is? If you have a right side, then there is a value that is greater than you down that sid,e so you go down that side. Otherwise, if you don't, you walk up until you find yourself on hte left side of your parent. Then your parent is the next node in the ordered sequence of your tree.

Also, it requires a parent pointer on the nodes, which isn't always desired.

I was simply responding to your claim, which appears to be incorrect. It is pretty easy to iterate over a binary tree without the need for secondary storage.

and I will eat my shirt :)

Pics, please :)

When I used the code you posted it only iterated the left most node in the tree and not the entire tree. Did I mis-interpret any of your code? Cause this doesn't work.

        public IEnumerator<T> GetEnumerator()
        {
              for (var current = this.GetLeftMostNode(_root); current != null; current = this.GetNextNode(current))
                   yield return current.Value;
        }

        private Node GetNextNode(Node current)
        {
            if (current.RightChild != null)
                 return GetLeftMostNode(current.RightChild);

            var parent = current.Parent;
            while (parent != null && current == parent.RightChild)
            {
                current = parent;
                parent = parent.Parent;
            }  

            return parent; 
        }

        private Node GetLeftMostNode(Node node)
        {
            while (node.LeftChild != null)
            {
                node = node.LeftChild;
            }
            return node;
        }

I assumed your "GetMinNode" was just the left most child.

CyrusNajmabadi commented 5 years ago

Worked for me. Did you set the parents properly?

ZacharyPatten commented 5 years ago

Worked for me. Did you set the parents properly?

Well crap... I was using your code in a RedBlack Tree and I forgot to check for the sentinal node instead of the null check:

        public IEnumerator<T> GetEnumerator()
        {
              for (var current = this.GetLeftMostNode(_root); current != null; current = this.GetNextNode(current))
                   yield return current.Value;
        }

        private Node GetNextNode(Node current)
        {
            if (current.RightChild != null && current.RightChild != _sentinelNode)
                 return GetLeftMostNode(current.RightChild);

            var parent = current.Parent;
            while (parent != null && current == parent.RightChild)
            {
                current = parent;
                parent = parent.Parent;
            }  

            return parent; 
        }

        private Node GetLeftMostNode(Node node)
        {
            while (node.LeftChild != null && node.LeftChild != _sentinelNode)
            {
                node = node.LeftChild;
            }
            return node;
        }

So that code DID work.. and I do in fact need to eat my shirt. :P Learn something new every day...

However, the generic mathematics is still another issue that I think they should consider adding to the base C#.... and it sounds like I should add a "proposal" with my code as an example.

CyrusNajmabadi commented 5 years ago

and it sounds like I should add a "proposal" with my code as an example.

Sounds good :)

ZacharyPatten commented 5 years ago

@CyrusNajmabadi Hey.... sorry to bug you again... but you seem like an expert at data structures. :) In my library I linked (Towel) to I have an "omnitree" which is a generic N-D spacial partitioning tree (like a quadtree or octree). It uses the median algorithm to divide items along each axis, but it allows for subdivision override (for numeric types you may want to use "mean" cause it is faster).

I've never seen anyone make a generic N-D SPT like mine, and I could really use your (or anyone else's) opinions on it. I already have a number of optimizations in mind (such as not using arrays to store the children on lower dimensional versions of the tree), but I'm sure there are tons of optimizations that I'm missing out on. And I am currently using a ".tt" template to generate all the different dimensional versions of the tree (not sure if there is a better way to generate that code).

Any suggestions/criticisms that could help me improve it would be greatly appreciated. :)