dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: System.Numerics Distance, Vector, Math Operations #83209

Closed luisquintanilla closed 11 months ago

luisquintanilla commented 1 year ago

Background and motivation

System.Numerics is a powerful library in .NET that provides support for mathematical operations on vectors, matrices, and complex numbers. Because these are SIMD-enabled types, operations on them can be optimized. The library already contains a wide range of mathematical operations, but some common operations are missing. This proposal adds APIs to System.Numerics for the following operations:

  • Cosine Similarity
  • Euclidean Distance
  • Dot Product
  • Normalize (L2 / Euclidean Length)
  • Softmax
  • Sigmoid
  • Tanh
  • Exponential (Exp)

By providing a common optimized implementation for these operations, .NET library authors and developers can focus on high-level components of their application.

API proposal

The following are the proposed APIs for the operations listed above.

Additional details:

| Property | Value |
| --- | --- |
| Expected Vector Size | 1-8191 |
| Precision | FP32, FP16, INT8 |

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is widely used in natural language processing and information retrieval. The proposed API will take two floating point collections as input and return the cosine similarity between them.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static T CosineSimilarity<T>(ReadOnlySpan<T> vector1, ReadOnlySpan<T> vector2)

public static T CosineSimilarity<T>(T[] vector1, T[] vector2)

public static T CosineSimilarity<T>(Vector<T> vector1, Vector<T> vector2)
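
For clarity, the intended semantics match the following scalar sketch (illustrative only: the helper name and the IRootFunctions<T> constraint are assumptions, and the real implementation would be SIMD-accelerated):

// Scalar reference sketch of the intended behavior: dot(a, b) / (|a| * |b|).
static T CosineSimilarityReference<T>(ReadOnlySpan<T> vector1, ReadOnlySpan<T> vector2)
    where T : IRootFunctions<T>
{
    if (vector1.Length != vector2.Length)
        throw new ArgumentException("Vectors must have the same length.");

    T dot = T.Zero, norm1 = T.Zero, norm2 = T.Zero;
    for (int i = 0; i < vector1.Length; i++)
    {
        dot += vector1[i] * vector2[i];   // accumulate the dot product
        norm1 += vector1[i] * vector1[i]; // accumulate |vector1|^2
        norm2 += vector2[i] * vector2[i]; // accumulate |vector2|^2
    }

    return dot / (T.Sqrt(norm1) * T.Sqrt(norm2));
}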

Euclidean Distance

Euclidean distance is a measure of the distance between two points in Euclidean space. It is widely used in mathematics, engineering, and machine learning. The proposed API will take two floating point collections as input and return the Euclidean distance between them.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static T EuclideanDistance<T>(ReadOnlySpan<T> vector1, ReadOnlySpan<T> vector2)

public static T EuclideanDistance<T>(T[] vector1, T[] vector2)

public static T EuclideanDistance<T>(Vector<T> vector1, Vector<T> vector2)
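
As above, a scalar sketch of the expected semantics (the helper name and constraint are illustrative, not the proposed implementation):

// Scalar reference sketch: square root of the sum of squared element-wise differences.
static T EuclideanDistanceReference<T>(ReadOnlySpan<T> vector1, ReadOnlySpan<T> vector2)
    where T : IRootFunctions<T>
{
    if (vector1.Length != vector2.Length)
        throw new ArgumentException("Vectors must have the same length.");

    T sum = T.Zero;
    for (int i = 0; i < vector1.Length; i++)
    {
        T diff = vector1[i] - vector2[i];
        sum += diff * diff;
    }

    return T.Sqrt(sum);
}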

Dot Product

Dot product is a mathematical operation that takes two vectors and returns a scalar. It is widely used in linear algebra and machine learning. The proposed API will take two floating point collections as input and return the dot product of the vectors.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static T Dot<T>(ReadOnlySpan<T> vector1, ReadOnlySpan<T> vector2)

public static T Dot<T>(T[] vector1, T[] vector2)

public static T Dot<T>(Vector<T> vector1, Vector<T> vector2)
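
A scalar sketch of the expected semantics (illustrative; Vector<T> already has a Dot method today, and the span and array overloads would behave the same way):

// Scalar reference sketch: sum of element-wise products.
static T DotReference<T>(ReadOnlySpan<T> vector1, ReadOnlySpan<T> vector2)
    where T : INumberBase<T>
{
    if (vector1.Length != vector2.Length)
        throw new ArgumentException("Vectors must have the same length.");

    T sum = T.Zero;
    for (int i = 0; i < vector1.Length; i++)
    {
        sum += vector1[i] * vector2[i];
    }

    return sum;
}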

Normalize (L2 / Euclidean Length)

Normalize is a mathematical operation that takes a vector and returns a unit vector in the same direction. It is widely used in linear algebra and machine learning. The proposed API will take a vector as input and return the normalized vector.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static ReadOnlySpan<T> Normalize<T>(ReadOnlySpan<T> vector)

public static T[] Normalize<T>(T[] vector)

public static Vector<T> Normalize<T>(Vector<T> vector)
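
A scalar sketch of the expected semantics (illustrative only):

// Scalar reference sketch: divide each element by the vector's L2 (Euclidean) norm.
static T[] NormalizeReference<T>(ReadOnlySpan<T> vector)
    where T : IRootFunctions<T>
{
    T sumOfSquares = T.Zero;
    foreach (T value in vector)
    {
        sumOfSquares += value * value;
    }

    T norm = T.Sqrt(sumOfSquares);
    var result = new T[vector.Length];
    for (int i = 0; i < vector.Length; i++)
    {
        result[i] = vector[i] / norm;
    }

    return result;
}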

Softmax

Softmax is a function that takes a collection of real numbers and returns a probability distribution. It is widely used in machine learning and deep learning. The proposed API will take a real number collection as input and return the softmax of the collection.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static ReadOnlySpan<T> Softmax<T>(ReadOnlySpan<T> vector)

public static T[] Softmax<T>(T[] vector)

public static Vector<T> Softmax<T>(Vector<T> vector)
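
A scalar sketch of the expected semantics (illustrative; a production implementation would likely subtract the maximum element before exponentiating to avoid overflow, as shown here):

// Scalar reference sketch of a numerically stable softmax: exp(x - max) / sum(exp(x - max)).
static T[] SoftmaxReference<T>(ReadOnlySpan<T> vector)
    where T : IExponentialFunctions<T>, IComparisonOperators<T, T, bool>
{
    T max = vector[0];
    foreach (T value in vector)
    {
        if (value > max) max = value;
    }

    var result = new T[vector.Length];
    T sum = T.Zero;
    for (int i = 0; i < vector.Length; i++)
    {
        result[i] = T.Exp(vector[i] - max);
        sum += result[i];
    }

    for (int i = 0; i < vector.Length; i++)
    {
        result[i] /= sum;
    }

    return result;
}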

Sigmoid

Sigmoid is a function that takes a real number and returns a value between 0 and 1. It is widely used in machine learning and deep learning. The proposed API will take a real number collection as input and return the sigmoid of the number for each element in the collection.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static ReadOnlySpan<T> Sigmoid<T>(ReadOnlySpan<T> vector)

public static T[] Sigmoid<T>(T[] vector)

public static Vector<T> Sigmoid<T>(Vector<T> vector)
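
A scalar sketch of the expected element-wise semantics (illustrative only):

// Scalar reference sketch: sigmoid(x) = 1 / (1 + e^(-x)) applied element-wise.
static T[] SigmoidReference<T>(ReadOnlySpan<T> vector)
    where T : IExponentialFunctions<T>
{
    var result = new T[vector.Length];
    for (int i = 0; i < vector.Length; i++)
    {
        result[i] = T.One / (T.One + T.Exp(-vector[i]));
    }

    return result;
}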

Tanh

Tanh is a function that takes a real number and returns a value between -1 and 1. It is widely used in machine learning and deep learning as an activation function. The proposed API will take a real number collection as input and return the tanh for each element in the collection.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static ReadOnlySpan<T> Tanh<T>(ReadOnlySpan<T> vector)

public static T[] Tanh<T>(T[] vector)

public static Vector<T> Tanh<T>(Vector<T> vector)
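
A scalar sketch of the expected element-wise semantics, expressed with the existing IHyperbolicFunctions<T> generic math interface (the helper name is illustrative):

// Scalar reference sketch: apply T.Tanh to each element.
static T[] TanhReference<T>(ReadOnlySpan<T> vector)
    where T : IHyperbolicFunctions<T>
{
    var result = new T[vector.Length];
    for (int i = 0; i < vector.Length; i++)
    {
        result[i] = T.Tanh(vector[i]);
    }

    return result;
}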

Exponential (Exp)

Exponential is a function that takes a real number and returns e raised to the power of the number. It is widely used in mathematics, engineering, and machine learning. The proposed API will take a real number collection as input and return the exponential for each element in the collection.

// Assembly System.Numerics.Vectors
namespace System.Numerics;

public static ReadOnlySpan<T> Exp<T>(ReadOnlySpan<T> vector)

public static T[] Exp<T>(T[] vector)

public static Vector<T> Exp<T>(Vector<T> vector)

API Usage

Cosine Similarity

var input1 = new [] {-0.4165, 0.1393, 1.1077};

var input2 = new [] {
    new [] {0.5073, -0.2416, 1.7139},
    new [] {-0.1245, -0.4960, -0.0715}
};

var output =
    input2
        .Select(v => CosineSimilarity(input1, v));

// Outputs
// [0.7694, -0.1568]

Euclidean Distance

var input1 = new [] {-0.4165, 0.1393, 1.1077};

var input2 = new [] {
    new [] {0.5073, -0.2416, 1.7139},
    new [] {-0.1245, -0.4960, -0.0715}
};

var output =
    input2
        .Select(v => EuclideanDistance(input1, v));

// Outputs
// [1.1687, 1.3709]

Dot Product

// 1-D collection
var input1 = new [] {2, 3};

// 1-D collection
var input2 = new [] {2, 1};

var output = Dot(input1, input2);

// Outputs
// 7

Normalize

var input = new [] {
    new [] {0.1146, -0.5104, -1.2457},
    new [] {0.3813, -0.6505, -0.1605}
};

var output =
    input.Select(v => Normalize(v));

// Outputs
// [
//      [0.0849, -0.3778, -0.9220],
//      [0.4946, -0.8438, -0.2082]
//  ]

Related: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/SemanticKernel/AI/Embeddings/VectorOperations/NormalizeOperation.cs

Softmax

var input = new [] {
    new [] {0.1146, -0.5104, -1.2457},
    new [] {0.3813, -0.6505, -0.1605}
};

var output =
    input.Select(v => Softmax(v));

// Outputs
// [
//      [0.5581, 0.2987, 0.1432],
//      [0.5160, 0.1839, 0.3001]
//  ]

Sigmoid

var input = new [] {
    new [] {0.1146, -0.5104, -1.2457},
    new [] {0.3813, -0.6505, -0.1605}
};

var output =
    input.Select(v => Sigmoid(v));

// Outputs
// [
//      [0.5286, 0.3751, 0.2234],
//      [0.5942, 0.3429, 0.4600]
//  ]

Tanh

var input = new [] {
    new [] {0.1146, -0.5104, -1.2457},
    new [] {0.3813, -0.6505, -0.1605}
};

var output =
    input.Select(v => Tanh(v));

// Outputs
// [
//      [0.1141, -0.4703, -0.8471],
//      [0.3638, -0.5720, -0.1591]
//  ]

Exponential (Exp)

var input = new [] {
    new [] {0.1146, -0.5104, -1.2457},
    new [] {0.3813, -0.6505, -0.1605}
};

var output =
    input.Select(v => Exp(v));

// Outputs (approximate)
// [
//      [1.1214, 0.6003, 0.2877],
//      [1.4642, 0.5218, 0.8517]
//  ]

Use cases

Semantic Search, Clustering, and Recommendations using Embeddings

Distance metrics are useful when performing search, clustering, and recommendations because you want to surface related items. The following is an example from Semantic Kernel that uses Cosine Similarity to find similar embeddings in an embedding collection.

https://github.com/microsoft/semantic-kernel/blob/344bc54f329dd342bf9836913cdd96e25bbeb25f/samples/dotnet/kernel-syntax-examples/Example23_ReadOnlyMemoryStore.cs#L107-L138

public IAsyncEnumerable<(MemoryRecord, double)> GetNearestMatchesAsync(string collectionName, Embedding<float> embedding, int limit,
    double minRelevanceScore = 0, bool withEmbeddings = false, CancellationToken cancel = default)
{
    // Note: with this simple implementation, the MemoryRecord will always contain the embedding.
    if (this._memoryRecords == null || !this._memoryRecords.Any())
    {
        return AsyncEnumerable.Empty<(MemoryRecord, double)>();
    }

    if (embedding.Count != this._vectorSize)
    {
        throw new Exception($"Embedding vector size {embedding.Count} does not match expected size of {this._vectorSize}");
    }

    EmbeddingReadOnlySpan<float> embeddingSpan = new(embedding.AsReadOnlySpan());

    TopNCollection<MemoryRecord> embeddings = new(limit);

    foreach (var item in this._memoryRecords)
    {
        EmbeddingReadOnlySpan<float> itemSpan = new(item.Embedding.AsReadOnlySpan());
        double similarity = embeddingSpan.CosineSimilarity(itemSpan);
        if (similarity >= minRelevanceScore)
        {
            embeddings.Add(new(item, similarity));
        }
    }

    embeddings.SortByScore();

    return embeddings.Select(x => (x.Value, x.Score.Value)).ToAsyncEnumerable();
}
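
With the span-based overloads proposed above, the inner loop could in principle collapse to something like the following sketch (assuming the proposed CosineSimilarity overload and the Embedding<float>.AsReadOnlySpan() accessor already used in the snippet):

foreach (var item in this._memoryRecords)
{
    // Hypothetical call into the proposed System.Numerics API instead of the
    // Semantic Kernel EmbeddingReadOnlySpan<float> helper.
    double similarity = CosineSimilarity(embedding.AsReadOnlySpan(), item.Embedding.AsReadOnlySpan());
    if (similarity >= minRelevanceScore)
    {
        embeddings.Add(new(item, similarity));
    }
}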

See the following documents for more details:

Normalize ML Model Outputs

Sometimes, the outputs from a machine learning model have to be post-processed. Common post-processing operations include calculating the sigmoid or softmax of the outputs.

private float Sigmoid(float value)
{
    var k = (float)Math.Exp(value);
    return k / (1.0f + k);
}

private float[] Softmax(float[] values)
{
    var maxVal = values.Max();
    var exp = values.Select(v => Math.Exp(v - maxVal));
    var sumExp = exp.Sum();

    return exp.Select(v => (float)(v / sumExp)).ToArray();
}

private int GetOffset(int x, int y, int channel)
{
    // YOLO outputs a tensor that has a shape of 125x13x13, which 
    // WinML flattens into a 1D array.  To access a specific channel 
    // for a given (x,y) cell position, we need to calculate an offset
    // into the array
    return (channel * this.channelStride) + (y * COL_COUNT) + x;
}

private float GetConfidence(float[] modelOutput, int x, int y, int channel)
{
    return Sigmoid(modelOutput[GetOffset(x, y, channel + 4)]);
}

private CellDimensions MapBoundingBoxToCell(int x, int y, int box, BoundingBoxDimensions boxDimensions)
{
    return new CellDimensions
    {
        X = ((float)x + Sigmoid(boxDimensions.X)) * CELL_WIDTH,
        Y = ((float)y + Sigmoid(boxDimensions.Y)) * CELL_HEIGHT,
        Width = (float)Math.Exp(boxDimensions.Width) * CELL_WIDTH * anchors[box * 2],
        Height = (float)Math.Exp(boxDimensions.Height) * CELL_HEIGHT * anchors[box * 2 + 1],
    };
}

public float[] ExtractClasses(float[] modelOutput, int x, int y, int channel)
{
    float[] predictedClasses = new float[CLASS_COUNT];
    int predictedClassOffset = channel + BOX_INFO_FEATURE_COUNT;
    for (int predictedClass = 0; predictedClass < CLASS_COUNT; predictedClass++)
    {
        predictedClasses[predictedClass] = modelOutput[GetOffset(x, y, predictedClass + predictedClassOffset)];
    }
    return Softmax(predictedClasses);
}

In this case, the Sigmoid and Softmax functions are being used to:

  1. Calculate the confidence that an object exists within the detected bounding boxes.
  2. Get the probabilities for all of the potential classes and map that to each of the bounding boxes.

See the ML.NET ONNX Object Detection tutorial for more details.

Alternative Designs

Today, several of these operations are available in libraries like ML.NET, TorchSharp, Semantic Kernel, and TensorFlow.NET.

These implementations have challenges. In the case of ML.NET, the methods live in an internal class that's not publicly exposed. In the case of TorchSharp and TensorFlow.NET, the size of the library is large and the benefit of consuming the library just for a few functions does not outweigh the downsides of taking on such a large dependency.

The alternative is to implement your own, as Semantic Kernel has done. While not hard, this is tedious, and such implementations tend to focus on functionality rather than performance.

https://github.com/luisquintanilla/OAIDotnetZeroShotClassification/blob/7e37ba4fa3ff9d41387bbabe2cdd6de303135596/Program.cs#L45-L59

https://github.com/luisquintanilla/mlnet-workshop/issues/16#issuecomment-1374371524

Open Questions

Risks

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation

When working with outputs from machine learning models, there are some operations applied to the outputs to normalize the data or measure distances. These operations include but are not limited to:

- [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
- [Softmax](https://en.wikipedia.org/wiki/Softmax_function)

Today, those operations are available in libraries like ML.NET and TorchSharp.

- ML.NET
  - [Softmax](https://github.com/dotnet/machinelearning/blob/db197154dcd924ae9a7e31074a835b86e34e33bb/src/Microsoft.ML.Core/Utilities/MathUtils.cs#L203)
  - [Cosine Similarity](https://github.com/dotnet/machinelearning/blob/db197154dcd924ae9a7e31074a835b86e34e33bb/src/Microsoft.ML.Core/Utilities/MathUtils.cs#L735)
- TorchSharp
  - [Softmax](https://github.com/dotnet/TorchSharp/blob/main/src/TorchSharp/NN/Activation/Softmax.cs)
  - [Cosine Similarity](https://github.com/dotnet/TorchSharp/blob/main/src/TorchSharp/NN/CosineSimilarity.cs)

Both of these implementations have challenges. In the case of ML.NET, the methods live in an internal class that's not publicly exposed. In the case of TorchSharp, the size of the library is large and the benefit of consuming the library just for a few functions does not outweigh the downsides of taking on such a large dependency.

The alternative is to implement your own which although not hard, it's tedious and not as performant as it could be.

https://github.com/luisquintanilla/OAIDotnetZeroShotClassification/blob/7e37ba4fa3ff9d41387bbabe2cdd6de303135596/Program.cs#L45-L59

https://github.com/luisquintanilla/mlnet-workshop/issues/16#issuecomment-1374371524

### API Proposal

The proposal would be to leverage existing components from `System.Numerics` such as Vectors and Generic Math to provide an all-up .NET implementation that can be consumed by other libraries such as ML.NET and others.

```csharp
namespace System.Numerics;

public static T Softmax(Vector inputs)
public static T Softmax(T[] inputs)
public static T Softmax(ReadOnlySpan inputs)

public static T CosineSimilarity(Vector a, Vector b)
public static T CosineSimilarity(T[] a, T[] b)
public static T CosineSimilarity(ReadOnlySpan a, ReadOnlySpan b)
```

### API Usage

```csharp
// Softmax
var probabilities = new Vector(new [] {0.89,0.75,0.6})
var v = new Vector(probabilities)
Softmax(v)

// Cosine Similarity
var item1 = new []{1.0,2.0,3.0};
var item2 = new []{1.2,1.9,4.0};
var v1 = new Vector(item1);
var v2 = new Vector(item2);
CosineSimilarity(v1,v2);
```

### Alternative Designs

_No response_

### Risks

There are a few things to consider:

- Should these operations be part of an out-of-band package?
- How would the implementation of these APIs be affected by down-level consumption?
Author: luisquintanilla
Assignees: -
Labels: `api-suggestion`, `area-System.Numerics`
Milestone: -
colombod commented 1 year ago

What about a one-hot encoder?

colombod commented 1 year ago

Also assuming it can handle sequences of T and not jump straight into vectors.

luisquintanilla commented 1 year ago

Additional operations to consider if not already available.

https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel/AI/Embeddings/VectorOperations

colombod commented 1 year ago

Additional operations to consider if not already available.

https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel/AI/Embeddings/VectorOperations

  • DotProduct
  • Euclidean Distance
  • Normalize (Euclidean Length)

makes total sense!

If we start with those and the right dev and testing approach, it will be very easy to expand. This is promising.

stephentoub commented 1 year ago

@luisquintanilla, for all of the APIs you're proposing, can you clarify on what types you're expecting these to live? Are you envisioning some new static class of extensions? Would these be on the existing MemoryExtensions? Would anything here be on the generic math interfaces? Are there constraints on the Ts involved here? Etc. Or @tannergooding, do you have a vision for this? I can come up with some recommendations if needed, but I wasn't sure if it'd already been worked through and just not included here.

luisquintanilla commented 1 year ago

@stephentoub good questions. Here are my thoughts, though I'd defer to @tannergooding since he's much more familiar with these APIs.

what types you're expecting these to live?

I think we can use the Dot operation as a good example where these might live. Today the Dot operation is a static method on the Vector type.

https://learn.microsoft.com/dotnet/api/system.numerics.vector.dot?view=net-8.0

However, since we'd be looking to support other types like ReadOnlySpan and collection types like arrays, is Vector still the best place for them?

Would anything here be on the generic math interfaces...Are there constraints on the Ts involved here?

This is possibly a good opportunity to use the Generic Math interfaces. Although we expect to mostly work with floating point numbers, there are some cases where integers may also be used. I've listed the types of values we expect in the table above (FP32, FP16, INT8).

What these map to in the context of Generic Math interfaces, I'm not sure.

Beyond numeric interfaces, I noticed that there are also function interfaces like IExponentialFunctions and IHyperbolicFunctions which I think satisfy some of the operations proposed here.

ericstj commented 1 year ago

Are all of these operands single dimensional, or also 2-dim, n-dim? If the latter, we'll need to also think about how we describe dimension information. I doubt we'll want to use .NET multi-dimensional or jagged arrays (only) since those are not friendly to interop. Samples above show 2-dim jagged arrays.

stephentoub commented 1 year ago

Samples above show 2-dim jagged arrays.

Which in particular? Skimming again, the inputs I see are all single dim... where there's jagged, they're actually treated as collections of arrays, where the inputs to the functions are all single dim, eg

var input1 = new [] {-0.4165, 0.1393, 1.1077};

var input2 = new [] {
    new [] {0.5073, -0.2416, 1.7139},
    new [] {-0.1245, -0.4960, -0.0715}
};

var output = input2.Select(v => CosineSimilarity(input1, v));

// Outputs
// [0.7694, -0.1568]
colombod commented 1 year ago

I am still pondering the support for double; if we had proper generic support this would have been easier: https://github.com/dotnet/csharplang/discussions/6308#discussioncomment-3688642

ericstj commented 1 year ago

where the inputs to the functions are all single dim

I see - I missed that it was using LINQ over the rows and listing the output as an array instead of IEnumerable. For single dim the specified inputs seem reasonable 👍. I think @colombod mentioned it might be interesting to add some interface as an input.

tannergooding commented 1 year ago

Have opened https://github.com/dotnet/runtime/issues/89639 since the new design builds on this initial proposal and adjusts it to take into account the overall future direction desired here.

The biggest note is that it separates out the more immediate concern of the down-level support that's needed from the broader future support for other types and scenarios, which will likely be for modern .NET only and likely generic.

tannergooding commented 11 months ago

Closing as this was superseded by #89639