Open: amaid opened this pull request 1 year ago.
@theolivenbaum - Pull request created to address issue #8 (Distance calculation using additional data)
👋 @amaid You've essentially abstracted away each separate vector in the embedding, rather than abstracting away the vector array `float[]`. I'm going to open a PR to your fork to fix this, which will show up here once merged.
@amaid I've got a PR opened on your fork. Once merged, the changes should show in this PR automatically.
Needed for https://github.com/Arlodotexe/OwlCore.AI.Exocortex
Awesome! Merged.
@theolivenbaum I recommend squashing this PR when merging/closing.
I'm also aware that this is a breaking change to the library, which will need a bit of discussion and possibly a migration guide.
Or, maybe we should just ship an inbox non-generic implementation to polyfill what we had before? Something like:
// Wraps a raw float[] vector so it can be used wherever an IUmapDataPoint is expected.
public class RawVectorArrayUmapDataPoint : IUmapDataPoint
{
    public RawVectorArrayUmapDataPoint(float[] data) => Data = data;

    public float[] Data { get; }

    /// <summary>
    /// Implicit conversion from <c>float[]</c>.
    /// </summary>
    public static implicit operator RawVectorArrayUmapDataPoint(float[] data) => new(data);

    /// <summary>
    /// Implicit conversion back to <c>float[]</c>.
    /// </summary>
    public static implicit operator float[](RawVectorArrayUmapDataPoint x) => x.Data;
}

// Non-generic polyfill that restores the pre-generic Umap API on top of Umap<T>.
public class Umap : Umap<RawVectorArrayUmapDataPoint>
{
    public Umap(
        DistanceCalculation<RawVectorArrayUmapDataPoint> distance = null,
        IProvideRandomValues random = null,
        int dimensions = 2,
        int numberOfNeighbors = 15,
        int? customNumberOfEpochs = null,
        ProgressReporter progressReporter = null)
        : base(distance, random, dimensions, numberOfNeighbors, customNumberOfEpochs, progressReporter)
    {
        // ...
    }
}
The implicit conversion in `RawVectorArrayUmapDataPoint` should allow library consumers to use the library as they did before, passing a `float[]` directly to `new Umap(...).InitializeFit(someFloatArray);` without the generics.
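A self-contained sketch of the two conversion operators, for illustration. It assumes `IUmapDataPoint` exposes only a `float[] Data { get; }` property (an assumption, not confirmed in this thread), and `ConversionDemo` is a hypothetical helper. A single `float[]` converts implicitly in both directions, but C# does not apply user-defined conversions to whole arrays, so a `float[][]` still has to be wrapped row by row before it can be used as a `RawVectorArrayUmapDataPoint[]`.

```csharp
using System;
using System.Linq;

// Assumed shape of the interface; the real one may have more members.
public interface IUmapDataPoint
{
    float[] Data { get; }
}

public class RawVectorArrayUmapDataPoint : IUmapDataPoint
{
    public RawVectorArrayUmapDataPoint(float[] data) => Data = data;

    public float[] Data { get; }

    // float[] -> data point (stores the same array reference, no copy)
    public static implicit operator RawVectorArrayUmapDataPoint(float[] data) => new(data);

    // data point -> float[]
    public static implicit operator float[](RawVectorArrayUmapDataPoint x) => x.Data;
}

// Hypothetical helper, for illustration only.
public static class ConversionDemo
{
    // A single vector converts implicitly in both directions.
    public static float[] RoundTrip(float[] vector)
    {
        RawVectorArrayUmapDataPoint point = vector; // implicit float[] -> point
        float[] back = point;                       // implicit point -> float[]
        return back;
    }

    // float[][] is NOT implicitly a RawVectorArrayUmapDataPoint[]:
    // user-defined conversions are not lifted to arrays, so wrap each row.
    public static RawVectorArrayUmapDataPoint[] FromRows(float[][] rows) =>
        rows.Select(row => (RawVectorArrayUmapDataPoint)row).ToArray();
}
```

Because the wrapper only stores a reference, `ConversionDemo.RoundTrip(v)` hands back the very same array instance `v`.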
This is implemented.
You changed all the internal uses of `T` to a hardcoded type (the new one). The code I provided works without any further changes to `Umap<T>`. We'll need to correct this.
This has been fixed. Implicit type conversion is now used so that existing consumers of the class can keep passing plain `float[][]` data, without changing the existing implementation of `Umap`.
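One way to keep the generic class fully generic while still accepting `float[][]` is an overload on the non-generic subclass that wraps each row and forwards to the generic base. This is a sketch under stated assumptions: `IUmapDataPoint` is assumed to expose a `float[] Data { get; }` property, and `GenericUmapSketch<T>` and `UmapSketch` are hypothetical stand-ins for `Umap<T>` and the non-generic `Umap`. Their `InitializeFit` does no actual fitting; it only stores the points and returns how many there are.

```csharp
using System;
using System.Linq;

// Assumed shape of the interface.
public interface IUmapDataPoint
{
    float[] Data { get; }
}

public class RawVectorArrayUmapDataPoint : IUmapDataPoint
{
    public RawVectorArrayUmapDataPoint(float[] data) => Data = data;
    public float[] Data { get; }
    public static implicit operator RawVectorArrayUmapDataPoint(float[] data) => new(data);
    public static implicit operator float[](RawVectorArrayUmapDataPoint x) => x.Data;
}

// Hypothetical stand-in for Umap<T>: T stays generic everywhere,
// nothing inside is hardcoded to RawVectorArrayUmapDataPoint.
public class GenericUmapSketch<T> where T : IUmapDataPoint
{
    protected T[] Points = Array.Empty<T>();

    // Stand-in for the real setup step: stores the points, returns the count.
    public int InitializeFit(T[] x)
    {
        Points = x;
        return x.Length;
    }

    // Length of the first point's vector (0 when no points are stored).
    public int Dimensionality => Points.Length == 0 ? 0 : Points[0].Data.Length;
}

// Hypothetical non-generic wrapper that keeps the old float[][] entry point.
public class UmapSketch : GenericUmapSketch<RawVectorArrayUmapDataPoint>
{
    // User-defined conversions are not applied to whole arrays,
    // so each row is wrapped before calling the generic base method.
    public int InitializeFit(float[][] x) =>
        base.InitializeFit(x.Select(row => (RawVectorArrayUmapDataPoint)row).ToArray());
}
```

Because the `float[][]` overload lives only in the subclass, the generic base never needs to know about the concrete point type.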
@theolivenbaum The PR is ready to be merged. The changes have been tested for existing class consumers as well. The unit tests are passing.
Great, thanks @amaid!
@theolivenbaum Bumping
Closes #8
This pull request makes changes to the C# UMAP library, focused on introducing generic types and refining the code structure. The changes are described below:
- `DistanceCalculation` delegate change: `DistanceCalculation` becomes a generic delegate, `DistanceCalculation<T>`, where `T` must implement the `IUmapDataPoint` interface. This allows more flexible distance calculations that work with different types of data points.
- `IUmapDataPoint` interface addition: `IUmapDataPoint` is added to the `UMAP` namespace. It represents a single data point processed by the UMAP algorithm and requires implementing classes to provide a `Data` property, likely representing the data associated with the point.
- `NNDescent` changes: the `NNDescent` class, which appears to implement nearest-neighbor descent, becomes a generic class, `NNDescent<T>`, where `T` must implement `IUmapDataPoint`, so that it can work with different types of data points.
- `SIMD` and `SIMDInt` changes: the `SIMD` and `SIMDInt` classes become generic classes, `SIMD<T>` and `SIMDInt<T>`. This likely reflects the need to work with different data types (possibly floating-point and integer) depending on the data points used.
- `Tree` changes: the `Tree` class becomes a generic class, `Tree<T>`, where `T` must implement `IUmapDataPoint`, so that it can work with different types of data points.
- `Umap` class changes: the `Umap` class becomes a generic class, `Umap<T>`, where `T` must implement `IUmapDataPoint`, allowing the UMAP algorithm to operate on different types of data points. The `DistanceCalculation` field becomes `DistanceCalculation<T>` to use the generic distance delegate, and other members of the `Umap` class are adjusted to accommodate the generic data types.
- Distance functions: the `DistanceFunctions` class defines distance calculations such as cosine and Euclidean distance. These functions are updated to accept the generic data type (`T`) instead of float arrays.
- `OptimizationState` class: the `OptimizationState` class appears to store various parameters and state information for the UMAP algorithm. It is not directly affected by the generic changes but is part of the `Umap` class.
In summary, this pull request introduces generic type support for the UMAP algorithm, allowing it to work with different data point types while keeping distance calculations and optimizations flexible. It also adds an interface (`IUmapDataPoint`) for representing the data points processed by UMAP.
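A minimal sketch of how the generic pieces described above might fit together. The `DistanceCalculation<T>` signature `(T x, T y) => float` and the `float[] Data` property on `IUmapDataPoint` are assumptions based on this description, not copied from the PR; `DistanceSketch` and `LabeledPoint` are hypothetical names. `LabeledPoint` illustrates the motivation behind issue #8: a data point that carries extra information a custom distance can use.

```csharp
using System;

// Assumed shapes, sketched from the description above.
public interface IUmapDataPoint
{
    float[] Data { get; }
}

public delegate float DistanceCalculation<T>(T x, T y) where T : IUmapDataPoint;

// Hypothetical generic distance functions that read each point's Data vector.
public static class DistanceSketch
{
    // Cosine distance: 1 - (x . y) / (|x| |y|).
    public static float Cosine<T>(T x, T y) where T : IUmapDataPoint
    {
        float dot = 0f, normX = 0f, normY = 0f;
        for (int i = 0; i < x.Data.Length; i++)
        {
            dot += x.Data[i] * y.Data[i];
            normX += x.Data[i] * x.Data[i];
            normY += y.Data[i] * y.Data[i];
        }
        // Zero-vector guard (a choice made for this sketch).
        if (normX == 0f && normY == 0f) return 0f;
        if (normX == 0f || normY == 0f) return 1f;
        return 1f - dot / (MathF.Sqrt(normX) * MathF.Sqrt(normY));
    }

    // Euclidean distance: square root of the summed squared differences.
    public static float Euclidean<T>(T x, T y) where T : IUmapDataPoint
    {
        float sum = 0f;
        for (int i = 0; i < x.Data.Length; i++)
        {
            float d = x.Data[i] - y.Data[i];
            sum += d * d;
        }
        return MathF.Sqrt(sum);
    }
}

// Hypothetical data point carrying additional data next to its vector.
public sealed class LabeledPoint : IUmapDataPoint
{
    public LabeledPoint(float[] data, string label)
    {
        Data = data;
        Label = label;
    }

    public float[] Data { get; }
    public string Label { get; }

    // Hypothetical custom distance: Euclidean distance plus an arbitrary
    // fixed penalty of 10 when the labels differ.
    public static float LabelAwareDistance(LabeledPoint x, LabeledPoint y) =>
        DistanceSketch.Euclidean(x, y) + (x.Label == y.Label ? 0f : 10f);
}
```

For points `(3, 0)` and `(0, 4)` with the same label, the Euclidean distance is 5 and the cosine distance is 1 (the vectors are orthogonal); with different labels, `LabelAwareDistance` returns 15 under this sketch's penalty.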