MBkkt closed this issue 3 months ago.
Hi @MBkkt, thanks for the links, very useful!
We have a lot of kernels for `SIMSIMD_TARGET_HASWELL`, potentially avoiding `f32` overloads in some cases. Smaller registers and single/double-precision types rarely yield substantial speedups. Which kernels and CPU generations are you most interested in?
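For reference, here is a minimal sketch of a Haswell-class kernel: an `f32` dot product using AVX2 and FMA, with a scalar fallback selected at runtime. The names `dot_f32`, `dot_f32_serial` and `dot_f32_haswell` are hypothetical and are not SimSIMD's actual kernels. The sketch assumes GCC or Clang, for `__attribute__((target))` and `__builtin_cpu_supports`.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Portable reference kernel, also used as the fallback path. */
static float dot_f32_serial(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#if HAVE_X86
/* Haswell-class kernel: 8 floats per step with AVX2 + FMA. The target
   attribute lets this one function use AVX2 without building the whole
   translation unit with -mavx2. */
__attribute__((target("avx2,fma")))
static float dot_f32_haswell(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i), acc);
    /* Horizontal sum of the 8 accumulator lanes. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float sum = _mm_cvtss_f32(s);
    /* Scalar tail when n is not a multiple of 8. */
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
#endif

/* Runtime dispatch: use the AVX2/FMA kernel only if the CPU supports it. */
static float dot_f32(const float *a, const float *b, size_t n) {
#if HAVE_X86
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_f32_haswell(a, b, n);
#endif
    return dot_f32_serial(a, b, n);
}
```

Because the vector loop accumulates in eight independent lanes, its summation order differs from the serial loop, so for general inputs the two results may differ in the last bits.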
Also, a lot of code is compiled with `-mprefer-vector-width=256` because of downclocking issues. This can be detected at runtime to choose a smaller vector size on CPUs older than Ice Lake / Rocket Lake: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html
This might be an interesting addition. We could probably add a helper function to detect potential downclocking, whose result users would pass as `allowed` to the following dynamic-dispatch function:
```c
/**
 * @brief Determines the best-suited metric implementation based on the given datatype,
 * supported and allowed by hardware capabilities.
 *
 * @param kind The kind of metric to be evaluated.
 * @param datatype The data type for which the metric needs to be evaluated.
 * @param supported The hardware capabilities supported by the CPU.
 * @param allowed The hardware capabilities allowed for use.
 * @param metric_output Output variable for the selected similarity function.
 * @param capability_output Output variable for the utilized hardware capabilities.
 */
SIMSIMD_PUBLIC void simsimd_find_metric_punned( //
    simsimd_metric_kind_t kind,                 //
    simsimd_datatype_t datatype,                //
    simsimd_capability_t supported,             //
    simsimd_capability_t allowed,               //
    simsimd_metric_punned_t* metric_output,     //
    simsimd_capability_t* capability_output);
```
Would that make sense? Can you contribute it?
Would that make sense?
Sounds like a good solution to me.
Can you contribute it?
Probably not now :(
The feature is now available, and a bunch of kernels for older CPUs have been added as well, @MBkkt 🤗
Most of your x86_64 code is under the Skylake `#ifdef`.
Not all servers have very modern CPUs; simdjson and simdutf, for example, provide Haswell and Westmere overloads.
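A minimal sketch of simdjson-style tiering, where each microarchitecture level (Westmere, Haswell, Skylake) gets its own overload and the widest supported one is picked at runtime. `tier_t`, `cpu_features_t`, `select_tier` and `query_features` are hypothetical names for illustration, not SimSIMD's or simdjson's API; the feature strings are the ones GCC and Clang accept in `__builtin_cpu_supports`.

```c
#include <stdbool.h>

/* Hypothetical dispatch tiers, ordered from narrowest to widest. */
typedef enum { TIER_SERIAL, TIER_WESTMERE, TIER_HASWELL, TIER_SKYLAKE } tier_t;

typedef struct {
    bool sse42;    /* Westmere-class: SSE4.2 */
    bool avx2_fma; /* Haswell-class: AVX2 + FMA */
    bool avx512f;  /* Skylake-X-class: AVX-512F */
} cpu_features_t;

/* Pick the widest tier the CPU supports, but never exceed max_tier. */
static tier_t select_tier(cpu_features_t f, tier_t max_tier) {
    tier_t best = TIER_SERIAL;
    if (f.sse42) best = TIER_WESTMERE;
    if (f.sse42 && f.avx2_fma) best = TIER_HASWELL;
    if (f.sse42 && f.avx2_fma && f.avx512f) best = TIER_SKYLAKE;
    return best < max_tier ? best : max_tier;
}

/* Query the running CPU; reports no features on non-x86 targets. */
static cpu_features_t query_features(void) {
    cpu_features_t f = {false, false, false};
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    f.sse42 = __builtin_cpu_supports("sse4.2");
    f.avx2_fma = __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
    f.avx512f = __builtin_cpu_supports("avx512f");
#endif
    return f;
}
```

Capping `max_tier` at `TIER_HASWELL` keeps every kernel at 256 bits or narrower, which is similar in spirit to building with `-mprefer-vector-width=256`, but decided per machine at runtime.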
Examples under Apache 2.0:
- https://github.com/google-research/google-research/tree/master/scann/scann/distance_measures/one_to_one
- https://github.com/ydb-platform/ydb/tree/main/library/cpp/dot_product
- https://github.com/ydb-platform/ydb/tree/main/library/cpp/l1_distance
- https://github.com/ydb-platform/ydb/tree/main/library/cpp/l2_distance
Also, a lot of code is compiled with `-mprefer-vector-width=256` because of downclocking issues.
This can be detected at runtime to choose a smaller vector size on CPUs older than Ice Lake / Rocket Lake: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html
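One way to detect likely AVX-512 downclocking at runtime, and to turn it into an `allowed` mask like the one proposed for `simsimd_find_metric_punned`, is sketched below. The `CAP_*` bits are hypothetical stand-ins, not SimSIMD's real `simsimd_capability_t` values. The rule that only Intel family 6, model 0x55 (Skylake-SP/X, Cascade Lake, Cooper Lake) counts as downclock-prone is a simplifying assumption loosely based on the linked measurements, not a complete CPU list.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical capability bits for this sketch only. */
enum {
    CAP_SERIAL  = 1u << 0,
    CAP_HASWELL = 1u << 1, /* AVX2 + FMA, 256-bit */
    CAP_SKYLAKE = 1u << 2, /* AVX-512F */
    CAP_ICE     = 1u << 3  /* AVX-512 with newer extensions */
};
#define CAPS_AVX512 (CAP_SKYLAKE | CAP_ICE)

/* Assumed heuristic: family 6, model 0x55 covers the server cores where
   heavy 512-bit instructions noticeably lower the clock. */
static bool avx512_may_downclock(bool is_intel, unsigned family, unsigned model) {
    return is_intel && family == 6 && model == 0x55;
}

/* Read vendor, family and model via CPUID; returns false on non-x86. */
static bool detect_avx512_downclocking(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return false;
    /* "GenuineIntel" is returned in EBX, EDX, ECX. */
    bool is_intel = ebx == 0x756e6547u && edx == 0x49656e69u && ecx == 0x6c65746eu;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    unsigned base_family = (eax >> 8) & 0xFu;
    unsigned family = base_family;
    unsigned model = (eax >> 4) & 0xFu;
    if (base_family == 0xF) family += (eax >> 20) & 0xFFu;
    if (base_family == 0x6 || base_family == 0xF) model |= ((eax >> 16) & 0xFu) << 4;
    return avx512_may_downclock(is_intel, family, model);
#else
    return false;
#endif
}

/* Build the "allowed" mask: drop AVX-512 tiers when downclocking is likely. */
static unsigned allowed_capabilities(unsigned supported, bool downclocks) {
    return downclocks ? (supported & ~(unsigned)CAPS_AVX512) : supported;
}
```

The resulting mask would then be passed as the `allowed` argument, so dispatch falls back to the 256-bit kernels on affected machines while newer cores keep their AVX-512 paths.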