Always Use Neon for L2 f32
This PR makes a small change that should improve performance for Arm users. The current f32 L2 implementation uses Neon only when the vector length is a multiple of 16. With this change, Neon is used for any vector with at least 16 elements. If the length is not an exact multiple of 16, the remaining elements are processed sequentially.
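To make the split concrete, here is a minimal pure-Python model of the loop structure described above. It is only a sketch: it does not use Neon or the extension's actual C code, and the names `BLOCK`, `split_point`, `l2_blocked`, and `l2_plain` are hypothetical, chosen for illustration.

```python
import math
import random

BLOCK = 16  # elements consumed per iteration of the vectorized main loop


def split_point(n):
    """Return (main_end, tail_len): how many elements go to each path."""
    main_end = n - (n % BLOCK)  # largest multiple of BLOCK that fits in n
    return main_end, n - main_end


def l2_blocked(a, b):
    """L2 distance using a 16-wide main loop plus a sequential tail.

    Pure-Python stand-in for the SIMD structure; no real Neon here.
    """
    assert len(a) == len(b)
    n = len(a)
    main_end, _ = split_point(n)

    total = 0.0
    # Main loop: models the Neon path, one block of 16 elements per step.
    for start in range(0, main_end, BLOCK):
        block_a = a[start:start + BLOCK]
        block_b = b[start:start + BLOCK]
        total += sum((x - y) ** 2 for x, y in zip(block_a, block_b))

    # Tail: the 0 to 15 leftover elements, processed one at a time.
    for i in range(main_end, n):
        d = a[i] - b[i]
        total += d * d

    return math.sqrt(total)


def l2_plain(a, b):
    """Straightforward reference implementation for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(42)
for n in (15, 16, 17, 25, 2003):
    a = [rng.uniform(-1, 1) for _ in range(n)]
    b = [rng.uniform(-1, 1) for _ in range(n)]
    main_end, tail_len = split_point(n)
    agrees = math.isclose(l2_blocked(a, b), l2_plain(a, b), rel_tol=1e-9)
    print(f"n={n}: main loop {main_end}, tail {tail_len}, agrees={agrees}")
```

Under this model, the 25-element vectors in the correctness test put 16 elements through the main loop and 9 through the tail, and the 2003-element vectors in the performance test put 2000 through the main loop and 3 through the tail. Vectors shorter than 16 are handled entirely by the sequential path.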
Correctness
Correctness test
```sql
.load ./dist/vec0

create virtual table vectors using vec0(
  vec float[25]
);

insert into vectors(rowid, vec) values
  (1, '[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924, 0.123, -0.456, 0.789, 0.012, -0.345, 0.678, -0.901, 0.234, 0.567, -0.890, 0.123, -0.456, 0.789, -0.012, 0.345, -0.678, 0.901]'),
  (2, '[0.443, -0.501, 0.355, -0.771, 0.707, -0.708, -0.185, 0.362, -0.987, 0.654, -0.321, 0.098, 0.765, -0.432, 0.109, -0.876, 0.543, -0.210, 0.987, -0.654, 0.321, -0.098, 0.765, -0.432, 0.109]');

select
  v1.rowid as id1,
  v2.rowid as id2,
  vec_distance_l2(v1.vec, v2.vec) as l2_distance
from vectors v1
cross join vectors v2
where v1.rowid = 1 and v2.rowid != 1;
```

Without neon:
With neon:
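Independently of either build, a reference value for this pair of vectors can be computed in plain Python with the standard library's `math.dist`. This is a sketch for cross-checking only. It works in double precision, whereas the `float[25]` column stores 32-bit floats, so agreement with `vec_distance_l2` should be expected only to roughly six or seven significant digits.

```python
import math

# The two 25-element vectors from the correctness test above.
V1 = [-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924, 0.123,
      -0.456, 0.789, 0.012, -0.345, 0.678, -0.901, 0.234, 0.567, -0.890,
      0.123, -0.456, 0.789, -0.012, 0.345, -0.678, 0.901]
V2 = [0.443, -0.501, 0.355, -0.771, 0.707, -0.708, -0.185, 0.362, -0.987,
      0.654, -0.321, 0.098, 0.765, -0.432, 0.109, -0.876, 0.543, -0.210,
      0.987, -0.654, 0.321, -0.098, 0.765, -0.432, 0.109]

# Euclidean (L2) distance in double precision; requires Python 3.8+.
reference = math.dist(V1, V2)
print(f"reference l2_distance: {reference:.6f}")
```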
Performance
Performance Test Script
```python
import sqlite3
import random
import json
import timeit


def create_table_and_insert_data(conn, vector_size, num_rows, seed=42):
    conn.execute(f"CREATE VIRTUAL TABLE IF NOT EXISTS vectors USING vec0(vec float[{vector_size}])")

    random.seed(seed)
    data = []
    for i in range(num_rows):
        vector = [random.uniform(-1, 1) for _ in range(vector_size)]
        data.append((i+1, json.dumps(vector)))

    conn.executemany("INSERT INTO vectors(rowid, vec) VALUES (?, ?)", data)


def calculate_distances(conn):
    cursor = conn.cursor()
    cursor.execute('''
        SELECT
            v1.rowid as id1,
            v2.rowid as id2,
            vec_distance_l2(v1.vec, v2.vec) as l2_distance
        FROM vectors v1
        CROSS JOIN vectors v2
        WHERE v1.rowid = 1 AND v2.rowid != 1
    ''')
    results = cursor.fetchall()


def main():
    vector_size = 2003
    num_rows = 1005
    seed = 42

    conn = sqlite3.connect(':memory:')
    conn.enable_load_extension(True)
    conn.load_extension('./dist/vec0')

    create_table_and_insert_data(conn, vector_size, num_rows, seed)

    def run_query():
        calculate_distances(conn)

    exec_time = timeit.timeit(run_query, number=100)
    print(f"Total execution time over 100 runs: {exec_time:.5f} seconds")

    conn.close()


if __name__ == '__main__':
    main()
```

Without neon:
With neon: