When the ORDER BY clause exactly matches the expression aliased as neighbor_distance in the SELECT clause, we can simply order by the neighbor_distance alias instead.
This doesn't change what the query does, but it roughly halves the query's length, because the query vector no longer has to be written out twice. That makes queries much easier to read when they are written to log files.
For example, it changes a query like this:
```sql
SELECT "llm_embeddings"."id", "llm_embeddings"."source_type", "llm_embeddings"."source_id", "llm_embeddings"."created_at", "llm_embeddings"."updated_at", "llm_embeddings"."created_by", "llm_embeddings"."updated_by", "llm_embeddings"."llm_model_id",
"llm_embeddings"."embedding" <-> '[-0.0017242150271110192,-0.029317252896789353,<.....>,0.024415132566991064]' AS neighbor_distance
FROM "llm_embeddings"
WHERE "llm_embeddings"."source_type" = 'LlmSource' AND "llm_embeddings"."embedding" IS NOT NULL
ORDER BY "llm_embeddings"."embedding" <-> '[-0.0017242150271110192,-0.029317252896789353,<.....>,0.024415132566991064]'
LIMIT 5;
```
into this:
```sql
SELECT "llm_embeddings"."id", "llm_embeddings"."source_type", "llm_embeddings"."source_id", "llm_embeddings"."created_at", "llm_embeddings"."updated_at", "llm_embeddings"."created_by", "llm_embeddings"."updated_by", "llm_embeddings"."llm_model_id",
"llm_embeddings"."embedding" <-> '[-0.0017242150271110192,-0.029317252896789353,<.....>,0.024415132566991064]' AS neighbor_distance
FROM "llm_embeddings"
WHERE "llm_embeddings"."source_type" = 'LlmSource' AND "llm_embeddings"."embedding" IS NOT NULL
ORDER BY neighbor_distance
LIMIT 5;
```
When the embedding has hundreds or thousands of dimensions, this can really help clean up log files.
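As a rough illustration, the sketch below checks that ordering by the alias and ordering by the repeated expression return the same rows. It uses Python's built-in sqlite3 module. SQLite has no vector type or `<->` operator, so a squared Euclidean expression over two hypothetical columns `x` and `y` stands in for the pgvector distance. The `items` table and the `build_query` helper are illustrative assumptions, not code from this PR or the gem.

```python
import sqlite3

# Stand-in for pgvector's `embedding <-> '[...]'`: SQLite has no vector type,
# so each row stores a 2-D point and the "distance" is a squared Euclidean
# expression against a fixed query point. Only the SQL shape matters here.
QUERY_POINT = (0.1, 0.2)
DISTANCE_EXPR = "(x - {0}) * (x - {0}) + (y - {1}) * (y - {1})".format(*QUERY_POINT)


def build_query(use_alias: bool) -> str:
    """Build a nearest-neighbor query, ordering by the alias or the full expression."""
    order_by = "neighbor_distance" if use_alias else DISTANCE_EXPR
    return (
        f"SELECT id, {DISTANCE_EXPR} AS neighbor_distance "
        "FROM items "
        f"ORDER BY {order_by} "
        "LIMIT 2"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO items (id, x, y) VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 0.1, 0.25), (3, 1.0, 1.0), (4, 0.3, 0.2)],
)

repeated = conn.execute(build_query(use_alias=False)).fetchall()
aliased = conn.execute(build_query(use_alias=True)).fetchall()

# Same rows in the same order; only the length of the SQL text differs.
print([row[0] for row in aliased])
print(len(build_query(use_alias=False)), len(build_query(use_alias=True)))
```

With a real embedding, the expression being dropped from ORDER BY is the long vector literal. That is why the savings approach half the query length as the number of dimensions grows.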
Hi @moracca, thanks for the PR. However, this will cause issues with methods that change the SELECT clause afterwards, like reselect and pluck (see the failing test case).
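The failure mode described in the review can be reproduced in miniature. Methods like reselect and pluck replace the SELECT list, so an ORDER BY that refers to the neighbor_distance alias ends up pointing at a column that no longer exists. The minimal sketch below again uses sqlite3 with a stand-in distance expression. The hypothetical `query` helper only mimics "new SELECT list, unchanged ORDER BY" and is not ActiveRecord's actual SQL generation. Postgres would likewise reject an ORDER BY on an output alias that is not in the select list.

```python
import sqlite3

# Stand-in distance expression (SQLite has no pgvector `<->` operator).
DISTANCE_EXPR = "(x - 0.1) * (x - 0.1) + (y - 0.2) * (y - 0.2)"


def query(select_list: str, order_by: str) -> str:
    """Hypothetical helper: a query whose SELECT list was swapped out
    (as reselect/pluck would do) while ORDER BY was left untouched."""
    return f"SELECT {select_list} FROM items ORDER BY {order_by} LIMIT 2"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO items (id, x, y) VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 0.1, 0.25), (3, 1.0, 1.0), (4, 0.3, 0.2)],
)

# Ordering by the full expression does not depend on the SELECT list,
# so selecting only "id" still works.
plucked = [row[0] for row in conn.execute(query("id", DISTANCE_EXPR))]
print(plucked)

# Ordering by the alias fails once the aliased column is no longer selected.
try:
    conn.execute(query("id", "neighbor_distance"))
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)
print(alias_error)
```

Repeating the full expression in ORDER BY is more verbose, but it keeps ordering independent of whatever SELECT list later query methods produce.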