google-gemini / generative-ai-dart

The official Dart library for the Google Gemini API
https://ai.google.dev/gemini-api/docs/get-started/tutorial?lang=dart
Apache License 2.0

Support outputDimensionality reduction parameter for embedding models #208

Open MrCsabaToth opened 2 weeks ago

MrCsabaToth commented 2 weeks ago

Description of the feature request:

Since text-embedding-004, the API supports an outputDimensionality parameter, which truncates the embedding vector to the given size. See https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#advanced-use

{
  "instances": [
    { "content": "TEXT",
      "task_type": "TASK_TYPE",
      "title": "TITLE"
    },
  ],
  "parameters": {
    "autoTruncate": AUTO_TRUNCATE,
    "outputDimensionality": OUTPUT_DIMENSIONALITY
  }
}

What problem are you trying to solve with this feature?

Reduce the storage size of the vectors, trading off some accuracy/precision.

Any other information you'd like to share?

Partial workaround: since it doesn't sound like any PCA (Principal Component Analysis) is applied when reducing to, say, 256 dimensions (it's a simple truncation), callers can perform the truncation themselves until this is supported in the Dart API (see the sketch below). The only remaining cost in that case is the extra bandwidth, which is most noticeable with batch inferences.
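
A minimal sketch of that client-side workaround, assuming the `google_generative_ai` package's `embedContent` call returning an embedding with a `values` list; the 256-dimension target and the `GOOGLE_API_KEY` environment variable are just placeholders for illustration:

```dart
import 'package:google_generative_ai/google_generative_ai.dart';

Future<List<double>> truncatedEmbedding(String text, {int dims = 256}) async {
  final model = GenerativeModel(
    model: 'text-embedding-004',
    apiKey: const String.fromEnvironment('GOOGLE_API_KEY'),
  );
  final response = await model.embedContent(Content.text(text));
  final values = response.embedding.values;
  // Client-side workaround: keep only the first `dims` components,
  // mirroring what the server-side outputDimensionality truncation does.
  return values.length > dims ? values.sublist(0, dims) : values;
}
```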

davidmigloz commented 2 weeks ago

It has already been supported since v0.4.0 (https://github.com/google-gemini/generative-ai-dart/pull/149).

MrCsabaToth commented 2 weeks ago

Oh, I searched within the issues but not the PRs! So it's part of the content request rather than the model instantiation, which means it can be changed dynamically per request!
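
A minimal sketch of that per-request usage, assuming the `outputDimensionality` named parameter on `embedContent` and `EmbedContentRequest` added in PR #149 (parameter names and placement should be verified against the current package docs):

```dart
import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> main() async {
  final model = GenerativeModel(
    model: 'text-embedding-004',
    apiKey: const String.fromEnvironment('GOOGLE_API_KEY'),
  );

  // Single request: the dimensionality is chosen per call, not per model.
  final single = await model.embedContent(
    Content.text('Hello world'),
    outputDimensionality: 256,
  );
  print(single.embedding.values.length);

  // Batch request: each EmbedContentRequest can carry its own setting.
  final batch = await model.batchEmbedContents([
    EmbedContentRequest(Content.text('First doc'), outputDimensionality: 256),
    EmbedContentRequest(Content.text('Second doc'), outputDimensionality: 128),
  ]);
  for (final embedding in batch.embeddings) {
    print(embedding.values.length);
  }
}
```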