couchbaselabs / couchbase-shell

Shell yeah!
http://couchbase.sh
Apache License 2.0

doc get to vector enrich-doc help example does not work #392

Closed ingenthr closed 1 week ago

ingenthr commented 4 weeks ago

I think there is more than one issue here, but I'll focus on the vector enrich-doc help example. If I run that example and specify the model, I get an error. For my document, the command below matches the help:

> doc get foo | select content | vector enrich-doc foo --model text-embedding-3-small
Embedding batch 1/1 
Error:   × Could not locate 'id' field in docs, if not called 'id' specify using --id-column

I hit several other errors though:

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> vector enrich-doc -h
Enriches given JSON with embeddings of selected field

Usage:
  > vector enrich-doc {flags} <field> 

Flags:
  -h, --help - Display the help message for this command
  --model <String> - the model to generate the embeddings with
  --dimension <Int> - dimension of the resulting embeddings
  --maxTokens <Int> - the token per minute limit for the provider/model
  --id-column <String> - the name of the id column if used with an input stream
  --vectorField <String> - the name of the field into which the embedding is written, defaults to fieldVector

Parameters:
  field <string>: the field from which the vector is generated

Examples:
  Open local json doc and enrich the field named 'description'
  > open ./local.json | vector enrich-doc description --model amazon.titan-embed-text-v2:0

  Fetch a single doc with id '12345' and enrich the field named 'description'
  > doc get 12345 | select content | vector enrich-doc description --model models/text-embedding-004

  Fetch and enrich all landmark documents from travel sample and upload the results to couchabase
  > query  'SELECT * FROM `travel-sample` WHERE type = "landmark"' | select content | vector enrich-doc content --model amazon.titan-embed-text-v1 | doc upsert

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | vector enrich-doc
Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[entry #115:1:32]
 1 │ doc get foo | vector enrich-doc
   ╰────
  help: Usage: vector enrich-doc {flags} <field> . Use `--help` for more information.

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | vector enrich-doc description
Error:   × no embed model provided
   ╭─[entry #116:1:15]
 1 │ doc get foo | vector enrich-doc description
   ·               ────────┬────────
   ·                       ╰──
   ╰────
  help: supply the embed_model in the config file or using the --model flag

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | vector enrich-doc description --model models/text-embedding-004
Error:   × Could not parse input from query

Error: nu::shell::cant_convert

  × Can't convert to record.
   ╭─[entry #117:1:1]
 1 │ doc get foo | vector enrich-doc description --model models/text-embedding-004
   · ───┬───
   ·    ╰── can't convert string to record
   ╰────

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc description --model models/text-embedding-004
Error:   × The field 'description' must be present in input record

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc foo --model models/text-embedding-004
Embedding batch 1/1
Error:   × invalid model ID

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> cb-env
╭──────────────┬───────────────╮
│ username     │ Administrator │
│ display_name │ Administrator │
│ cluster      │ dino          │
│ bucket       │ jiraexp       │
│ scope        │ issues        │
│ collection   │ users         │
│ cluster_type │ other         │
│ llm          │ ingOpenAI     │
╰──────────────┴───────────────╯
👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> text-embedding-3-small
👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc foo --model models/text-embedding-3-small
Embedding batch 1/1
Error:   × invalid model ID

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc foo --model models/text-embedding-3-large
Embedding batch 1/1
Error:   × invalid model ID

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc foo --model models/text-embedding-ada-002
Embedding batch 1/1
Error:   × invalid model ID

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc foo --model text-embedding-3-small
Embedding batch 1/1
Error:   × Could not locate 'id' field in docs, if not called 'id' specify using --id-column

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | vector enrich-doc foo --model text-embedding-3-small
Error:   × Could not parse input from query

Error: nu::shell::cant_convert

  × Can't convert to record.
   ╭─[entry #126:1:1]
 1 │ doc get foo | vector enrich-doc foo --model text-embedding-3-small
   · ───┬───
   ·    ╰── can't convert string to record
   ╰────

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-doc foo --model text-embedding-3-small
Embedding batch 1/1
Error:   × Could not locate 'id' field in docs, if not called 'id' specify using --id-column

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content | vector enrich-text foo --model text-embedding-3-small
Error:   × Could not parse list of files

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> vector enrich-doc -h
Enriches given JSON with embeddings of selected field

Usage:
  > vector enrich-doc {flags} <field> 

Flags:
  -h, --help - Display the help message for this command
  --model <String> - the model to generate the embeddings with
  --dimension <Int> - dimension of the resulting embeddings
  --maxTokens <Int> - the token per minute limit for the provider/model
  --id-column <String> - the name of the id column if used with an input stream
  --vectorField <String> - the name of the field into which the embedding is written, defaults to fieldVector

Parameters:
  field <string>: the field from which the vector is generated

Examples:
  Open local json doc and enrich the field named 'description'
  > open ./local.json | vector enrich-doc description --model amazon.titan-embed-text-v2:0

  Fetch a single doc with id '12345' and enrich the field named 'description'
  > doc get 12345 | select content | vector enrich-doc description --model models/text-embedding-004

  Fetch and enrich all landmark documents from travel sample and upload the results to couchabase
  > query  'SELECT * FROM `travel-sample` WHERE type = "landmark"' | select content | vector enrich-doc content --model amazon.titan-embed-text-v1 | doc upsert

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content id | vector enrich-doc foo --model text-embedding-3-small
Embedding batch 1/1
Error:   × Could not locate 'id' field in docs, if not called 'id' specify using --id-column

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo
╭───┬─────┬───────────────┬─────────────────────┬───────┬─────────╮
│ # │ id  │    content    │         cas         │ error │ cluster │
├───┼─────┼───────────────┼─────────────────────┼───────┼─────────┤
│ 0 │ foo │ ╭─────┬─────╮ │ 1723616346915012608 │       │ dino    │
│   │     │ │ foo │ bar │ │                     │       │         │
│   │     │ ╰─────┴─────╯ │                     │       │         │
╰───┴─────┴───────────────┴─────────────────────┴───────┴─────────╯
👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content id
╭───┬───────────────┬─────╮
│ # │    content    │ id  │
├───┼───────────────┼─────┤
│ 0 │ ╭─────┬─────╮ │ foo │
│   │ │ foo │ bar │ │     │
│   │ ╰─────┴─────╯ │     │
╰───┴───────────────┴─────╯
👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content id | vector enrich-doc
Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[entry #133:1:52]
 1 │ doc get foo | select content id | vector enrich-doc
   ·                                                    ▲
   ·                                                    ╰── missing field
   ╰────
  help: Usage: vector enrich-doc {flags} <field> . Use `--help` for more information.

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content id | vector enrich-doc foo
Error:   × no embed model provided
   ╭─[entry #134:1:35]
 1 │ doc get foo | select content id | vector enrich-doc foo
   ·                                   ────────┬────────
   ·                                           ╰──
   ╰────
  help: supply the embed_model in the config file or using the --model flag

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select content id | vector enrich-doc foo --model text-embedding-3-small
Embedding batch 1/1
Error:   × Could not locate 'id' field in docs, if not called 'id' specify using --id-column

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | select id content | vector enrich-doc foo --model text-embedding-3-small
Error:   × Could not parse input from query

Error: nu::shell::cant_convert

  × Can't convert to record.
   ╭─[entry #136:1:1]
 1 │ doc get foo | select id content | vector enrich-doc foo --model text-embedding-3-small
   · ───┬───
   ·    ╰── can't convert string to record
   ╰────

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | range 0..0 | vector enrich-doc foo --model text-embedding-3-small
Error:   × Could not parse input from query

Error: nu::shell::cant_convert

  × Can't convert to record.
   ╭─[entry #137:1:1]
 1 │ doc get foo | range 0..0 | vector enrich-doc foo --model text-embedding-3-small
   · ───┬───
   ·    ╰── can't convert string to record
   ╰────

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | to json | vector enrich-doc foo --model text-embedding-3-small
Error:   × Piped input must a json doc or a list of json docs

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users
> doc get foo | vector enrich-doc foo --model text-embedding-3-small
Error:   × Could not parse input from query

Error: nu::shell::cant_convert

  × Can't convert to record.
   ╭─[entry #139:1:1]
 1 │ doc get foo | vector enrich-doc foo --model text-embedding-3-small
   · ───┬───
   ·    ╰── can't convert string to record
   ╰────

👤 Administrator 🏠 dino in 🗄 jiraexp.issues.users

vishaldhiman22 commented 4 weeks ago

My observations have been similar. I think "vector enrich-doc" should be flexible enough to support two types of input:

Type 1: Support flattened list as input. See example below:

image

Type 2: Support Key, Value as input. See example below:

image
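
The screenshots are not reproduced here, so the following is only a rough Python sketch of what accepting both shapes could mean. It assumes that "flattened" means the document fields sit next to the id in one record, and that "key, value" means the id/content layout printed by doc get foo earlier in this thread. The helper name normalize_docs and its id_column parameter are hypothetical illustrations, not part of couchbase-shell.

```python
from typing import Any


def normalize_docs(rows: list[dict[str, Any]], id_column: str = "id") -> list[dict[str, Any]]:
    """Bring both assumed input shapes into one {"id": ..., "content": {...}} form.

    Hypothetical helper for illustration only; not couchbase-shell code.
    """
    normalized = []
    for row in rows:
        doc_id = row.get(id_column)
        if doc_id is None:
            raise ValueError(f"could not locate '{id_column}' field in doc")
        if isinstance(row.get("content"), dict):
            # Assumed "Type 2": key/value shape, e.g. the rows printed by `doc get`.
            content = dict(row["content"])
        else:
            # Assumed "Type 1": flattened shape, fields sit beside the id.
            content = {k: v for k, v in row.items() if k != id_column}
        normalized.append({"id": doc_id, "content": content})
    return normalized


# Both shapes of the 'foo' document from this thread normalize to the same record.
flattened = [{"id": "foo", "foo": "bar"}]
key_value = [{"id": "foo", "content": {"foo": "bar"}, "cas": 1723616346915012608}]
print(normalize_docs(flattened) == normalize_docs(key_value))
```

With a normalizing step like this in front of the embedding call, the field to embed would always be looked up under content, and the id would always be available for a later upsert.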