64bit / async-openai

Rust library for OpenAI
https://docs.rs/async-openai
MIT License
1.09k stars 161 forks source link

Failing to deserialize `VectorStoreFileObject` #229

Closed hiibolt closed 2 months ago

hiibolt commented 2 months ago

Error:

2024-06-06T04:57:33.665329Z ERROR async_openai::error: failed deserialization of: {
  "id": "file-cYrRWFomydf8Ng9gvqs4zWBD",
  "object": "vector_store.file",
  "usage_bytes": 0,
  "created_at": 1717649854,
  "vector_store_id": "vs_gN12Na14YvsPhjPPxS91WX3b",
  "status": "in_progress",
  "last_error": null,
  "chunking_strategy": {
    "type": "static",
    "static": {
      "max_chunk_size_tokens": 800,
      "chunk_overlap_tokens": 400
    }
  }
}

Failing code:

client
   .files()
   .create(CreateFileRequest {
       file: FileInput::from_vec_u8("meoww.txt".into(), memory.clone().into_bytes()),
       purpose: FilePurpose::Assistants,
   }))
   .expect("Failed to upload memory as file!");

Looks like the issue is here:

/// Static Chunking Strategy
#[derive(Clone, Serialize, Debug, Deserialize, PartialEq, Default)]
pub struct StaticChunkingStrategy {
    /// The maximum number of tokens in each chunk. The default value is `800`. The minimum value is `100` and the maximum value is `4096`.
    max_chunk_size_tokens: u16,
    /// The number of tokens that overlap between chunks. The default value is `400`.
    ///
    /// Note that the overlap must not exceed half of `max_chunk_size_tokens`.
    chunk_overlap_tokens: u16,
}

I'm new to Rust, but it seems like the max_chunk_size_tokens and chunk_overlap_tokens fields may need to be pub?

64bit commented 2 months ago

Thank you for reporting the error.

Your observation that both of the fields should be pub, however, it seems that for correct "shape" of data the following is the correct representation:

pub enum VectorStoreFileObjectChunkingStrategy {
    /// This is returned when the chunking strategy is unknown. Typically, this is because the file was indexed before the `chunking_strategy` concept was introduced in the API.
    Other,
    Static{ static: StaticChunkingStrategy },
}

You're welcome to see that above works and PR is most welcome!

hiibolt commented 2 months ago

It appears that OpenAI also now requires a name when creating a vector store: Error: JSONDeserialize(Error("invalid type: null, expected a string", line: 4, column: 14))

(Reproducing code)

// Create a vector store
   let vector_store_handle = client
      .vector_stores()
      .create( CreateVectorStoreRequest {
         file_ids: None,
         name: None,
         expires_after: None,
         chunking_strategy: None,
         metadata: None
      })
      .await?;

The fix seems to be changing the following type Option<String> to String in the following snippet:

pub struct CreateVectorStoreRequest {
    ...
    #[serde(skip_serializing_if = "Option::is_none")]
    pub name: Option<String>,
    ...
}

I'll write a test covering both and open a pull request shortly, thank you for your fast help!

64bit commented 2 months ago

Thank you for th PR. The change related to vector store name and the error message is not making sense to me, can you elaborate?

64bit commented 2 months ago

Upon further investigation, it appears that there's another bug (because of inconstency in the spec).

You can actually create a vector store by just providing file_ids, in which case name will be null:

2024-06-07T21:37:30.842537Z ERROR async_openai::error: failed deserialization of: {
  "id": "vs_kyLc5xI5qptNc3Wd1JHyhmU7",
  "object": "vector_store",
  "name": null,
  "status": "in_progress",
  "usage_bytes": 0,
  "created_at": 1717796250,
  "file_counts": {
    "in_progress": 1,
    "completed": 0,
    "failed": 0,
    "cancelled": 0,
    "total": 1
  },
  "metadata": {},
  "expires_after": null,
  "expires_at": null,
  "last_active_at": 1717796250
}
Error: JSONDeserialize(Error("invalid type: null, expected a string", line: 4, column: 14))

That means the actual bug is in VectorStoreObject where name needs to be Option<String> instead of String.

So you can safely undo the name field change for CreateVectorStoreRequest in your PR. and feel free to include the changes for observation above.