Is it possible to choose the response type

atacan commented 3 weeks ago

Question

I have the following generated structs. Even though the server returns a CreateTranscriptionResponseVerboseJson payload (I can see it in Proxyman), the response is parsed as CreateTranscriptionResponseJson. I guess that's because it is a subset of the verbose one and it's the one that's tried first?

Full reproduction: https://github.com/atacan/swift-openai-api/blob/c087e61ed66ea40e627b785e8146bddd44a5e8b8/Tests/OpenAIAsyncHTTPClientTests/OpenAIAsyncHTTPClientTest.swift#L90

The code snippets are below.

        /// Represents a transcription response returned by model, based on the provided input.
        ///
        /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseJson`.
        public struct CreateTranscriptionResponseJson: Codable, Hashable, Sendable {
            /// The transcribed text.
            ///
            /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseJson/text`.
            public var text: Swift.String
            /// Creates a new `CreateTranscriptionResponseJson`.
            ///
            /// - Parameters:
            ///   - text: The transcribed text.
            public init(text: Swift.String) {
                self.text = text
            }
            public enum CodingKeys: String, CodingKey {
                case text
            }
        }
        /// Represents a verbose json transcription response returned by model, based on the provided input.
        ///
        /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseVerboseJson`.
        public struct CreateTranscriptionResponseVerboseJson: Codable, Hashable, Sendable {
            /// The language of the input audio.
            ///
            /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseVerboseJson/language`.
            public var language: Swift.String
            /// The duration of the input audio.
            ///
            /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseVerboseJson/duration`.
            public var duration: Swift.String
            /// The transcribed text.
            ///
            /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseVerboseJson/text`.
            public var text: Swift.String
            /// Extracted words and their corresponding timestamps.
            ///
            /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseVerboseJson/words`.
            public var words: [Components.Schemas.TranscriptionWord]?
            /// Segments of the transcribed text and their corresponding details.
            ///
            /// - Remark: Generated from `#/components/schemas/CreateTranscriptionResponseVerboseJson/segments`.
            public var segments: [Components.Schemas.TranscriptionSegment]?
            /// Creates a new `CreateTranscriptionResponseVerboseJson`.
            ///
            /// - Parameters:
            ///   - language: The language of the input audio.
            ///   - duration: The duration of the input audio.
            ///   - text: The transcribed text.
            ///   - words: Extracted words and their corresponding timestamps.
            ///   - segments: Segments of the transcribed text and their corresponding details.
            public init(
                language: Swift.String,
                duration: Swift.String,
                text: Swift.String,
                words: [Components.Schemas.TranscriptionWord]? = nil,
                segments: [Components.Schemas.TranscriptionSegment]? = nil
            ) {
                self.language = language
                self.duration = duration
                self.text = text
                self.words = words
                self.segments = segments
            }
            public enum CodingKeys: String, CodingKey {
                case language
                case duration
                case text
                case words
                case segments
            }
        }

The calling code:

        // ⚠️ Even though the server returns VerboseJson, we get Json here
        switch response {
        case .ok(let ok):
            switch try ok.body.json {
            case .CreateTranscriptionResponseVerboseJson(let verbose):
                dump(verbose)
            case .CreateTranscriptionResponseJson(let json):
                dump(json)
            }
        case .undocumented(let statusCode, let undocumentedPayload):
            let buffer = try await undocumentedPayload.body?.collect(upTo: 1024 * 1024 * 2, using: .init())
            let description = String(buffer: buffer!)
            print("❌", statusCode, description)

            struct myerror: Error {}
            throw ClientError.init(operationID: "", operationInput: "", causeDescription: "", underlyingError: myerror())
        }

simonjbeaumont commented 3 weeks ago

I expect this OpenAPI document might be missing a discriminator for the anyOf. Without that, there might not be much we can do.

https://redocly.com/docs/resources/discriminator

atacan commented 3 weeks ago

Thank you. I checked it, and it uses oneOf.

EDIT:

Changing it to anyOf didn't work either. I also moved CreateTranscriptionResponseVerboseJson one line up so that it comes before the other schema. It still decodes the response as CreateTranscriptionResponseJson.

  /audio/transcriptions:
    post:
      operationId: createTranscription
      tags:
        - Audio
      summary: Transcribes audio into the input language.
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: "#/components/schemas/CreateTranscriptionRequest"
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
-                oneOf:
+                anyOf:
-                  - $ref: "#/components/schemas/CreateTranscriptionResponseJson"
                  - $ref: "#/components/schemas/CreateTranscriptionResponseVerboseJson"
+                  - $ref: "#/components/schemas/CreateTranscriptionResponseJson"

atacan commented 3 weeks ago

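OK. I found the problem. A numeric property is specified as a string in the OpenAPI document (presumably duration, which the generated struct above declares as Swift.String), so decoding the verbose type fails.

Putting the expected schema first is still important: with anyOf, the decoder tries each schema in turn, and if one type is a subset of the other, the smaller type also decodes successfully from the larger payload.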

public init(from decoder: any Decoder) throws {
    var errors: [any Error] = []
    do {
        value1 = try .init(from: decoder)
    } catch {
        errors.append(error)
    }
    do {
        value2 = try .init(from: decoder)
    } catch {
        errors.append(error)
    }
    try Swift.DecodingError.verifyAtLeastOneSchemaIsNotNil(
        [
            value1,
            value2
        ],
        type: Self.self,
        codingPath: decoder.codingPath,
        errors: errors
    )
}

The same goes for oneOf.
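To illustrate the type mismatch, here is a minimal sketch that uses plain Foundation `JSONDecoder` instead of the generated client. `BriefResponse`, `VerboseResponse`, and the payload are hypothetical, trimmed-down stand-ins for the generated types and the real server response; the assumption is that `duration` arrives as a JSON number while the schema declares it as a string.

```swift
import Foundation

// Hypothetical stand-in for CreateTranscriptionResponseJson.
struct BriefResponse: Codable {
    var text: String
}

// Hypothetical stand-in for CreateTranscriptionResponseVerboseJson.
struct VerboseResponse: Codable {
    var language: String
    var duration: String // declared as a string, mirroring the schema
    var text: String
}

// Assumed payload shape: `duration` is a JSON number, not a string.
let payload = Data(#"{"language":"english","duration":1.5,"text":"hello"}"#.utf8)
let decoder = JSONDecoder()

// Decoding the verbose type throws a type mismatch on `duration`...
let verbose = try? decoder.decode(VerboseResponse.self, from: payload)

// ...while the brief type only requires `text`, so it still succeeds.
let brief = try? decoder.decode(BriefResponse.self, from: payload)

print("verbose decoded:", verbose != nil)
print("brief decoded:", brief != nil)
```

Under these assumptions only the smaller type decodes, which would explain why the response surfaces as the non-verbose case regardless of schema order.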

czechboy0 commented 3 weeks ago

The important difference is that the payload must match exactly one schema in a oneOf, and one or more in an anyOf. If one schema is a subset of the other, then by definition it cannot be a oneOf (as the payload can match both), and would be an anyOf.

The discriminator wouldn't make a difference here, because when parsing an anyOf, all the schemas are given a chance to decode the payload and the order doesn't change the result.
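The anyOf behavior described here can be sketched with a hand-written container. This is an approximation for illustration, not the generator's actual output: `Brief`, `Verbose`, and `AnyOfResponse` are hypothetical names, and `duration` is assumed to be typed as a number, as it would be once the schema is corrected.

```swift
import Foundation

// Hypothetical, trimmed-down schemas; `duration` is assumed to be numeric.
struct Brief: Codable, Equatable {
    var text: String
}

struct Verbose: Codable, Equatable {
    var language: String
    var duration: Double
    var text: String
}

// Hand-written approximation of an anyOf container:
// every schema gets a chance to decode, independent of order.
struct AnyOfResponse: Decodable {
    var value1: Brief?
    var value2: Verbose?

    init(from decoder: any Decoder) throws {
        value1 = try? Brief(from: decoder)
        value2 = try? Verbose(from: decoder)
        // At least one schema has to match for the payload to be valid.
        guard value1 != nil || value2 != nil else {
            throw DecodingError.dataCorrupted(
                .init(codingPath: decoder.codingPath,
                      debugDescription: "No anyOf schema matched")
            )
        }
    }
}

// Assumed payload shape for a verbose response.
let full = Data(#"{"language":"english","duration":1.5,"text":"hello"}"#.utf8)
let result = try? JSONDecoder().decode(AnyOfResponse.self, from: full)

// Both values are populated, so the caller picks the richer one first.
if let verbose = result?.value2 {
    print("verbose:", verbose.language, verbose.duration)
} else if let brief = result?.value1 {
    print("brief:", brief.text)
}
```

Because both optional values are filled in when the payload satisfies both schemas, the caller chooses which one to read, and swapping the order of the two decode attempts would not change the outcome.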

apple / swift-openapi-generator

Is it possible to choose the response type #661
