MacPaw / OpenAI

Swift community driven package for OpenAI public API
MIT License
2.02k stars · 333 forks

Example of including an image + prompt to gpt-4o? #216

Open tigran-iii opened 3 months ago

tigran-iii commented 3 months ago

Hi.

Pretty new to both Swift and this package.

Can anyone include an example of how to supply an image from assets or uploaded from the device with a prompt to the gpt-4o endpoint?

My current progress is below, but I feel like I'm doing something terribly wrong.

Only requirement is making it work at this point :D.

Current Error: (screenshot attached)

Current Code:

private func sendDataToAPI() async {
    let washRoutine = UserDefaults.standard.string(forKey: "washRoutine") ?? ""
    let beforeBedRoutine = UserDefaults.standard.string(forKey: "beforeBedRoutine") ?? ""
    let otherHairCare = UserDefaults.standard.string(forKey: "otherHairCare") ?? ""

    var messages: [ChatQuery.ChatCompletionMessageParam] = [
        .user(.init(content: .string("Here are my current hair care routines:"))),
        .user(.init(content: .string("Wash Routine: \(washRoutine)"))),
        .user(.init(content: .string("Before Bed Routine: \(beforeBedRoutine)"))),
        .user(.init(content: .string("Other Hair Care: \(otherHairCare)")))
    ]

    if let image = UIImage(named: "curly_1"),
       let imageData = image.jpegData(compressionQuality: 1.0) {
        let base64String = imageData.base64EncodedString()
        let imageUrl = "data:image/jpeg;base64,\(base64String)"
        let imageParam = ChatQuery.ChatCompletionMessageParam.ChatCompletionUserMessageParam(content: .string(imageUrl))
        messages.append(.user(imageParam))
    }

    let openAI = OpenAI(apiToken: "<api_token>")
    let query = ChatQuery(messages: messages, model: .gpt4_o)

    do {
        let result = try await openAI.chats(query: query)
        let content = result.choices.first?.message.content?.string
        let tokenCount = result.usage?.promptTokens ?? 0

        DispatchQueue.main.async {
            self.apiResult = "\nPrompt tokens: \(tokenCount)\n\n\(content ?? "No content")"
        }
    } catch {
        DispatchQueue.main.async {
            self.apiResult = "Error fetching chats: \(error.localizedDescription)"
        }
    }
}
ddaddy commented 3 months ago

I got this working. You need to pass the image through the .vision content case, not as a .string — the base64 data URL approach in your snippet won't be treated as an image.

guard let imageData = image.jpegData(compressionQuality: 1.0) else { return }

let imgParam = ChatQuery.ChatCompletionMessageParam.ChatCompletionUserMessageParam(
    content: .vision([
        .chatCompletionContentPartImageParam(.init(imageUrl: .init(url: imageData, detail: .high)))
    ])
)

let query = ChatQuery(
    messages: [
        .system(.init(content: system)),
        .user(imgParam),
        .user(.init(content: .string(prompt)))
    ],
    model: .gpt4_o,
    maxTokens: 500
)
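For anyone landing here, a complete sketch combining the two snippets above into one function — image first, then the text prompt. This is a sketch only: the exact initializers follow the code in this thread and may differ between package versions, and the asset name "curly_1", the prompt text, and askAboutImage are placeholders.

```swift
import UIKit
import OpenAI

/// Sends an image from the asset catalog plus a text prompt to gpt-4o.
/// Returns the model's reply text, or nil if the image can't be loaded.
func askAboutImage(prompt: String) async throws -> String? {
    guard let image = UIImage(named: "curly_1"),
          let imageData = image.jpegData(compressionQuality: 0.8) else { return nil }

    // The image must go in a .vision content part, not a plain .string.
    let imageMessage = ChatQuery.ChatCompletionMessageParam.ChatCompletionUserMessageParam(
        content: .vision([
            .chatCompletionContentPartImageParam(.init(imageUrl: .init(url: imageData, detail: .high)))
        ])
    )

    let query = ChatQuery(
        messages: [
            .user(imageMessage),
            .user(.init(content: .string(prompt)))
        ],
        model: .gpt4_o,
        maxTokens: 500
    )

    let openAI = OpenAI(apiToken: "<api_token>")
    let result = try await openAI.chats(query: query)
    return result.choices.first?.message.content?.string
}
```

Lower compressionQuality keeps the request payload (and prompt token count) down; detail: .high trades more tokens for better image understanding.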