TheoKanning / openai-java

OpenAI Api Client in Java
MIT License

Analyze image with chat completion request #464

Open · hynek5 opened this issue 4 months ago

hynek5 commented 4 months ago

I'm trying to analyze an image by following the guidelines at https://platform.openai.com/docs/guides/vision?lang=curl

I can't get my solution to work. I keep getting odd responses such as: "I cannot accurately identify the contents of the image as it is encoded in base64 format. Please provide a direct image link or describe the image." This is strange, because the Python example in the OpenAI docs works perfectly.

import com.theokanning.openai.completion.chat.ChatCompletionRequest;
import com.theokanning.openai.completion.chat.ChatMessage;
import com.theokanning.openai.service.OpenAiService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;

@Component
public class ImageAnalyser {

    private final OpenAiService openAiService;

    @Autowired
    public ImageAnalyser(OpenAiService openAiService) {
        this.openAiService = openAiService;
    }

    public List<String> analyze(String pathToFile) {
        ChatCompletionRequest completionRequest = ChatCompletionRequest.builder()
                .model("gpt-4-vision-preview")
                .messages(List.of(getChatMessage(pathToFile)))
                .maxTokens(500)
                .build();
        System.out.println(completionRequest);
        // Return the text content of every choice in the response.
        return openAiService.createChatCompletion(completionRequest)
                .getChoices().stream()
                .map(choice -> choice.getMessage().getContent())
                .collect(Collectors.toList());
    }

    private ChatMessage getChatMessage(String pathToImage) {
        // ChatMessage content is a String, so the JSON built below is sent as one escaped string value.
        return new ChatMessage("user", getContent(pathToImage));
    }

    // Builds the text + image_url parts from the vision guide as a raw JSON string.
    private String getContent(String filePath) {
        return "[" +
                "{" +
                "\"type\": \"text\"," +
                "\"text\": \"What’s in this image?\"" +
                "}," +
                "{" +
                "\"type\": \"image_url\"," +
                "\"image_url\": {" +
                "\"url\": \"data:image/jpeg;base64," + imageB64(filePath) + "\"" +
                "}" +
                "}" +
                "]";
    }

    // Reads the whole file (InputStream.read may return fewer bytes) and Base64-encodes it.
    public String imageB64(String imagePath) {
        try {
            byte[] imageData = Files.readAllBytes(Paths.get(imagePath));
            return Base64.getEncoder().encodeToString(imageData);
        } catch (IOException e) {
            // Fail loudly instead of embedding "null" in the data URL.
            throw new UncheckedIOException(e);
        }
    }
}

Does anyone have a working example, or an idea of what the issue could be? Alternatively, where should I look when debugging serialization? I suspect serialization is the problem, but I couldn't find the right class.

Thanks a lot!
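To see how serialization could produce the reply quoted above, here is a minimal, dependency-free sketch contrasting what a JSON encoder does with a `String` content versus the content-parts array used in the vision guide's payload. The class and method names are hypothetical, and the escaper is deliberately simplified (quotes, backslashes, control characters only):

```java
class ContentSerialization {
    // Simplified JSON string encoder: wraps in quotes and escapes quotes, backslashes, control chars.
    static String q(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // String content: the whole JSON array becomes a single quoted, escaped string value.
    static String messageWithStringContent(String content) {
        return "{\"role\":\"user\",\"content\":" + q(content) + "}";
    }

    // Array content: separate text and image_url parts, as in the vision guide's example payload.
    static String messageWithPartsContent(String text, String imageUrl) {
        return "{\"role\":\"user\",\"content\":["
                + "{\"type\":\"text\",\"text\":" + q(text) + "},"
                + "{\"type\":\"image_url\",\"image_url\":{\"url\":" + q(imageUrl) + "}}"
                + "]}";
    }
}
```

In the first form the model receives the parts, base64 included, as plain text, which is consistent with it answering that the image "is encoded in base64 format"; only the second form carries a separate image part.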

hynek5 commented 4 months ago

Okay, it seems the serialization of the content is the issue. Here is the payload:

2024-02-12T11:17:50.118+01:00  INFO 13616 --- [           main] okhttp3.OkHttpClient                     : --> POST https://api.openai.com/v1/chat/completions
2024-02-12T11:17:50.119+01:00  INFO 13616 --- [           main] okhttp3.OkHttpClient                     : Content-Type: application/json; charset=UTF-8
2024-02-12T11:17:50.119+01:00  INFO 13616 --- [           main] okhttp3.OkHttpClient                     : Content-Length: 26315
2024-02-12T11:17:50.119+01:00  INFO 13616 --- [           main] okhttp3.OkHttpClient                     : 
{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": "[{\"type\": \"text\",\"text\": \"What’s in this image?\"},\"{\"type\": \"image_url\",\"image_url\": {\"url\": \".....kZJRgABAQBQoUKhAUKFCoQ//Z\"}\"}\"]\""
    }
  ],
  "max_tokens": 500
}

It would be nice if com.theokanning.openai.completion.chat.ChatMessage.content could also accept a JSON object or array instead of only a string.
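Until the library accepts structured content, one possible workaround is to bypass `ChatMessage` for vision calls and post the request body yourself. This is a sketch assuming Java 11+ (`java.net.http.HttpClient`) and an API key supplied by the caller; `VisionRequest` and its methods are hypothetical names, not part of openai-java, and the model name, prompt, and `max_tokens` simply mirror the original request:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

class VisionRequest {
    static final URI CHAT_COMPLETIONS = URI.create("https://api.openai.com/v1/chat/completions");

    // Simplified JSON string escaping; enough for a prompt and a base64 data URL.
    static String q(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Request body with "content" as a JSON array of parts rather than a string.
    static String body(String model, String prompt, String dataUrl, int maxTokens) {
        return "{\"model\":" + q(model)
                + ",\"messages\":[{\"role\":\"user\",\"content\":["
                + "{\"type\":\"text\",\"text\":" + q(prompt) + "},"
                + "{\"type\":\"image_url\",\"image_url\":{\"url\":" + q(dataUrl) + "}}"
                + "]}],\"max_tokens\":" + maxTokens + "}";
    }

    // POST request with bearer auth and a UTF-8 JSON body.
    static HttpRequest request(String apiKey, String json) {
        return HttpRequest.newBuilder(CHAT_COMPLETIONS)
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Reads a JPEG, sends the request, and returns the raw JSON response body.
    static String analyze(String apiKey, String imagePath) throws IOException, InterruptedException {
        byte[] bytes = Files.readAllBytes(Paths.get(imagePath));
        String dataUrl = "data:image/jpeg;base64," + Base64.getEncoder().encodeToString(bytes);
        String json = body("gpt-4-vision-preview", "What's in this image?", dataUrl, 500);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request(apiKey, json), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

`analyze` returns the unparsed response; extracting `choices[].message.content` (for example with Jackson) is left out to keep the sketch dependency-free.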