HamaWhiteGG / langchain-java

Java version of LangChain, while empowering LLM for Big Data.
Apache License 2.0
545 stars, 106 forks

[BUG] Incorrect Type for logprobs Field in CompletionResp Class #144

Closed (itneedme closed this issue 9 months ago)

itneedme commented 9 months ago

Describe the bug: When I request the /v1/completions API, I get an exception saying that the JSON response has logprobs as an object (START_OBJECT token), while the Java code expects an integer value (Cannot deserialize instance of `java.lang.Integer`).

Log and stack trace:

2023-12-05 19:05:51:263 [10.0.4.66] [DEBUG] [OkHttp http://127.0.0.1:3000/...] com.hw.openai.OpenAiClient - <-- END HTTP (2729-byte body)
2023-12-05 19:06:22:268 [10.0.4.66] [ERROR] [http-nio-19004-exec-8] c.h.j.w.v.c.aigc.LlmController - asyncGenerateSse failed, maskQuery: MaskQuery(query=????, id=hi), cacheStrategy: DEFAULT
reactor.core.Exceptions$ReactiveException: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.Integer` out of START_OBJECT token
 at [Source: (String)"{"id":"cmpl-KcdyIMdQk78jf5iIAHFkHHwQhs0xT","object":"text_completion","created":1701774349,"model":"gpt-3.5-turbo","choices":[{"text":"1","index":0,"finish_reason":"","logprobs":{"tokens":null,"token_logprobs":null,"top_logprobs":null,"text_offset":null}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}"; line: 1, column: 179] (through reference chain: com.hw.openai.entity.completions.CompletionChunk["choices"]->java.util.ArrayList[0]->com.hw.openai.entity.completions.Choice["logprobs"])
    at reactor.core.Exceptions.propagate(Exceptions.java:366)
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:97)
    at reactor.core.publisher.Flux.blockLast(Flux.java:2483)

To Reproduce: Just send a request to the OpenAI /v1/completions API.

Expected behavior: No exception is thrown and a successful response is returned.

Additional context: See this snapshot of the OpenAI API reference.

HamaWhiteGG commented 9 months ago

Thank you for the detailed description. I just submitted the code to fix it.

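Add class LogProbs:

import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.Data;

import java.util.List;
import java.util.Map;

// Response-side logprobs object; each list may be null in the JSON.
@Data
public class LogProbs {

    private List<String> tokens;

    @JsonProperty("token_logprobs")
    private List<Double> tokenLogprobs;

    @JsonProperty("top_logprobs")
    private List<Map<String, Double>> topLogprobs;

    @JsonProperty("text_offset")
    private List<Integer> textOffset;
}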
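As a rough illustration of how these four lists relate: each index describes one generated token, so tokens.get(i) is the token text, tokenLogprobs.get(i) its log probability, topLogprobs.get(i) the alternatives considered at that position, and textOffset.get(i) its character offset in the completion text. The sketch below uses only the JDK. The class LogProbsAlignment, the method describeToken, and every value in it are made up for illustration; none of them come from langchain-java or from a real API response.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class LogProbsAlignment {

    // Formats position i of the four parallel lists as one line of text.
    static String describeToken(List<String> tokens,
                                List<Double> tokenLogprobs,
                                List<Map<String, Double>> topLogprobs,
                                List<Integer> textOffset,
                                int i) {
        return "offset=" + textOffset.get(i)
                + " token=" + tokens.get(i)
                + " logprob=" + tokenLogprobs.get(i)
                + " alternatives=" + topLogprobs.get(i).size();
    }

    public static void main(String[] args) {
        // Made-up values, shaped like the fields of LogProbs.
        List<String> tokens = Arrays.asList("Hello", " world");
        List<Double> tokenLogprobs = Arrays.asList(-0.5, -1.25);
        List<Map<String, Double>> topLogprobs = Arrays.asList(
                Collections.singletonMap("Hello", -0.5),
                Collections.singletonMap(" world", -1.25));
        List<Integer> textOffset = Arrays.asList(0, 5);

        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(
                    describeToken(tokens, tokenLogprobs, topLogprobs, textOffset, i));
        }
    }
}
```

With these sample values, the second line printed is "offset=5 token= world logprob=-1.25 alternatives=1", since the token " world" starts at character 5 of "Hello world".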

itneedme commented 9 months ago

Thank you for your efforts in updating the code. However, I noticed that the recent changes might still not align with the OpenAI official documentation. According to the OpenAI API documentation, the logprobs field is an integer in the request object but an object in the response.

In your latest commit, the LogProbs class seems to be designed with several lists, which may not fully comply with the OpenAI documentation's definition for the request object. I recommend reviewing the OpenAI API documentation again to ensure that the implementation of the logprobs field in the response matches the official documentation. This will help avoid potential serialization or deserialization issues in the future.

Thank you for your contribution to the project, and I hope this suggestion is helpful.

HamaWhiteGG commented 9 months ago

The logprobs field in the com.hw.openai.entity.completions.Completion request is an Integer.

package com.hw.openai.entity.completions;

import java.io.Serializable;

public class Completion implements Serializable {
    private Integer logprobs;
}

The logprobs field in the com.hw.openai.entity.completions.Choice response is a LogProbs object.

package com.hw.openai.entity.completions;

import com.fasterxml.jackson.annotation.JsonProperty;

public class Choice {

    private String text;

    private Integer index;

    private LogProbs logprobs;

    @JsonProperty("finish_reason")
    private String finishReason;
}

Based on the OpenAI documentation, I mapped each array to a List in Java.

@Data
public class LogProbs {

    List<String> tokens;

    @JsonProperty("token_logprobs")
    List<Double> tokenLogprobs;

    @JsonProperty("top_logprobs")
    List<Map<String, Double>> topLogprobs;

    @JsonProperty("text_offset")
    List<Integer> textOffset;
}

If the above code is incorrect or may lead to deserialization errors, could you provide me with a detailed example that includes the fields within logprobs? Thank you.

In both the official documentation and the response you provided earlier, the fields within logprobs are all null.

"logprobs":{
    "tokens":null,
    "token_logprobs":null,
    "top_logprobs":null,
    "text_offset":null
}
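Because every list inside logprobs can be null, as in the response above, code that reads the deserialized object may want to guard against null before iterating. The following is a minimal sketch using only the JDK. LogProbsView, orEmpty, and totalLogprob are hypothetical names introduced for illustration and are not part of langchain-java; the sample values are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LogProbsNullSafety {

    // Hypothetical stand-in for the deserialized "logprobs" object
    // (only two of its four lists, to keep the sketch short).
    static final class LogProbsView {
        final List<String> tokens;
        final List<Double> tokenLogprobs;

        LogProbsView(List<String> tokens, List<Double> tokenLogprobs) {
            this.tokens = tokens;
            this.tokenLogprobs = tokenLogprobs;
        }
    }

    // Returns the list itself, or an empty list when the API sent null.
    static <T> List<T> orEmpty(List<T> list) {
        return list == null ? Collections.<T>emptyList() : list;
    }

    // Sums the per-token log probabilities; 0.0 when none were returned.
    static double totalLogprob(LogProbsView logprobs) {
        if (logprobs == null) {
            return 0.0;
        }
        double sum = 0.0;
        for (Double value : orEmpty(logprobs.tokenLogprobs)) {
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Mirrors the all-null response shown in this issue.
        LogProbsView allNull = new LogProbsView(null, null);
        // Made-up populated values.
        LogProbsView sample = new LogProbsView(
                Arrays.asList("Hello", "!"), Arrays.asList(-0.5, -0.25));

        System.out.println(orEmpty(allNull.tokens).size()); // prints 0
        System.out.println(totalLogprob(allNull));          // prints 0.0
        System.out.println(totalLogprob(sample));           // prints -0.75
    }
}
```

Returning an empty list instead of null keeps the calling loops free of null checks; whether 0.0 is the right result for a missing logprobs object depends on the caller, so treat that choice as an assumption of this sketch.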

itneedme commented 9 months ago

Thank you so much for your prompt and effective resolution to this issue.