apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

Common JSON convert/parse use Chinese(UTF-8) Parsing failed #7096

Open ZhangWeike2000 opened 2 months ago

ZhangWeike2000 commented 2 months ago

What happened

SeaTunnel fails while parsing a JSON HTTP response that contains Chinese text; the failure is related to UTF-8 decoding. The error message (Invalid UTF-8 middle byte 0xe3, or Invalid UTF-8 start byte 0xb5 in the stack trace below) indicates that the JSON parser encountered a byte sequence that is not valid UTF-8.

SeaTunnel Version

2.3.5

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Http {
    result_table_name = "fake"
    url = "https://api.oioweb.cn/api/common/teladress?mobile=10086"
    method = "GET"
    format = "json"
    schema = {
      fields {
        code = int
        result = string
        msg = string
      }
    }
  }
}

sink {
  Console {
    source_table_name = "fake"
  }
}

Running Command

./bin/seatunnel.cmd --config ./config/test.template -e local

Error Exception

Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
        at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
        at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
        at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[COMMON-02], ErrorDescription:[Common JSON convert/parse '{"code":400,"result":null,"msg":"mobile的长度必须是11"}' operation failed.]
        at org.apache.seatunnel.common.exception.CommonError.jsonOperationError(CommonError.java:181)
        at org.apache.seatunnel.format.json.JsonDeserializationSchema.convertBytes(JsonDeserializationSchema.java:159)
        at org.apache.seatunnel.format.json.JsonDeserializationSchema.collect(JsonDeserializationSchema.java:117)
        at org.apache.seatunnel.connectors.seatunnel.http.source.DeserializationCollector.collect(DeserializationCollector.java:36)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.collect(HttpSourceReader.java:211)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.pollAndCollectData(HttpSourceReader.java:130)
        at org.apache.seatunnel.connectors.seatunnel.http.source.HttpSourceReader.internalPollNext(HttpSourceReader.java:173)
        at org.apache.seatunnel.connectors.seatunnel.common.source.AbstractSingleSplitReader.pollNext(AbstractSingleSplitReader.java:39)
        at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:156)
        at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:116)
        at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
        at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:121)
        at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
        at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1004)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.seatunnel.shade.com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xb5
 at [Source: (byte[])"{"code":400,"result":null,"msg":"mobile??????????11"}"; line: 1, column: 41]
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3648)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:3644)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2581)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2507)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:334)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer._deserializeContainerNoRecursion(JsonNodeDeserializer.java:473)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:84)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:20)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)  
        at org.apache.seatunnel.shade.com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4716)
        at org.apache.seatunnel.shade.com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3090)
        at org.apache.seatunnel.format.json.JsonDeserializationSchema.convertBytes(JsonDeserializationSchema.java:154)
        ... 17 more

        at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
        ... 2 more
2024-07-02 22:57:59,870 INFO  org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - run shutdown hook because get close signal
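
The byte 0xb5 in the trace is consistent with the response text having been encoded in GBK rather than UTF-8 before it reached the JSON parser: in GBK the character 的 is the two bytes 0xb5 0xc4, and 0xb5 is not a legal first byte of a UTF-8 sequence. The thread does not confirm this cause, so the following is only a minimal sketch of the suspected mechanism. It uses the plain JDK instead of Jackson, the class and method names are hypothetical, and it assumes the JDK ships the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of SeaTunnel.
public class GbkVersusUtf8 {

    /** Returns true if the bytes decode as well-formed UTF-8 (strict, no replacement). */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the first non-ASCII byte as an unsigned value, or -1 if all bytes are ASCII. */
    static int firstNonAsciiByte(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return b & 0xFF;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The response body from the trace; the Chinese text is written with
        // Unicode escapes so the source file encoding cannot alter it.
        String body = "{\"code\":400,\"result\":null,\"msg\":"
                + "\"mobile\u7684\u957f\u5ea6\u5fc5\u987b\u662f11\"}";

        byte[] utf8 = body.getBytes(StandardCharsets.UTF_8);
        // Assumption: GBK stands in for a non-UTF-8 default charset on Windows.
        byte[] gbk = body.getBytes(Charset.forName("GBK"));

        System.out.printf("UTF-8 bytes: valid=%b, first non-ASCII byte=0x%02x%n",
                isValidUtf8(utf8), firstNonAsciiByte(utf8));
        System.out.printf("GBK bytes:   valid=%b, first non-ASCII byte=0x%02x%n",
                isValidUtf8(gbk), firstNonAsciiByte(gbk));
    }
}
```

If this is what happens, the bytes handed to the parser were produced by a charset-dependent conversion somewhere between the HTTP response and `JsonDeserializationSchema.convertBytes`; where exactly is not established in this thread.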

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

JDK 1.8 on Windows 11

Screenshots

No response

ZhangWeike2000 commented 2 months ago

In my verification, it seems that this only happens in the case of Windows.
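
A Windows-only failure would fit a difference in the JVM default charset: on JDK 8 the default follows the operating system locale (for example GBK on a Chinese-locale Windows), while most Linux setups default to UTF-8. This is an assumption, not something verified in this thread. The sketch below, with hypothetical class and method names, prints the default charset and shows how a charset-dependent `String.getBytes()` call would diverge from UTF-8 for Chinese text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical diagnostic class; not part of SeaTunnel.
public class DefaultCharsetCheck {

    /** Returns true if encoding the text with the given charset yields the same bytes as UTF-8. */
    static boolean encodesLikeUtf8(String text, Charset charset) {
        return Arrays.equals(text.getBytes(charset), text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharset);

        // "\u7684" is the first Chinese character of the failing message.
        String sample = "mobile\u7684";
        if (encodesLikeUtf8(sample, defaultCharset)) {
            System.out.println("Default charset encodes the sample like UTF-8.");
        } else {
            System.out.println("Default charset does NOT match UTF-8; "
                    + "code that calls getBytes() without a charset will produce "
                    + "bytes a UTF-8 JSON parser rejects.");
        }
    }
}
```

If the second message is printed on the affected machine, starting the JVM with the standard `-Dfile.encoding=UTF-8` system property is a possible workaround to try; how to pass it through the SeaTunnel launcher script is not covered here.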

wuchunfu commented 1 month ago

> In my verification, it seems that this only happens in the case of Windows.

@ZhangWeike2000 I have tested in both Linux and Windows environments and could not reproduce this issue.