[Bug] [seatunnel-connectors-v2] [connector-elasticsearch] Incorrect Encoding When Writing to StarRocks Resulting in Garbled Text

Search before asking

[X] I had searched in the issues and found no similar issues.

What happened

There is an encoding problem when reading data from Elasticsearch. The root cause is that the Content-Type header of the Elasticsearch response does not include a charset, and when no charset is specified SeaTunnel decodes the response body as ISO-8859-1. The body itself is UTF-8 encoded JSON, so every non-ASCII character is decoded into several wrong characters. Because StarRocks only supports UTF-8, these mis-decoded strings are stored as garbled text.

The encoding handling needs to be adjusted to ensure compatibility and data integrity. I am open to discussion on potential solutions and improvements, and I am considering submitting a pull request to address this issue.
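A minimal sketch of one possible direction, using only the JDK. The class and methods (`EsResponseCharset`, `charsetFromContentType`, `decodeBody`) are hypothetical and are not part of SeaTunnel. The sketch assumes the fix is to honour an explicit charset in Content-Type when one is present and otherwise decode the Elasticsearch response body as UTF-8 instead of ISO-8859-1. It also reproduces the reported symptom: the UTF-8 bytes of a non-ASCII string, decoded as ISO-8859-1, do not round-trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper, not SeaTunnel code: illustrates a UTF-8 fallback
// for response bodies whose Content-Type carries no charset parameter.
class EsResponseCharset {

    // Returns the charset declared in a Content-Type header value,
    // or UTF-8 if the header is missing, has no charset, or names an unknown one.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip the "charset=" prefix and any surrounding quotes.
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                        break; // unusable charset name: fall through to the UTF-8 default
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes a raw response body using the charset chosen above.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType));
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // contains one non-ASCII character
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Behaviour described in this issue: no charset, so ISO-8859-1 is used.
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        // Sketched behaviour: no charset, so UTF-8 is used.
        String decoded = decodeBody(utf8Bytes, "application/json");

        System.out.println(garbled.equals(original)); // false
        System.out.println(decoded.equals(original)); // true
    }
}
```

Falling back to UTF-8 rather than ISO-8859-1 is in line with RFC 8259, which requires JSON text exchanged between systems that are not part of a closed ecosystem to be encoded as UTF-8, while an explicit charset sent by the server is still respected.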

SeaTunnel Version

2.3.3

SeaTunnel Config

env {
  parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}

source {
  Elasticsearch {
    hosts = ["http://127.0.0.1:10014"]
    index = "sec_evt_info"
    username = "elastic"
    password = ""
    result_table_name = "src_es"
    schema = {
      fields {
        test_data = string
      }
    }
    query = {"range":{"recordTime":{"gte":"2024-07-01 00:00:00"}}}
    tls_verify_certificate = false
  }
}

sink {
  StarRocks {
    source_table_name = "src_es"
    nodeUrls = ["127.0.0.1:9030"]
    base-url = "jdbc:mysql://127.0.0.1:8030/"
    username = root
    password = "password"
    database = "db"
    table = "table_name"
    batch_max_rows = 1000
    starrocks.config = {
      format = "CSV"
    }
  }
}

Running Command

./bin/seatunnel.sh --config job/es2starrocks.config

Error Exception

None

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

[X] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct
