Closed rhy429081492 closed 1 year ago
Hi @rhy429081492, have you validated the data captured by Wireshark? Taking the first error log as an example, did you see si\tslot1201\n201\nsi\tslot1202 in Wireshark? Is it the same as in input.get(i)? If you're certain that you passed valid data to the JDBC driver, then it could be a bug.
Lastly, please note that you're using an obsolete driver that is no longer supported. I'd suggest you upgrade to a more recent version like 0.4.6.
I confirm that the packets captured by Wireshark contain this data, and the data is valid. The problem occurs only when the data volume increases. My original goal was to reduce the latency caused by putting too many values after an IN clause. In the latest version of the JDBC client, the interface I use is no longer available; the new interface solves this by creating temporary tables. However, the way temporary tables are created is not ideal for performance optimization and does not perform as well as the interface this issue is about: when creating a temporary table, the new interface communicates with the server multiple times, and its packets are completely different from those of the interface I use. I'd rather do something similar to the external data mechanism in the ClickHouse docs: https://clickhouse.com/docs/en/engines/table-engines/special/external-data
Are you referring to the following example? The name might indeed be somewhat confusing. Internally, it utilizes an external table within the same HTTP request. Moreover, you can invoke the ClickHouseExternalData.asTempTable() function if you prefer to employ a temporary table instead of an external table. This can be particularly useful for handling complex data types, as mentioned in ClickHouse/ClickHouse#32171.
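For illustration, here is a minimal sketch of that usage with the newer com.clickhouse.jdbc driver. The {tt 'ext_data'} table binding, the column structure, the endpoint URL, and the sample rows are all assumptions made for this example, and asTempTable() below is shown as a builder switch corresponding to the function mentioned above; exact class and method names may vary between driver versions.

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import com.clickhouse.client.ClickHouseExternalTable;
import com.clickhouse.client.ClickHouseFormat;

public class TempTableSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and query; {tt 'ext_data'} binds the external table by name
        String url = "jdbc:ch://localhost:8123/default";
        String sql = "select tagname, tagvalue from {tt 'ext_data'}";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setObject(1, ClickHouseExternalTable.builder()
                    .name("ext_data")
                    .columns("tagname String, tagvalue String")
                    .format(ClickHouseFormat.TabSeparated)
                    .content(new ByteArrayInputStream(
                            "si\tslot1201\nsi\tslot1202".getBytes(StandardCharsets.UTF_8)))
                    .asTempTable() // ship the data as a temporary table instead of an external table
                    .build());
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(2));
                }
            }
        }
    }
}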
In conclusion, if your application demands high performance, I strongly recommend upgrading to 0.3.2-patch11 (use com.clickhouse.jdbc.ClickHouseDriver, since both old and new drivers co-exist in 0.3.2). This version offers a significant speed increase compared to its predecessors, as detailed in #768. Additionally, version 0.4 doubles the performance of nested type serialization. Furthermore, you might consider utilizing the Java client, which provides a lower-level API with minimal overhead, typically resulting in superior performance.
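As a rough sketch of that lower-level route (assuming the com.clickhouse.client Java client, 0.3.2-patch11 or later; the server address, database, and payload below are placeholders, and the external table travels in the same HTTP request as the query):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import com.clickhouse.client.ClickHouseClient;
import com.clickhouse.client.ClickHouseExternalTable;
import com.clickhouse.client.ClickHouseFormat;
import com.clickhouse.client.ClickHouseNode;
import com.clickhouse.client.ClickHouseProtocol;
import com.clickhouse.client.ClickHouseRecord;
import com.clickhouse.client.ClickHouseResponse;

public class JavaClientSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder server endpoint
        ClickHouseNode server = ClickHouseNode.of("http://localhost:8123/default");
        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP);
             ClickHouseResponse response = client.connect(server)
                     .query("select tagname, tagvalue from {tt 'ext_data'}")
                     .external(ClickHouseExternalTable.builder()
                             .name("ext_data")
                             .columns("tagname String, tagvalue String")
                             .format(ClickHouseFormat.TabSeparated)
                             .content(new ByteArrayInputStream(
                                     "si\tslot1201\nsi\tslot1202".getBytes(StandardCharsets.UTF_8)))
                             .build())
                     .execute().get()) {
            // Iterate over the result rows streamed back from the server
            for (ClickHouseRecord r : response.records()) {
                System.out.println(r.getValue(0).asString() + "\t" + r.getValue(1).asString());
            }
        }
    }
}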
Describe the bug
When I use this function in JDBC to pass external data in batches, an error is occasionally reported. Each error complains about the format of one data sample, and the position of the failing sample differs on every occurrence. When I reduce the amount of data, it works. I used Wireshark to capture packets and found that the request packets received by the server when the request fails are identical to those received when the request succeeds.
Code example
try (PreparedStatement statement = prepareStatement(dataSource, sql)) {
    // Rebuild each external data entry with a fresh input stream for this execution
    List<ClickHouseExternalData> useData = new ArrayList<>();
    for (int i = 0; i < externalData.size(); i++) {
        ClickHouseExternalData tmpData = externalData.get(i);
        useData.add(new ClickHouseExternalData(tmpData.getName(),
                new ByteArrayInputStream(input.get(i).getBytes(StandardCharsets.UTF_8)))
                .withStructure(tmpData.getStructure())
                .withFormat(tmpData.getFormat()));
    }
    ClickHousePreparedStatement clickHousePreparedStatement = (ClickHousePreparedStatement) statement;
    // Execute the query, sending the external data along with the request
    try (ResultSet rs = clickHousePreparedStatement.executeQuery(null, useData)) {
        LOGGER.debug("ExecuteShardQuery: {}, Use DataSource Url: [{}]", exeMethod,
                dataSource.getAllClickhouseUrls());
        return callBack.call(rs);
    }
}
Error log
First error log:
2023.05.18 10:19:55.190332 [ 27874 ] {} DynamicQueryHandler: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected '\t' before: '\nsi\tslot1202\nsi\tslot1203\nsi\tslot1204\nsi\tslot1205\nsi\tslot1206\nsi\tslot1207\nsi\tslot1208\nsi\tslot1209\nsi\tslot1210\nsi\tslot1211\nsi\tslot1212\nsi\tslot1213\nsi\tslot1214\nsi\t':
Row 1203:
Column 0, name: tagname, type: String, parsed text: "si"
Column 1, name: tagvalue, type: String, parsed text: "slot1201"
Row 1204:
Column 0, name: tagname, type: String, parsed text: "201"
ERROR: Line feed found where tab is expected. It's like your file has less columns than expected. And if your file have right number of columns, maybe it have unescaped backslash in value before tab, which cause tab has escaped.
: While executing SourceFromInputStream: (at row 1204) , Stack trace (when copying this message, always include the lines below):
Second error log:
2023.05.18 10:26:41.058246 [ 2668 ] {} void DB::ParallelParsingInputFormat::onBackgroundException(size_t): Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected ',' before: '\nsi,slot1706\nsi,slot1707\nsi,slot1708\nsi,slot1709\nsi,slot1710\nsi,slot1711\nsi,slot1712\nsi,slot1713\nsi,slot1714\nsi,slot1715\nsi,slot1716\nsi,slot1717\nsi,slot1718\nsi,':
Row 1707:
Column 0, name: tagname, type: String, parsed text: "si"
Column 1, name: tagvalue, type: String, parsed text: "slot1705"
Row 1708:
Column 0, name: tagname, type: String, parsed text: "5"
ERROR: Line feed found where delimiter (,) is expected. It's like your file has less columns than expected. And if your file have right number of columns, maybe it have unescaped quotes in values.
: (at row 1708) , Stack trace (when copying this message, always include the lines below):
Configuration
Environment
ClickHouse server
CREATE TABLE statements for tables involved: