Open Raven888888 opened 2 years ago
The exception seems to come from the Pulsar connector rather than Presto, since the Pulsar connector manages the schema and cache. It appears to come from this line: https://github.com/apache/pulsar/blob/master/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSqlSchemaInfoProvider.java#L106 which throws an exception when parsing the schema version. Can you provide the schema info using pulsar-admin, so we can verify that the schema info is not corrupted?
Greetings @MarvinCai
apache-pulsar/bin/pulsar-admin schemas get test
returns
{
"version": 0,
"schemaInfo": {
"name": "test",
"schema": "",
"type": "STRING",
"properties": {}
}
}
This is the same schema as before the Pulsar upgrade process.
I also tried to re-upload the schema:
apache-pulsar/bin/pulsar-admin schemas upload test -f apache-pulsar/conf/schema_example.conf
but got the same BufferUnderflowException error.
Thanks in advance.
Also, ref: link
Hi @MarvinCai have you got any clue?
Update from my side:
I tried deleting its schema,
apache-pulsar/bin/pulsar-admin schemas delete test
and was then able to query in Pulsar SQL just fine.
Then I tried re-uploading the schema,
apache-pulsar/bin/pulsar-admin schemas upload test -f apache-pulsar/conf/schema_example.conf
and got the same BufferUnderflowException error.
However, I noticed that GET schemas now returns
{
"version": 2,
"schemaInfo": {
"name": "test",
"schema": "",
"type": "STRING",
"properties": {
"key1": "value1"
}
}
}
It seems the schema version has been updated from V0 to V2 going from Pulsar 2.7.0 to 2.8.1. Is this what is causing the Presto Pulsar connection to break? PS: Apologies, the version just reflects how many times I updated the topic schema. It is unrelated to the issue.
I am seeing some more similar issues with the Pulsar worker in pulsar-client-go#546 and pulsar#11457.
It seems that the schema version of the message is null. Which language client do the messages come from?
I have broker enabled with websocket service.
webSocketServiceEnabled=true
I am using the Python client WS example to publish to the topic.
Sorry for the delay, I had some discussion with @gaoran10 offline. The WebSocket endpoint doesn't support schemas; the internal implementation simply produces raw bytes (ref). When the Pulsar topic has a schema, Pulsar SQL tries to read the message and expects a schema, but there is no schema on the message, causing an NPE. When we remove the schema, Pulsar SQL knows that and just uses a Byte schema by default (ref). That should explain why it works after you remove the schema from the topic. If you produce with a client that supports passing a schema, Pulsar SQL should work as expected. Here is a handy CLI tool that ingests NYC taxi data into a Pulsar cluster with a schema as a test dataset, which makes it easy to test Pulsar SQL: https://github.com/streamnative/examples/tree/master/nyctaxi/taxidata , the schema: https://github.com/streamnative/examples/blob/master/nyctaxi/taxidata/pkg/types/yellow.go
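To make the "raw bytes" point above concrete, here is a minimal sketch of the JSON frame a WebSocket producer sends. It assumes the frame fields `payload`, `properties` and `context` from the documented WebSocket producer example; `build_ws_frame` is a hypothetical helper name, and nothing is sent over the network here.

```python
import base64
import json


def build_ws_frame(text: str, properties: dict, context: str = "1") -> str:
    """Build the JSON frame for a WebSocket producer (hypothetical helper)."""
    # The message body travels as base64-encoded raw bytes.
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # Only these three fields are set; the frame has no field naming a schema.
    return json.dumps(
        {"payload": payload, "properties": properties, "context": context}
    )


frame = build_ws_frame("Hello World", {"key1": "value1", "key2": "value2"})
print(frame)
```

The frame carries only bytes and properties, which matches the explanation that a message produced this way reaches the topic without a schema version attached.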
@MarvinCai @gaoran10 Thank you for your explanation, it clears things up a bit. I have further follow-up questions:
string schema. Are there changes in schema handling from 2.7.x to 2.8.x that cause Pulsar SQL to break?
string schema, and able to query using Pulsar SQL.
Thanks a lot.
We added a PR https://github.com/apache/pulsar/pull/12809 to handle the null schema version problem.
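As an illustration of the null schema version problem (not the code from the PR), here is a Python analogue. It assumes the schema version is stored as an 8-byte big-endian long, so reading it from a missing or empty byte array fails in the same way a Java `ByteBuffer.getLong()` underflows. Both function names are hypothetical.

```python
import struct
from typing import Optional

LONG_SIZE = 8  # assumed size of an encoded schema version, in bytes


def parse_schema_version_unchecked(raw: bytes) -> int:
    """Decode the version without checking the length first."""
    # Raises struct.error when fewer than 8 bytes are available.
    return struct.unpack(">q", raw)[0]


def parse_schema_version(raw: Optional[bytes]) -> Optional[int]:
    """Decode the version, returning None when no version is attached."""
    if raw is None or len(raw) < LONG_SIZE:
        return None  # caller can fall back to a bytes schema
    return struct.unpack(">q", raw[:LONG_SIZE])[0]


print(parse_schema_version(None))
print(parse_schema_version((2).to_bytes(LONG_SIZE, "big")))
```

The guarded variant lets the reader treat a message without a version as schemaless instead of failing the whole query.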
Awesome, thank you @gaoran10 and team! 🚀
@gaoran10 Sorry to say but the issue I faced still persists.
I only recently had time to try your PR #12809. I used the official 2.10.0 binary, which should already contain your PR, and did a clean install.
However, I still face the exact same BufferUnderflowException error (and same error log) when I
string schema to the topic (conf/schema_example.conf)
Note that:
broker.conf, not sure if they play a role here. Some of the parameters were added after Pulsar 2.7.0, which leads me to think they are related, since this BufferUnderflowException did NOT happen in Pulsar 2.7.0 before...
isAllowAutoUpdateSchemaEnabled=true
systemTopicSchemaCompatibilityStrategy=ALWAYS_COMPATIBLE
topicLevelPoliciesEnabled=false
isSchemaValidationEnforced=false
schemaCompatibilityStrategy=FULL
@gaoran10 @MarvinCai
I noticed that one of the differences in broker.conf between Pulsar 2.7.0 and Pulsar 2.8.0+ is the introduction of this
# The schema compatibility strategy in broker level.
# SchemaCompatibilityStrategy : ALWAYS_INCOMPATIBLE, ALWAYS_COMPATIBLE, BACKWARD, FORWARD,
# FULL, BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE
schemaCompatibilityStrategy=FULL
Could this be what is causing the BufferUnderflowException?
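For anyone repeating this comparison, a small sketch that lists settings present in a newer broker.conf but absent from an older one. The two excerpts below are hypothetical samples for illustration, not the actual contents of either release; `parse_conf` and `added_keys` are hypothetical helper names.

```python
def parse_conf(text: str) -> dict:
    """Parse key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def added_keys(old_text: str, new_text: str) -> dict:
    """Return settings that appear only in the newer file."""
    old, new = parse_conf(old_text), parse_conf(new_text)
    return {key: new[key] for key in new if key not in old}


# Hypothetical excerpts; replace with the real file contents.
old_conf = """
isAllowAutoUpdateSchemaEnabled=true
isSchemaValidationEnforced=false
"""
new_conf = """
isAllowAutoUpdateSchemaEnabled=true
isSchemaValidationEnforced=false
# The schema compatibility strategy in broker level.
schemaCompatibilityStrategy=FULL
"""

print(added_keys(old_conf, new_conf))
# prints {'schemaCompatibilityStrategy': 'FULL'}
```

With real files, the two strings can be read with `pathlib.Path(...).read_text()` from each installation's conf directory.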
@Technoboy- This one should be related to https://lists.apache.org/thread/3js51tq2p3c3oldfrhprn4kcohx7h1wv ?
Any update?
@Raven888888 Which version do you use? I tested with Pulsar 3.0.0 and it works well. Could you try Pulsar 3.0.0?
Before uploading the string schema
presto> select * from pulsar."public/default"."t1";
__value__ | __partition__ | __event_time__ | __publish_time__ | __message_id__ | __sequence_id__ | __producer_name__ | __key__ | __properties__
----------------------------------+---------------+----------------+-------------------------+----------------+-----------------+-------------------+---------+-----------------------------------
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.873 | (15,5,0) | 5 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.873 | (15,6,0) | 6 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.873 | (15,7,0) | 7 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.873 | (15,8,0) | 8 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.874 | (15,9,0) | 9 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.870 | (15,0,0) | 0 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.871 | (15,1,0) | 1 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.871 | (15,2,0) | 2 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.872 | (15,3,0) | 3 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
48 65 6c 6c 6f 20 57 6f 72 6c 64 | -1 | NULL | 2023-07-03 03:56:43.872 | (15,4,0) | 4 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
(10 rows)
Query 20230703_040424_00001_2a95d, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [10 rows, 1.17KB] [46 rows/s, 5.48KB/s]
After uploading the string schema
presto> select * from pulsar."public/default"."t1";
__value__ | __partition__ | __event_time__ | __publish_time__ | __message_id__ | __sequence_id__ | __producer_name__ | __key__ | __properties__
-------------+---------------+----------------+-------------------------+----------------+-----------------+-------------------+---------+-----------------------------------
Hello World | -1 | NULL | 2023-07-03 03:56:43.873 | (15,5,0) | 5 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.873 | (15,6,0) | 6 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.873 | (15,7,0) | 7 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.873 | (15,8,0) | 8 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.874 | (15,9,0) | 9 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.870 | (15,0,0) | 0 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.871 | (15,1,0) | 1 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.871 | (15,2,0) | 2 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.872 | (15,3,0) | 3 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
Hello World | -1 | NULL | 2023-07-03 03:56:43.872 | (15,4,0) | 4 | standalone-0-9 | NULL | {"key1":"value1","key2":"value2"}
(10 rows)
Query 20230703_042603_00002_2a95d, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [10 rows, 1.17KB] [41 rows/s, 4.9KB/s]
Thanks @gaoran10
I have tried versions 2.7 (works) and 2.8-2.10 (all have the BufferUnderflowException). I have yet to try version 3.x, which I will do as soon as I can.
That said, I think your test step should be:
I notice it happens when I have 2 kinds of clients producing messages into the same topic, and it breaks Pulsar SQL. I am still able to read from the topic just fine using the Pulsar client or WebSocket client though.
Describe the bug
Following this upgrade guide to upgrade the Pulsar cluster node by node, from 2.7.0 to 2.8.1.
Presto CLI version is 332 after the upgrade.
Trying to query in Presto CLI from one of the topics,
apache-pulsar/bin/pulsar sql
select * from pulsar."public/default"."test";
returns
Query 20211006_093830_00017_59f2r failed: java.nio.BufferUnderflowException
(See below for full logs)
However, this only affects some topics. Other topics in the same tenant and namespace can be queried just fine.
Also, using the pulsar flink connector and the pulsar client python api to consume data from the problematic topics works as expected, with no issues.
Full logs
PS: Saw similar issue, but that one is about byte schema data, mine is string schema data.
I suspect Presto has messed up its cache somehow. Any pointers on how to identify the root cause and/or fix this issue would be greatly appreciated. Thanks!