@def- Unfortunately I don't see many details in the services.log besides the fact that the source started. Since u193 remains in the `starting` phase, it likely was snapshotting the upstream mytable10 and didn't finish... I would be able to get more insight if you could run this with MySQL debug logs (like we do here: https://github.com/MaterializeInc/materialize/blob/8cc6cf10478834d478da996fd0650f2abd991b20/test/mysql-cdc/mzcompose.py#L45), but without that it's difficult to know why it never entered the `running` phase.
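If rebuilding the test setup with those flags is inconvenient, roughly equivalent visibility can be had by switching on the general query log on the running MySQL server (a minimal sketch using stock MySQL system variables; the log path is a placeholder, and this is not necessarily what the linked mzcompose.py does):

```sql
-- Log every statement the server receives to a file we can tail.
SET GLOBAL log_output = 'FILE';
SET GLOBAL general_log_file = '/var/lib/mysql/general.log'; -- hypothetical path
SET GLOBAL general_log = 'ON';

-- Sanity check: the binlog (which the CDC source replicates from) is enabled.
SHOW VARIABLES LIKE 'log_bin';
```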
Here is a new run where mytable5 hung: services.log.zip
Okay, it looks like the table had no rows in it and the snapshot produced 0 values:
parallel-workload-materialized-1 | cluster-u3-replica-u3-gen-0: 2024-07-09T15:54:09.672615Z TRACE mz_storage::source::mysql::snapshot: timely-0 starting transaction with consistent snapshot at: {([00000000-0000-0000-0000-000000000000, fc2abc16-3e0a-11ef-ad59-0242ac180006], Absent), ([fc2abc16-3e0a-11ef-ad59-0242ac180007, fc2abc16-3e0a-11ef-ad59-0242ac180007], 18), ([fc2abc16-3e0a-11ef-ad59-0242ac180008, ffffffff-ffff-ffff-ffff-ffffffffffff], Absent)} id=u107
parallel-workload-materialized-1 | cluster-u3-replica-u3-gen-0: 2024-07-09T15:54:09.672923Z TRACE mz_storage::source::mysql::snapshot: timely-0 started transaction id=u107
parallel-workload-materialized-1 | cluster-u3-replica-u3-gen-0: 2024-07-09T15:54:09.675315Z TRACE mz_storage::source::mysql::snapshot: timely-0 reading snapshot from table '`mysql`.`mytable5`':
parallel-workload-materialized-1 | MySqlTableDesc { schema_name: "mysql", name: "mytable5", columns: [MySqlColumnDesc { name: "key0", column_type: Some(ColumnType { scalar_type: Int16, nullable: false }), meta: None }, MySqlColumnDesc { name: "key1", column_type: Some(ColumnType { scalar_type: Float32, nullable: false }), meta: None }, MySqlColumnDesc { name: "key2", column_type: Some(ColumnType { scalar_type: Int64, nullable: false }), meta: None }, MySqlColumnDesc { name: "key3", column_type: Some(ColumnType { scalar_type: Float64, nullable: false }), meta: None }, MySqlColumnDesc { name: "key4", column_type: Some(ColumnType { scalar_type: Int32, nullable: false }), meta: None }, MySqlColumnDesc { name: "key5", column_type: Some(ColumnType { scalar_type: Int16, nullable: false }), meta: None }, MySqlColumnDesc { name: "key6", column_type: Some(ColumnType { scalar_type: Int32, nullable: false }), meta: None }, MySqlColumnDesc { name: "value0", column_type: Some(ColumnType { scalar_type: Int64, nullable: true }), meta: None }, MySqlColumnDesc { name: "value1", column_type: Some(ColumnType { scalar_type: Float32, nullable: true }), meta: None }, MySqlColumnDesc { name: "value2", column_type: Some(ColumnType { scalar_type: Timestamp { precision: Some(TimestampPrecision(0)) }, nullable: true }), meta: None }, MySqlColumnDesc { name: "value3", column_type: Some(ColumnType { scalar_type: Timestamp { precision: Some(TimestampPrecision(0)) }, nullable: true }), meta: None }, MySqlColumnDesc { name: "value4", column_type: Some(ColumnType { scalar_type: Float32, nullable: true }), meta: None }, MySqlColumnDesc { name: "value5", column_type: Some(ColumnType { scalar_type: Time, nullable: true }), meta: None }, MySqlColumnDesc { name: "value6", column_type: Some(ColumnType { scalar_type: String, nullable: true }), meta: None }, MySqlColumnDesc { name: "value7", column_type: Some(ColumnType { scalar_type: Timestamp { precision: Some(TimestampPrecision(0)) }, nullable: true }), meta: None }, MySqlColumnDesc { name: "value8", column_type: Some(ColumnType { scalar_type: Int64, nullable: true }), meta: None }, MySqlColumnDesc { name: "value9", column_type: Some(ColumnType { scalar_type: String, nullable: true }), meta: None }, MySqlColumnDesc { name: "value10", column_type: Some(ColumnType { scalar_type: Int32, nullable: true }), meta: None }, MySqlColumnDesc { name: "value11", column_type: Some(ColumnType { scalar_type: Float64, nullable: true }), meta: None }, MySqlColumnDesc { name: "value12", column_type: Some(ColumnType { scalar_type: Time, nullable: true }), meta: None }, MySqlColumnDesc { name: "value13", column_type: Some(ColumnType { scalar_type: Float32, nullable: true }), meta: None }, MySqlColumnDesc { name: "value14", column_type: Some(ColumnType { scalar_type: Timestamp { precision: Some(TimestampPrecision(0)) }, nullable: true }), meta: None }], keys: {} } id=u107
parallel-workload-materialized-1 | cluster-u3-replica-u3-gen-0: 2024-07-09T15:54:09.675590Z TRACE mz_storage::source::mysql::snapshot: timely-0 snapshotted 0 records from table '`mysql`.`mytable5`' id=u107
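To confirm the table really was empty at that point, the snapshot worker's read can be approximated by hand against the test's MySQL server (a sketch; the table name is taken from the log above):

```sql
-- Mirror what the snapshot does: read at a single consistent point in time.
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT COUNT(*) FROM `mysql`.`mytable5`;
COMMIT;
```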
I wonder if this is always the case (a subsource remains in the `starting` phase) until at least one row has been produced for the relevant table?
This currently makes it impossible to use MySQL sources in parallel-workload since they seem to always be in a stuck state, so I disabled them.
Do you know if the tables are supposed to have any data in them?
Initially they are empty in these tests, but then pretty quickly filled.
Okay, I'm not sure that the `starting` status is actually an issue with the MySQL source. In the code, no source type is put into the `running` status until at least one row has been processed:
https://github.com/MaterializeInc/materialize/blob/d8f578ed9e3cdaf40e32c13fd511feba2e546575/src/storage/src/source/source_reader_pipeline.rs#L316-L318
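This can also be checked from SQL instead of the logs (a sketch; the mz_internal relations are unstable and their schemas may differ between versions):

```sql
-- List any sources or subsources that have not yet reached 'running'.
SELECT name, type, status, error
FROM mz_internal.mz_source_statuses
WHERE status <> 'running';
```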
So I think this issue is somehow related to the parallel workload test. For some reason the mysql tables aren't getting data put into them, such that the subsources never receive rows and move out of 'starting'. I would expect the same to happen for postgres sources but maybe there is some sort of race condition in the test that is triggered differently on mysql vs postgres (since it's flaky)?
For what it's worth, I don't care much about whether the status is `running` or `starting`. The important part to me is that all queries using the source hang: a JOIN hangs indefinitely, even though each of the individual objects is selectable, and the timestamps of the MySQL source never advance.
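The stalled timestamps should be visible in the frontier introspection relations (a sketch, assuming mz_internal.mz_frontiers, whose schema is unstable). If a source's write_frontier does not move between two runs of this query, reads at the current wall-clock time can never be satisfied, which is exactly a query that hangs:

```sql
-- Record the frontiers, rerun later, and compare: do they advance?
SELECT s.name, f.read_frontier, f.write_frontier
FROM mz_internal.mz_frontiers f
JOIN mz_sources s ON s.id = f.object_id;
```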
I don't have a simpler reproducer, but parallel-workload runs into it all the time with `bin/mzcompose --find parallel-workload down && bin/mzcompose --find parallel-workload run default` and the following patch to re-enable the MySQL source actions:
diff --git a/misc/python/materialize/parallel_workload/action.py b/misc/python/materialize/parallel_workload/action.py
index fb37f977c2..19376dcc11 100644
--- a/misc/python/materialize/parallel_workload/action.py
+++ b/misc/python/materialize/parallel_workload/action.py
@@ -2051,9 +2051,8 @@ ddl_action_list = ActionList(
(DropKafkaSinkAction, 4),
(CreateKafkaSourceAction, 4),
(DropKafkaSourceAction, 4),
- # TODO: Reenable when #28108 is fixed
- # (CreateMySqlSourceAction, 4),
- # (DropMySqlSourceAction, 4),
+ (CreateMySqlSourceAction, 4),
+ (DropMySqlSourceAction, 4),
(CreatePostgresSourceAction, 4),
(DropPostgresSourceAction, 4),
(GrantPrivilegesAction, 4),
What version of Materialize are you using?
463981199d1edea80820914543fadf73df727903
What is the issue?
I have an example locally of parallel-workload getting stuck (`bin/mzcompose --find parallel-workload down && bin/mzcompose --find parallel-workload run default`), where a JOIN hangs indefinitely, but each of the individual objects is selectable.

All queries: parallel-workload-queries.log

The definitions of mytable10 and v-16:
services.log: services.log.zip
Somehow the frontiers of the sources look off by a lot, and mytable10 never gets new frontiers. The MySQL subsources seem to be stuck in `starting`.
@rjobanp Any idea what is happening here? I'll keep it running for a bit if you have an idea for what to check/do.
(I found this while looking into https://github.com/MaterializeInc/materialize/issues/23582, but we already had stuck queries before even adding the MySQL source, so this is just an additional issue.)