rob-pomelo opened this issue 2 years ago
We are in a similar situation where we cannot use DB migration scripts in CD (with Spanner) vs CI/Local (with Emulator).
Hey guys, does this issue have any new updates? I am in a similar situation where I have to disable my DB migrations because of it. I am almost giving up on Spanner because of this.
Hey there and thanks for the feedback!
I'm from the Spanner change streams team and wanted to share a quick update. While we are working on the actual change streams support in emulator, we are working on a fix to make change streams a no-op. This will resolve the DDL issues described here.
I'll keep this bug updated about timelines and progress.
@efalkenberg thank you for the update. Is there any timeline for the features you mentioned? Thank you in advance.
Hey @kberezin-nshl the no-op will be available in April. The full feature that allows you to capture changes and consume them via the TVF will be available in summer. Are you looking for the no-op or the full feature support?
Hi there, the no-op is available now. You can try it in the latest version of the emulator.
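For anyone verifying the no-op behavior: with this change, migration DDL that declares a change stream should be accepted by the emulator instead of failing, even though no change records are produced yet. A minimal sketch (table and stream names are made up for illustration):

```sql
-- Hypothetical migration DDL; with the no-op in place, the emulator
-- accepts the CREATE CHANGE STREAM statement instead of erroring,
-- but the stream does not record any changes yet.
CREATE TABLE Singers (
  SingerId INT64 NOT NULL,
  Name STRING(MAX),
) PRIMARY KEY (SingerId);

CREATE CHANGE STREAM SingersStream FOR Singers;
```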
Regarding the full feature support, will that ever run multiple change stream partitions at the same time, or will it only ever be one partition at a time? This is likely equivalent to asking whether the emulator only has one underlying partition.
I'd ideally want to use the emulator to test the behavior of my change-stream-processing application when stream partitions split/merge.
Hi @benjaminran, we will support multiple partitions and split/merge when the full feature is GA.
@efalkenberg @ChangyuLi28 do you have any update on full change stream support?
Ping @efalkenberg @ChangyuLi28 do you have any update on full change stream support?
Hey Ian! Sorry I missed the previous ping. We are in the process of finalizing the first version and plan to release it in the next week or two. I'll update the bug here when it's out!
Hi, the first version of change stream support is available now. Supported features include NEW_VALUES value capture type, move churning, and query in GoogleSQL. We’ll continue to add more features.
I'm trying to use the changestream support in v1.5.9 with a test for a beam pipeline but I get the following error when trying to connect to a changestream:
com.google.cloud.spanner.SpannerException: UNKNOWN: io.grpc.StatusRuntimeException: UNKNOWN: Unexpected error in RPC handling - Statement: 'SELECT * FROM READ_EventStream( start_timestamp => @startTimestamp, end_timestamp => @endTimestamp, partition_token => @partitionToken, read_options => null, heartbeat_milliseconds => @heartbeatMillis)'
The changestream exists in the DDL with that name and the way the error is phrased leads me to think this isn't supported yet?
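For context on the statement in the error: the emulator exposes each change stream as a table-valued function named `READ_<stream name>` in GoogleSQL, which is what the connector is querying here. The query shape from the error above, reflowed for readability (parameter values are placeholders bound by the connector):

```sql
-- Shape of the change stream TVF query from the error above;
-- the @-parameters are bound by the client at query time.
SELECT * FROM READ_EventStream(
  start_timestamp => @startTimestamp,
  end_timestamp => @endTimestamp,
  partition_token => @partitionToken,
  read_options => null,
  heartbeat_milliseconds => @heartbeatMillis
);
```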
Hi Ian, we haven't added the emulator host variable in the Dataflow connector yet. We'll support it in a future release.
Do we have a timeline update for this? We are also facing the same issue.
The emulator host should now be supported in the change streams Dataflow connector in the most recent release (v1.5.11).
We will update documentation soon with this information. For now, you can run the Dataflow connector against the emulator with the following configuration:
```java
final SpannerConfig spannerConfig = SpannerConfig.create()
    .withEmulatorHost(emulatorHost)
    .withProjectId(projectId)
    .withInstanceId(instanceId)
    .withDatabaseId(databaseId);

SpannerIO.readChangeStream()
    .withSpannerConfig(spannerConfig)
    .withChangeStreamName(changeStreamName)
    .withMetadataDatabase(metadataDatabaseId)
    .withMetadataTable(metadataTableName);
```
Currently, using the Dataflow connector with the emulator may cause an error to be thrown when the connector finishes execution and tries to drop the metadata table with a TTL policy. This is a known issue in the emulator. Feel free to ignore this error; we plan to fix it in a follow-up release.
@ChangyuLi28 I'm seeing odd behavior when trying to use the new change stream support in the emulator in our integration tests.
For reference, I'm using the https://github.com/cloudspannerecosystem/spanner-change-streams-tail library to connect to the change stream.
In the test, the library is successfully connecting to the change stream in the emulator and I'm seeing two Child Partition Records being returned for the registered change stream. The library is then asking for the stream of changes for each of those partitions and then... nothing. I don't see any Data Change Records coming out of the stream, even if I write a bunch of rows to the monitored table.
Are there any "gotchas" that I'm perhaps overlooking? This code works against a real Spanner in my manual tests, so I feel like I'm missing something...
@josephschorr which change stream mode are you using? Note that the initial version only supports the NEW_VALUES value capture type.
@iangudger The default, which I believe is OLD_AND_NEW_VALUES. However, it doesn't seem to be reporting the new rows even if I explicitly set the change stream to use NEW_VALUES, like so:

```sql
CREATE CHANGE STREAM thetable_stream
  FOR thetable
  OPTIONS ( value_capture_type = 'NEW_VALUES' );
```
Hi @josephschorr Could you specify how you configured the spanner-change-streams-tail such that it was connecting to the emulator endpoint?
@ChangyuLi28 Sure! I have the spanner emulator env var set to point to the emulator, and I'm giving it the configured database name. You can see how its used in code here: https://github.com/authzed/spicedb/blob/main/internal/datastore/spanner/watch.go#L49
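For anyone else wiring this up: the "spanner emulator env var" referred to here is SPANNER_EMULATOR_HOST, which the Google Cloud client libraries check to redirect traffic to the emulator. A setup sketch, assuming the default gRPC port 9010 and the image from gcr:

```sh
# Run the emulator (gRPC on 9010, REST on 9020 by default).
docker run -d -p 9010:9010 -p 9020:9020 gcr.io/cloud-spanner-emulator/emulator

# Clients that honor this variable connect to the emulator
# instead of the Cloud Spanner service.
export SPANNER_EMULATOR_HOST=localhost:9010
```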
Thank you for raising this issue! We managed to reproduce on our end and confirmed that, while the data records eventually show up, there is some delay in the record arrival in the order of seconds. We will fix this issue in the next emulator release.
Hi @josephschorr , a new Emulator version v1.5.12 was released yesterday and this high latency for Data Change Records should be fixed now in this version. Thanks again for catching this and let us know if there are any other issues you observe.
Hi @benjaminran merge / split churning is now supported in the emulator for change streams in v1.5.12!
Hi all, the OLD_AND_NEW_VALUES and NEW_ROW value capture types are supported in the emulator as of v1.5.13. Change streams are now fully supported.
Hi again, I think I've found a bug with the change stream implementation. We have tables that use generated columns (AS (...) STORED) to extract fields from another column (Data) in order to index them. When we change the Data column, we get the correct "new" value of the change (a JSON object with a Data field holding the correct new value of the Data column), but the "old" value is not always correct. Instead, it seems to randomly pick a column (either Data or one of the other columns derived from Data).
We've verified that the old values of at least 4 different columns derived from Data are incorrectly passed as the old value of Data in the change. We are using OLD_AND_NEW_VALUES as our capture type.
We consume these changes via Beam change streams, and when connected to the cloud implementation we do not observe this behavior.
Happy to provide any more context!
Hi @ianb-pomelo , could you please provide more details on how to reproduce the issue you encountered? More specifically:
1. The DDL of your schema (including the generated columns defined with AS and the source table of the Data column). It does not need to be exactly the same as your actual schema, as long as it is representative enough to reproduce the issue.
2. The DML/mutation that modifies Data and results in incorrect old values. Was the update operation a simple DML/mutation, or was it in some nested complex DML? Does the modification type (insert/update/delete) matter to the occurrence of this bug?
3. For the incorrect old value of Data, was the column name field in the JSON object correct? Or were both the column name field and the old value field incorrectly taken from another random column?
Thanks!
Sure:

Sample Schema:

```sql
CREATE TABLE Users (
  UserId STRING(40),
  Data JSON,
  OtherUserId STRING(MAX) AS (JSON_VALUE(Data, "$.external_ids.other_id")) STORED,
) PRIMARY KEY(UserId);
```

For the write mechanism, we use the Java library to create a mutation that sets two columns and then applies them in a transaction. In this particular instance, we transactionally read the row, create the mutation, and commit the transaction. The two columns we add to the mutation are the primary key (unchanged value) and the Data column with the new JSON value. We have only found this in updates thus far.

The ChangeDataRecord we get from the Beam SDK includes the oldValuesJson with the correct column name but incorrect value (example: oldValuesJson='{"Data":"1713378762021"}').
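The write path described here can be sketched roughly as follows. This is a hedged illustration using the Java client's read-write transaction and mutation APIs, not the reporter's actual code; names like dbClient, the key value, and the JSON payload are assumptions:

```java
// Sketch of the described write: read the row in a transaction, then
// buffer a mutation that sets the primary key and the new Data value.
// dbClient is an initialized com.google.cloud.spanner.DatabaseClient.
dbClient
    .readWriteTransaction()
    .run(txn -> {
      // Transactionally read the current row (illustrative read).
      Struct row = txn.readRow("Users", Key.of("user-1"),
          Arrays.asList("UserId", "Data"));

      // Build the update: primary key (unchanged) plus the new JSON value.
      txn.buffer(Mutation.newUpdateBuilder("Users")
          .set("UserId").to("user-1")
          .set("Data").to(Value.json("{\"external_ids\":{\"other_id\":\"abc\"}}"))
          .build());
      return null;
    });
```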
Thanks for providing the information! We will try to reproduce and investigate this issue on our end. I will update here when we come up with a fix.
Hi @ianb-pomelo , We have root caused this issue and just submitted a fix. This fix should be available in next Emulator release. Thanks for your patience!
@ShuranZhang Sorry if I need to create another issue, but please let me share it here.
Currently, Cloud Spanner supports NEW_ROW_AND_OLD_VALUES: https://cloud.google.com/spanner/docs/change-streams#value-capture-type
Could you also support this new capture type in the Emulator, please? Here is a log I got when I applied DDL with the new capture type:
Invalid value_capture_type: NEW_ROW_AND_OLD_VALUES. Change Streams only support value capture types in OLD_AND_NEW_VALUES, NEW_ROW, and NEW_VALUES.
Failed to execute migration, Invalid value_capture_type: NEW_ROW_AND_OLD_VALUES. Change Streams only support value capture types in OLD_AND_NEW_VALUES, NEW_ROW, and NEW_VALUES.
Hi @ryoheinagao Absolutely! We are actively working on supporting NEW_ROW_AND_OLD_VALUES
right now and it should be available in late Q2 or early Q3. I will update you once it is released. Thanks for your patience.
Hi! Would it be possible to release a new version with these changes when you get the chance? thank you!
Hi @rob-pomelo , I will reach out to our release team and try to prioritize next release sooner. I will update here once the next release is available. Thanks!
Hi @rob-pomelo, there was a new emulator release (1.5.19) published today, and it is available as the latest on gcr at https://gcr.io/cloud-spanner-emulator/emulator. 1.5.19 contains the fix for this issue and a new value capture type, NEW_ROW_AND_OLD_VALUES. Thanks for your patience!
Hi @ryoheinagao The newest emulator release (1.5.19) is now available as the latest on gcr at https://gcr.io/cloud-spanner-emulator/emulator. NEW_ROW_AND_OLD_VALUES is supported now in this version!
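For anyone trying this out, the DDL for the new capture type should look like the following sketch; the stream and table names are illustrative, and the option name matches the earlier examples in this thread:

```sql
-- Hypothetical stream using the newly supported capture type.
CREATE CHANGE STREAM UsersStream
  FOR Users
  OPTIONS ( value_capture_type = 'NEW_ROW_AND_OLD_VALUES' );
```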
Hi! I'm experiencing a bug with the following scenario:
1. Create a table PlayerEvents with a change stream.
2. Alter or drop the change stream so it no longer tracks the PlayerEvents.DataJson column.
3. Cannot drop the column; the following error is thrown:
FAILED_PRECONDITION: Cannot drop column PlayerEvents.DataJson. Column PlayerEvents.DataJson is tracked by 0 Change Stream: . Columns explicitly tracked by a Change Stream cannot be dropped. Please alter the Change Stream to stop tracking the column before dropping the column.
Emulator version is 1.5.19
Hi @Neon4eg , to verify that I am understanding correctly, can you confirm that the DDL scenario below matches your case?

Step 1:
```sql
CREATE TABLE Users (
  UserId INT64 NOT NULL,
  Name STRING(MAX),
  Age INT64,
) PRIMARY KEY (UserId)
```

Step 2:
```sql
CREATE CHANGE STREAM C FOR Users(Name)
```

Step 3 (either of):
```sql
DROP CHANGE STREAM C
```
or
```sql
ALTER CHANGE STREAM C SET FOR Users(Age)
```

Step 4:
```sql
-- Dropping the column here will throw the error you provided
ALTER TABLE Users DROP COLUMN Name
```
We will try to re-produce on our end and investigate the root cause. I will keep you posted when we come up with a fix. Thanks!
@ShuranZhang exactly. Your sample reproduces the bug. Thanks for quick response
Hi @Neon4eg , We have root-caused the bug and submitted a fix. This fix will be included in the next Emulator release, version 1.5.23. Please note that 1.5.23 is not yet available. Once it is released, you can upgrade your Emulator to that version to resolve this issue. The current latest version is 1.5.22. Thanks for your patience!
Hi, I'm curious if there is a timeline for adding Spanner Change Stream support. Right now we need to remove the change stream DDL for our unit tests not to fail (which rely on the emulator). Thanks!