shohhei1126 closed this issue 6 years ago.

When restoring with the recently released Google Cloud Spanner backup feature, an error occurred. The error seems to occur when a TIMESTAMP field contains the zero value (0001-01-01T00:00:00Z); the restore succeeded when the field was null.
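For reference, that zero value corresponds exactly to the number that appears in the varint errors later in this thread. A minimal sketch using com.google.cloud.Timestamp from the google-cloud-core client library (the class name ZeroTimestampCheck is made up for illustration):

import com.google.cloud.Timestamp;

public class ZeroTimestampCheck {
  public static void main(String[] args) {
    // The "zero value" timestamp from the failing restore.
    Timestamp zero = Timestamp.parseTimestamp("0001-01-01T00:00:00Z");
    // Prints -62135596800, the same value reported in the
    // "varint overflow -62135596800" errors below.
    System.out.println(zero.getSeconds());
  }
}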
@shohhei1126 - You said, "When restoring with the recently released google cloud spanner backup feature". Are you referring to https://cloud.google.com/spanner/docs/import, or are you referring to the code here in this project, which is not the official Cloud Spanner Backup & Restore?
@EricBeach I'm sorry I didn't make it clear enough. I'm referring to https://cloud.google.com/spanner/docs/import, and GCP support told me to file an issue on this project.
I tried restoring the same schema and data with this project (https://github.com/GoogleCloudPlatform/pontem/blob/dev/USAGE.md#restore), and the same error occurred.
@shohhei1126 - Thanks for the note. Since you're using the official in-product Import/Export, this is not the official support channel, and you should go back to the official support team. Sorry for the confusion.
Just to clarify: com.google.cloud.teleport.spanner is the official in-product tool, and com.google.cloud.pontem is an open-source means to perform backup-restore that can be forked, etc.
Since you're hitting this issue with Pontem as well, I will take a look at the issue when running Pontem. Given that the issue happens with both the native Import/Export and with Pontem here, and given the stack trace, I suspect the root cause is further down the stack in the Apache Beam library -- but that is just an initial guess.
@EricBeach Thanks for the information. There were differences when I read the logs carefully. This is the error when executed with Pontem:
Caused by: java.io.IOException: varint overflow -62135596800
at org.apache.beam.sdk.util.VarInt.decodeInt(VarInt.java:65)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decodePrimitive(MutationGroupEncoder.java:453)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decodeModification(MutationGroupEncoder.java:326)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decodeMutation(MutationGroupEncoder.java:280)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decode(MutationGroupEncoder.java:264)
at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$BatchFn.processElement(SpannerIO.java:1030)
at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$BatchFn$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:185)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:146)
at com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
at com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:181)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:102)
at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowViaIteratorsFn.processElement(BatchGroupAlsoByWindowViaIteratorsFn.java:124)
at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowViaIteratorsFn.processElement(BatchGroupAlsoByWindowViaIteratorsFn.java:53)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:113)
at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
at com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:200)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
@shohhei1126 - Thanks for the information. The package names are slightly different, but the core part of the code causing the failure is the same (just a different package name).
Pontem:
Caused by: java.io.IOException: varint overflow -62135596800
at org.apache.beam.sdk.util.VarInt.decodeInt(VarInt.java:65)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decodePrimitive(MutationGroupEncoder.java:453)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decodeModification(MutationGroupEncoder.java:326)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decodeMutation(MutationGroupEncoder.java:280)
at org.apache.beam.sdk.io.gcp.spanner.MutationGroupEncoder.decode(MutationGroupEncoder.java:264)
Native Cloud Spanner:
Caused by: java.io.IOException: varint overflow -62135596800
at org.apache.beam.sdk.util.VarInt.decodeInt(VarInt.java:65)
at com.google.cloud.teleport.spanner.connector.spanner.MutationGroupEncoder.decodePrimitive(MutationGroupEncoder.java:454)
at com.google.cloud.teleport.spanner.connector.spanner.MutationGroupEncoder.decodeModification(MutationGroupEncoder.java:327)
at com.google.cloud.teleport.spanner.connector.spanner.MutationGroupEncoder.decodeMutation(MutationGroupEncoder.java:281)
at com.google.cloud.teleport.spanner.connector.spanner.MutationGroupEncoder.decode(MutationGroupEncoder.java:265)
@shohhei1126 - The crux of the issue appears to be that 0001-01-01T00:00:00Z, which is a valid TIMESTAMP per https://cloud.google.com/spanner/docs/data-types#timestamp-type, has a seconds-since-epoch value (-62135596800) that is too large in magnitude to fit in a 32-bit integer. See the two changed lines of code in the diff below. I can take a look at filing a bug or trying to submit a fix to the library.
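To make the overflow concrete, a minimal plain-Java sketch (no Beam dependency; the class name is made up for illustration):

public class IntOverflowCheck {
  public static void main(String[] args) {
    // Epoch seconds for 0001-01-01T00:00:00Z.
    long seconds = -62135596800L;
    // The value lies far outside the 32-bit int range, so any decode
    // path that narrows it to int has to reject or corrupt it.
    System.out.println(seconds < Integer.MIN_VALUE); // true
    System.out.println(Math.toIntExact(seconds));    // throws ArithmeticException
  }
}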
I filed https://issues.apache.org/jira/browse/BEAM-4862
In addition, the code change below should fix the issue, and the rest of the tests pass ($ ./gradlew beam-sdks-java-io-google-cloud-platform:test).
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationGroupEncoder.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationGroupEncoder.java
@@ -478,7 +478,7 @@ class MutationGroupEncoder {
       if (isNull) {
         m.set(fieldName).to((Timestamp) null);
       } else {
-        int seconds = VarInt.decodeInt(bis);
+        long seconds = VarInt.decodeLong(bis);
         int nanoseconds = VarInt.decodeInt(bis);
         m.set(fieldName).to(Timestamp.ofTimeSecondsAndNanos(seconds, nanoseconds));
       }
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationGroupEncoderTest.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/MutationGroupEncoderTest.java
@@ -528,6 +528,32 @@ public class MutationGroupEncoderTest {
     verifyEncodedOrdering(schema, "test", keys);
   }
+  @Test
+  public void decodeTimestampMutationGroup() {
+    SpannerSchema spannerSchemaTimestamp = SpannerSchema.builder()
+        .addColumn("timestampTest", "timestamp", "TIMESTAMP")
+        .build();
+    Timestamp timestamp1 = Timestamp.now();
+    Mutation mutation1 = Mutation.newInsertOrUpdateBuilder("timestampTest")
+        .set("timestamp").to(timestamp1).build();
+    encodeAndVerify(g(mutation1), spannerSchemaTimestamp);
+
+    Timestamp timestamp2 = Timestamp.parseTimestamp("2001-01-01T00:00:00Z");
+    Mutation mutation2 = Mutation.newInsertOrUpdateBuilder("timestampTest")
+        .set("timestamp").to(timestamp2).build();
+    encodeAndVerify(g(mutation2), spannerSchemaTimestamp);
+
+    Timestamp timestamp3 = Timestamp.MIN_VALUE;
+    Mutation mutation3 = Mutation.newInsertOrUpdateBuilder("timestampTest")
+        .set("timestamp").to(timestamp3).build();
+    encodeAndVerify(g(mutation3), spannerSchemaTimestamp);
+
+    Timestamp timestamp4 = Timestamp.MAX_VALUE;
+    Mutation mutation4 = Mutation.newInsertOrUpdateBuilder("timestampTest")
+        .set("timestamp").to(timestamp4).build();
+    encodeAndVerify(g(mutation4), spannerSchemaTimestamp);
+  }
+
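For anyone who wants to reproduce the decoder failure in isolation, here is a minimal sketch against org.apache.beam.sdk.util.VarInt (the encode/decodeInt/decodeLong signatures are assumed from the Beam SDK of that era; the class name is made up for illustration):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.beam.sdk.util.VarInt;

public class VarIntOverflowRepro {
  public static void main(String[] args) throws Exception {
    long seconds = -62135596800L; // 0001-01-01T00:00:00Z in epoch seconds

    // Encode the seconds value as a varint, as the mutation encoder
    // does for timestamp columns.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    VarInt.encode(seconds, out);
    byte[] bytes = out.toByteArray();

    // Decoding as a long round-trips correctly (the fix).
    System.out.println(VarInt.decodeLong(new ByteArrayInputStream(bytes)));

    // Decoding as an int fails with
    // java.io.IOException: varint overflow -62135596800 (the bug).
    System.out.println(VarInt.decodeInt(new ByteArrayInputStream(bytes)));
  }
}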
@shohhei1126 - The Cloud Spanner Import via the Google Cloud Console UI should now be fixed. I am submitting a patch to the Apache Beam library (https://github.com/apache/beam/pull/6077) to fix the issue in the underlying Apache Beam code.
Fix https://github.com/apache/beam/pull/6077 is now submitted. The Cloud Spanner team has also made the corresponding fix, so the official UI-based Import/Export on Cloud Spanner should work now too.
@EricBeach The import was successful. Thank you for fixing it!