Closed: steve-xi-awx closed this 4 months ago
@steve-xi-awx Thanks a lot for raising this. What kind of schema change is happening? Can you post the writer configuration and the BigQuery sync configuration? I tried adding a new column, and it ran without any exception. Can you check the code and let me know if I am missing anything? You can also take this code and try to reproduce the issue with a sample dataset.
https://gist.github.com/ad1happy2go/17b32db63f68b49813c8430967a99ec8
I have raised a PR for this issue, and the change appears to fix it: https://github.com/apache/hudi/pull/10830 I think the problem is that an external table in BigQuery with a connection ID must specify the table schema on the table itself, while the sync specifies it in the wrong position (on the external data configuration). Your sample didn't specify the connection ID, so the table is still a simple external table, not a BigLake table. The problem occurs in release-0.14.1.
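To illustrate the schema placement, here is a minimal sketch in plain Python. It is not the code from the PR. It models a table as a dict shaped like the BigQuery REST table resource (`schema`, `externalDataConfiguration`, `connectionId`, `sourceUris`). The helper name `place_schema`, the connection ID, and the bucket path are hypothetical, and keeping the schema on the external data configuration for tables without a connection ID is an assumption made for contrast.

```python
from copy import deepcopy


def place_schema(table: dict, schema_fields: list) -> dict:
    """Return a copy of a REST-style table resource with the schema placed
    where the error message in this issue says BigQuery accepts it.

    Hypothetical helper for illustration only; it does not call BigQuery.
    """
    patched = deepcopy(table)
    external = patched.setdefault("externalDataConfiguration", {})
    schema = {"fields": list(schema_fields)}

    if external.get("connectionId"):
        # External table with a connection ID (BigLake): the schema goes on
        # the top-level table, and must not be on the external config.
        patched["schema"] = schema
        external.pop("schema", None)
    else:
        # Assumption: a simple external table keeps the schema on the
        # external data configuration.
        external["schema"] = schema
    return patched


if __name__ == "__main__":
    fields = [{"name": "id", "type": "STRING"}]
    biglake = {
        "externalDataConfiguration": {
            "connectionId": "my-project.us.my-connection",  # hypothetical
            "sourceUris": ["gs://my-bucket/table/*"],  # hypothetical
        }
    }
    print(place_schema(biglake, fields))
```

With a connection ID present, the result carries the schema at the top level only, which is the layout the `BigQueryException` in the stack trace asks for.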
@ad1happy2go Can you help review this PR?
I will take it.
Thanks @steve-xi-awx for the fix. Thanks @danny0405 . Tracking JIRA - https://issues.apache.org/jira/browse/HUDI-7488
Stacktrace
error when sync to BigQuery, will ignore this error and continue to process next batch, error is An error occurred while calling o2983.syncHoodieTable.
: com.google.cloud.bigquery.BigQueryException: Schema can be specified only on the Table.Schema field for external tables with an associated connection_id but schema was provided on Table.Externaldataconfig.Schema.
    at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
    at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.patch(HttpBigQueryRpc.java:271)
    at com.google.cloud.bigquery.BigQueryImpl$15.call(BigQueryImpl.java:673)
    at com.google.cloud.bigquery.BigQueryImpl$15.call(BigQueryImpl.java:670)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.bigquery.BigQueryImpl.update(BigQueryImpl.java:669)
    at org.apache.hudi.gcp.bigquery.HoodieBigQuerySyncClient.updateTableSchema(HoodieBigQuerySyncClient.java:206)
    at org.apache.hudi.gcp.bigquery.BigQuerySyncTool.syncTable(BigQuerySyncTool.java:147)
    at org.apache.hudi.gcp.bigquery.BigQuerySyncTool.syncHoodieTable(BigQuerySyncTool.java:111)
    at sun.reflect.GeneratedMethodAccessor817.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.sendCommand(ClientServerConnection.java:244)
    at py4j.CallbackClient.sendCommand(CallbackClient.java:384)
    at py4j.CallbackClient.sendCommand(CallbackClient.java:356)
    at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:106)