ankit0811 opened 3 weeks ago
I see you set the option `options.put(FlinkOptions.IGNORE_FAILED.key(), "false");`. It looks like there is an error in the Parquet writers that is collected back to the coordinator, so the coordinator reports the error when committing.
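For context, a minimal sketch of how that option map might be assembled. The literal key `"write.ignore.failed"` is what `FlinkOptions.IGNORE_FAILED.key()` is assumed to resolve to here; verify it against your Hudi version:

```java
import java.util.HashMap;
import java.util.Map;

public class HudiSinkOptions {
    // Assumed literal for FlinkOptions.IGNORE_FAILED.key(); check your Hudi release.
    static final String IGNORE_FAILED_KEY = "write.ignore.failed";

    static Map<String, String> buildOptions(boolean ignoreFailed) {
        Map<String, String> options = new HashMap<>();
        // "false": write errors are collected back to the coordinator and fail the commit.
        // "true": per-record errors are logged and swallowed, so the commit can succeed
        // even when records were dropped -- which can look like "empty" output files.
        options.put(IGNORE_FAILED_KEY, String.valueOf(ignoreFailed));
        return options;
    }

    public static void main(String[] args) {
        System.out.println(buildOptions(false));
    }
}
```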
Hmm. I didn't find any relevant errors in the TaskManager logs.
I changed IGNORE_FAILED to true and the job seems to be working, but I don't see any data in the parquet files. They are all empty. Any idea how I should debug this further?
I also get this exception in the same job:
22:51:20.093 [pool-286-thread-1] ERROR org.apache.hudi.common.table.view.PriorityBasedFileSystemView - Got error running preferred function. Trying secondary
org.apache.hudi.exception.HoodieRemoteException: Connect to 10.3.175.136:45105 [/10.3.175.136] failed: Connection refused (Connection refused)
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getPendingCompactionOperations(RemoteHoodieTableFileSystemView.java:547) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:69) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getPendingCompactionOperations(PriorityBasedFileSystemView.java:257) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.table.action.clean.CleanPlanner.<init>(CleanPlanner.java:98) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:107) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:159) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:185) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:359) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:629) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:752) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:862) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:835) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.sink.CleanFunction.lambda$open$0(CleanFunction.java:71) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: org.apache.hudi.org.apache.http.conn.HttpHostConnectException: Connect to 10.3.175.136:45105 [/10.3.175.136] failed: Connection refused (Connection refused)
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:151) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.get(RemoteHoodieTableFileSystemView.java:629) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:207) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getPendingCompactionOperations(RemoteHoodieTableFileSystemView.java:543) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
... 16 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[?:?]
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[?:?]
at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[?:?]
at java.net.SocksSocketImpl.connect(Unknown Source) ~[?:?]
at java.net.Socket.connect(Unknown Source) ~[?:?]
at org.apache.hudi.org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:151) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.get(RemoteHoodieTableFileSystemView.java:629) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:207) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getPendingCompactionOperations(RemoteHoodieTableFileSystemView.java:543) ~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
... 16 more
The error is caught by the writer; when an error occurs, it logs an error message:
LOG.error("Error writing record " + record, t);
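In other words, with IGNORE_FAILED set to true the write path swallows per-record failures after logging them, which is why the job "works" while the files stay empty. A hypothetical sketch of that pattern (illustrative only, not Hudi's actual code; `writeRecord` and the class name are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TolerantWriter {
    int written = 0;
    final List<String> errors = new ArrayList<>();

    void write(List<String> records, Consumer<String> writeRecord, boolean ignoreFailed) {
        for (String record : records) {
            try {
                writeRecord.accept(record);
                written++;
            } catch (RuntimeException t) {
                // Mirrors: LOG.error("Error writing record " + record, t);
                errors.add("Error writing record " + record + ": " + t.getMessage());
                if (!ignoreFailed) {
                    throw t; // surfaces to the coordinator and fails the commit
                }
                // ignoreFailed == true: the record is dropped after logging.
            }
        }
    }
}
```

If every record hits the same failure (as with a column type mismatch), `written` stays at zero and the resulting files are empty, matching the symptom above.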
I don't think it was logging this anywhere. I tried running it locally as well as on our k8s Flink cluster.
I was finally able to resolve this; the culprit was a data type mismatch in one of the columns. Not sure why this was not explicitly logged anywhere, though.
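For anyone hitting the same symptom: one way to catch this class of problem before it reaches the sink is a cheap type pre-check on the stream. The expected-types map and field names below are purely illustrative:

```java
import java.util.Map;

public class SchemaPrecheck {
    /** Returns null if the record matches, else a human-readable mismatch description. */
    static String findMismatch(Map<String, Object> record, Map<String, Class<?>> expected) {
        for (Map.Entry<String, Class<?>> e : expected.entrySet()) {
            Object value = record.get(e.getKey());
            if (value != null && !e.getValue().isInstance(value)) {
                return "column '" + e.getKey() + "' expected " + e.getValue().getSimpleName()
                        + " but got " + value.getClass().getSimpleName();
            }
        }
        return null; // all present fields have the expected types
    }
}
```

Running this in a map function (and logging or side-outputting mismatches) surfaces the bad column explicitly instead of relying on the sink to report it.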
Maybe you should check your logger configuration.
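If the TaskManager logs really show nothing, it may be worth confirming that `org.apache.hudi` loggers are not filtered out. A minimal log4j2 properties fragment, assuming the default Flink log4j2 setup (adjust to your actual logging backend):

```
# Ensure Hudi loggers are not suppressed below ERROR.
logger.hudi.name = org.apache.hudi
logger.hudi.level = INFO
# ERROR-level messages such as "Error writing record ..." from the write
# handles should then appear in the TaskManager log files.
```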
We are trying to create a COW table using Kafka as our source and S3 as our sink. The source comprises a list of Kafka topics. Checkpointing currently happens every 2 minutes, and when the checkpoint starts (i.e., the Hudi table files commit), the Flink job throws an exception.
This pipeline is the only one writing to the table, so we don't have multiple writers.
Below is the config used for this:
Environment Description
Hudi version : 0.14.1
Flink version : 1.15.2
Storage (HDFS/S3/GCS..) : s3
Based on some GitHub issue history, we did try to delete the .aux/ckp_meta/ dir, but still no luck. Any pointers on how to go about fixing this would be much appreciated.