apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Spark streaming job submitted on a k8s cluster to write to Hudi fails with an error #10699

Open xiazhanjia opened 9 months ago

xiazhanjia commented 9 months ago

Caused by: java.net.UnknownHostException: mytask-driver-svc.dataaccess.svc: Temporary failure in name resolution
    at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source)
    at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source)
    at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at org.apache.hudi.org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
    at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111)
    at org.apache.hudi.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
    at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:151)
    at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeRequestToTimelineServer(TimelineServerBasedWriteMarkers.java:173)
    at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.doesMarkerDirExist(TimelineServerBasedWriteMarkers.java:99)
    ... 57 more

I checked, and the Service 'mytask-driver-svc.dataaccess.svc' does exist in my k8s cluster. The problem only occurs occasionally.
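Since the failure is intermittent, it may help to probe name resolution for the driver Service repeatedly from inside an executor pod and see how often it fails. The following is only a rough diagnostic sketch: the function name `probe_dns`, the attempt count, and the delay are made up for illustration, and it only tests DNS lookup, not whether the timeline server itself is reachable.

```python
import socket
import time
from typing import Callable, List


def probe_dns(
    host: str,
    attempts: int = 5,
    delay_s: float = 1.0,
    resolver: Callable[[str], object] = socket.gethostbyname,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Try to resolve `host` several times; return one True/False per attempt.

    `resolver` and `sleep` are injectable so the function can be exercised
    without real network access (hypothetical helper, not part of Hudi).
    """
    results: List[bool] = []
    for i in range(attempts):
        try:
            resolver(host)
            results.append(True)
        except OSError:
            # socket.gaierror (name resolution failure) is a subclass of OSError.
            results.append(False)
        if i < attempts - 1:
            sleep(delay_s)
    return results


if __name__ == "__main__":
    # Host name taken from the stack trace above.
    outcome = probe_dns("mytask-driver-svc.dataaccess.svc", attempts=10)
    print(f"resolved {sum(outcome)}/{len(outcome)} attempts: {outcome}")
```

If some attempts fail while the Service exists, that would point at cluster DNS (or the driver Service going away) rather than at Hudi itself.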

ad1happy2go commented 9 months ago

@xiazhanjia It looks like the job is sending a request to the timeline server, which may no longer be available. Are you using spot instances?

As a workaround, you can disable the timeline server for now. Let us know if you still see the issue.
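A minimal sketch of what that workaround could look like in the writer options, assuming a PySpark job. `hoodie.embed.timeline.server` and `hoodie.write.markers.type` are Hudi write configs; the helper name `hudi_options_without_timeline_server`, the table name, and the paths in the usage comment are hypothetical. Setting the marker type to `DIRECT` explicitly is an assumption on my part, based on the stack trace going through `TimelineServerBasedWriteMarkers`.

```python
from typing import Dict, Optional


def hudi_options_without_timeline_server(
    base: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of `base` with the embedded timeline server turned off.

    Hypothetical helper for illustration; it only builds an options dict.
    """
    opts = dict(base or {})
    # Do not start the embedded timeline server on the driver.
    opts["hoodie.embed.timeline.server"] = "false"
    # Assumption: write markers directly to storage instead of going
    # through the timeline server.
    opts["hoodie.write.markers.type"] = "DIRECT"
    return opts


# Usage sketch (table name and paths are placeholders):
#
#   opts = hudi_options_without_timeline_server({"hoodie.table.name": "my_table"})
#   (df.writeStream.format("hudi")
#      .options(**opts)
#      .option("checkpointLocation", "/path/to/checkpoint")
#      .start("/path/to/table"))
```

Direct markers create more files on storage than timeline-server-based markers, so this is a trade-off to revisit once the name resolution problem is understood.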

codope commented 8 months ago

@xiazhanjia Were you able to resolve this issue?