Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0
6.86k stars, 2.94k forks

Avoid throwing RuntimeException when master hostname is not resolved on metrics reporting #12672

Closed: cheyang closed this issue 3 years ago

cheyang commented 3 years ago

Alluxio Version: 2.4.1-1

Describe the bug

If the hostname-to-IP-address mapping is not set in /etc/hosts, running the report command fails with the following error:

2020-12-22 12:45:52,344 ERROR AbstractShell - Error running report
java.lang.RuntimeException: java.net.UnknownHostException: abc: abc: Name does not resolve
        at alluxio.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:501)
        at alluxio.util.network.NetworkAddressUtils.getLocalHostName(NetworkAddressUtils.java:423)
        at alluxio.util.network.NetworkAddressUtils.getLocalHostMetricName(NetworkAddressUtils.java:442)
        at alluxio.metrics.MetricsSystem.constructSourceName(MetricsSystem.java:201)
        at alluxio.metrics.MetricsSystem.lambda$static$0(MetricsSystem.java:88)
        at alluxio.util.CommonUtils$2.firstTime(CommonUtils.java:785)
        at alluxio.util.CommonUtils$2.get(CommonUtils.java:780)
        at alluxio.metrics.MetricsSystem.getMetricNameWithUniqueId(MetricsSystem.java:387)
        at alluxio.metrics.MetricsSystem.initShouldReportMetrics(MetricsSystem.java:749)
        at alluxio.metrics.MetricsSystem.reportMetrics(MetricsSystem.java:559)
        at alluxio.metrics.MetricsSystem.reportClientMetrics(MetricsSystem.java:635)
        at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:88)
        at alluxio.client.metrics.MetricsHeartbeatContext.heartbeat(MetricsHeartbeatContext.java:95)
        at alluxio.client.metrics.MetricsHeartbeatContext.close(MetricsHeartbeatContext.java:125)
        at alluxio.client.metrics.MetricsHeartbeatContext.removeContext(MetricsHeartbeatContext.java:108)
        at alluxio.client.metrics.MetricsHeartbeatContext.removeHeartbeat(MetricsHeartbeatContext.java:198)
        at alluxio.client.file.FileSystemContext.closeContext(FileSystemContext.java:300)
        at alluxio.client.file.FileSystemContext.close(FileSystemContext.java:267)
        at alluxio.cli.fsadmin.FileSystemAdminShellUtils.checkMasterClientService(FileSystemAdminShellUtils.java:67)
        at alluxio.cli.fsadmin.command.ReportCommand.run(ReportCommand.java:107)
        at alluxio.cli.AbstractShell.run(AbstractShell.java:137)
        at alluxio.cli.fsadmin.FileSystemAdminShell.main(FileSystemAdminShell.java:71)
Caused by: java.net.UnknownHostException: abc: abc: Name does not resolve
        at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
        at alluxio.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:459)
        ... 21 more
Caused by: java.net.UnknownHostException: abc: Name does not resolve
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
        ... 22 more

gpang commented 3 years ago

@cheyang Thanks for the report. Is there something that has the name abc? What is a minimal set of steps to reproduce this?

apc999 commented 3 years ago

@gpang It is a hostname resolution issue, and that part has been resolved. The concern here is that the metrics system should not fail the entire process.

apc999 commented 3 years ago

@JySongWithZhangCe Can you take this issue? Basically, an error in metrics reporting (not being able to resolve the master hostname) should not bring down the process.
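
For illustration, here is a minimal sketch of the kind of guard being asked for. In the trace above, the RuntimeException is thrown while MetricsSystem.constructSourceName resolves the local hostname. The class, method names, and the "unknown_host" fallback below are hypothetical and are not Alluxio APIs or the eventual fix. Replacing dots with underscores is also only an illustrative choice. The idea is to catch the failure at the point where the metrics source name is built and fall back to a placeholder, so the caller keeps running.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are Alluxio APIs. */
public class SafeMetricsSource {
  /** Assumed placeholder used when the local hostname cannot be resolved. */
  public static final String FALLBACK_HOST = "unknown_host";

  /** Mimics the failing call chain: wraps the checked exception in a RuntimeException. */
  public static String resolveLocalHostName() {
    try {
      return InetAddress.getLocalHost().getCanonicalHostName();
    } catch (UnknownHostException e) {
      throw new RuntimeException(e);
    }
  }

  /** Returns the resolver's result, or the fallback if the resolver throws. */
  public static String hostNameOrFallback(Supplier<String> resolver) {
    try {
      return resolver.get();
    } catch (RuntimeException e) {
      // Log and continue: a metrics naming failure should not abort the command.
      System.err.println("Cannot resolve local hostname for metrics, using fallback: " + e);
      return FALLBACK_HOST;
    }
  }

  /** Builds a metric-safe source name by replacing dots with underscores. */
  public static String metricSourceName(Supplier<String> resolver) {
    return hostNameOrFallback(resolver).replace('.', '_');
  }

  public static void main(String[] args) {
    // Normal path: uses the real local hostname if it resolves.
    System.out.println(metricSourceName(SafeMetricsSource::resolveLocalHostName));
    // Simulated failure: the resolver throws, as in the stack trace above.
    System.out.println(metricSourceName(() -> {
      throw new RuntimeException(new UnknownHostException("abc: Name does not resolve"));
    }));
  }
}
```

With this shape, the simulated failure yields the placeholder name instead of propagating the exception up through the heartbeat and close path. Whether a placeholder source name is acceptable for metrics aggregation is a separate design question for the maintainers.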

JySongWithZhangCe commented 3 years ago

Sure!

gpang commented 3 years ago

I see. Then maybe the issue should be renamed to reflect the true problem.

JySongWithZhangCe commented 3 years ago

Is this only happening on the client side?