dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs failed in CI #34317

Closed (jaredpar closed this 4 years ago)

jaredpar commented 4 years ago

Console Log Summary

    System.Net.NameResolution.Tests.GetHostEntryTest.DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs(mode: 0) [FAIL]
      Assert.All() Failure: 2 out of 4 items in the collection did not pass.
      [3]: Item: fe80::20d:3aff:fe5b:2be9%2
           Xunit.Sdk.TrueException: Not a loopback address: fe80::20d:3aff:fe5b:2be9%2
           Expected: True
           Actual:   False
              at Xunit.Assert.True(Nullable`1 condition, String userMessage) in C:\Dev\xunit\xunit\src\xunit.assert\Asserts\BooleanAsserts.cs:line 95
              at Xunit.Assert.True(Boolean condition, String userMessage) in C:\Dev\xunit\xunit\src\xunit.assert\Asserts\BooleanAsserts.cs:line 83
              at System.Net.NameResolution.Tests.GetHostEntryTest.<>c.<DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs>b__13_0(IPAddress addr) in /_/src/libraries/System.Net.NameResolution/tests/FunctionalTests/GetHostEntryTest.cs:line 205
              at Xunit.Assert.All[T](IEnumerable`1 collection, Action`1 action) in C:\Dev\xunit\xunit\src\xunit.assert\Asserts\CollectionAsserts.cs:line 36
      [2]: Item: 10.0.0.22
           Xunit.Sdk.TrueException: Not a loopback address: 10.0.0.22
           Expected: True
           Actual:   False
              at Xunit.Assert.True(Nullable`1 condition, String userMessage) in C:\Dev\xunit\xunit\src\xunit.assert\Asserts\BooleanAsserts.cs:line 95
              at Xunit.Assert.True(Boolean condition, String userMessage) in C:\Dev\xunit\xunit\src\xunit.assert\Asserts\BooleanAsserts.cs:line 83
              at System.Net.NameResolution.Tests.GetHostEntryTest.<>c.<DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs>b__13_0(IPAddress addr) in /_/src/libraries/System.Net.NameResolution/tests/FunctionalTests/GetHostEntryTest.cs:line 205
              at Xunit.Assert.All[T](IEnumerable`1 collection, Action`1 action) in C:\Dev\xunit\xunit\src\xunit.assert\Asserts\CollectionAsserts.cs:line 36
      Stack Trace:
        /_/src/libraries/System.Net.NameResolution/tests/FunctionalTests/GetHostEntryTest.cs(205,0): at System.Net.NameResolution.Tests.GetHostEntryTest.DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs(Int32 mode)
        --- End of stack trace from previous location ---

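The assertion at GetHostEntryTest.cs line 205 rejects any address returned for localhost that is not a loopback address. The minimal Python sketch below mirrors that check. It is only an illustration, not the C# test itself, and it shows why items [2] and [3] fail: 10.0.0.22 is a private IPv4 address and fe80::20d:3aff:fe5b:2be9%2 is an IPv6 link-local address with zone index %2, and neither is loopback. The log does not show items [0] and [1], so 127.0.0.1 and ::1 below are assumed placeholders.

```python
import ipaddress


def non_loopback(addresses):
    """Return the entries that a 'must be loopback' assertion would reject."""
    failures = []
    for text in addresses:
        # Strip an IPv6 zone index such as '%2' before parsing; older Python
        # versions do not accept scoped addresses in ip_address().
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        if not addr.is_loopback:
            failures.append(text)
    return failures


# Items [2] and [3] come from the log above; 127.0.0.1 and ::1 are
# assumed placeholders for items [0] and [1], which the log does not show.
resolved = ["127.0.0.1", "::1", "10.0.0.22", "fe80::20d:3aff:fe5b:2be9%2"]
print(non_loopback(resolved))  # ['10.0.0.22', 'fe80::20d:3aff:fe5b:2be9%2']
```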
Builds

Build      Pull Request   Test Failure Count
#580674    Rolling        1


Only one failure seen so far, but it is suspicious that this showed up in CI.

jaredpar commented 4 years ago

@karelz This is now regularly blocking CI.

Builds

Build      Pull Request   Test Failure Count   Date
#584646    Rolling        1                    2020/4/1
#584991    Rolling        1                    2020/4/1
#585184    Rolling        2                    2020/4/2

Configurations

runfo tests -d runtime -c 100 -pr -n System.Net.NameResolution.Functional.Tests -m

karelz commented 4 years ago

@MihaZupan can you please take a look at this new failure? Kusto database mining shows the first and only failure 2 days ago (3/31). Disabling the test to unblock CI may be the best first response. cc @alnikola

jaredpar commented 4 years ago

Builds

Build      Pull Request   Test Failure Count
#585813    Rolling        1
#586664    Rolling        1
#587083    Rolling        1


jaredpar commented 4 years ago

This is causing roughly a 3% failure rate at this point. Do we have an ETA for when this will be disabled?

karelz commented 4 years ago

Sorry about that; I thought @MihaZupan had a chance to do it overnight. The PR is up: see #34527.

MihaZupan commented 4 years ago

From the console logs here, I see the following tests failing:

DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs
DnsObsoleteGetHostByName_EmptyString_ReturnsHostName
DnsObsoleteBeginEndGetHostByName_EmptyString_ReturnsHostName
Dns_GetHostEntry_HostString_Ok
Dns_GetHostEntryAsync_HostString_Ok

This looks like the 4 tests mentioned in https://github.com/dotnet/runtime/issues/1488, plus DnsGetHostEntry_LocalHost_ReturnsFqdnAndLoopbackIPs.

wfurt commented 4 years ago

This may be a misconfigured machine. Also, SLES 12 uses systemd, and its resolver can synthesize responses. I think we should collect more system info on failure; helper diagnostic functions could also help with other DNS test failures.

alnikola commented 4 years ago

@wfurt I think you are right: the test always fails when it runs on an sles.12.amd64.open agent whose machine name is 'localhost', which looks weird.
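Following wfurt's suggestion to collect more system info on failure, and the 'localhost' machine name alnikola found, a diagnostic could record the host name and what localhost resolves to. The Python sketch below is purely hypothetical: collect_dns_diagnostics and hostname_is_suspicious are made-up names, not existing dotnet/runtime test helpers, and a real version would live in the C# test utilities. It uses only the standard socket module.

```python
import socket


def hostname_is_suspicious(name):
    """Hypothetical check: if the machine itself is named 'localhost',
    resolving 'localhost' and resolving this machine become indistinguishable."""
    return name.split(".", 1)[0].lower() == "localhost"


def collect_dns_diagnostics(names=("localhost",)):
    """Hypothetical helper: gather facts worth logging when a DNS test fails."""
    info = {"hostname": socket.gethostname(), "resolved": {}}
    for name in names:
        try:
            # getaddrinfo returns (family, type, proto, canonname, sockaddr)
            # tuples; sockaddr[0] is the address string.
            results = socket.getaddrinfo(name, None)
            info["resolved"][name] = sorted({r[4][0] for r in results})
        except socket.gaierror as exc:
            info["resolved"][name] = [f"error: {exc}"]
    info["hostname_is_localhost"] = hostname_is_suspicious(info["hostname"])
    return info


if __name__ == "__main__":
    for key, value in collect_dns_diagnostics().items():
        print(f"{key}: {value}")
```

On an agent like the one described above, the output would be expected to show hostname_is_localhost as True, which would point at the machine configuration rather than at the resolver code.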

alnikola commented 4 years ago

The failures were caused by a Helix infra issue which was resolved yesterday.

karelz commented 4 years ago

@alnikola did we re-enable the tests? Was it the misconfiguration with localhost?

karelz commented 4 years ago

Reopening to re-enable the tests: https://github.com/dotnet/runtime/blob/d818d33294288fce07ec7b89b1a16922cd24e451/src/libraries/System.Net.NameResolution/tests/FunctionalTests/GetHostEntryTest.cs#L189

jaredpar commented 4 years ago

@alnikola, you wrote:

"The failures were caused by a Helix infra issue which was resolved yesterday."

What Helix issue was this? I'm looking for an issue in core-eng or arcade to link to. If they didn't create one for this problem, we should push them to do so.

Issues are the primary way we track reliability between our services. If there is a bug in Helix, Azure, etc. that impacted our reliability, we should push to make sure there is an issue tracking it. Always feel free to include me in the conversation if you need help with this.

alnikola commented 4 years ago

The issue was closed by mistake without re-enabling the affected test. I will do it shortly.

jaredpar commented 4 years ago

@alnikola

Where is the Helix issue that describes the bug they fixed? That is what I'm interested in. If they're not filing bugs, then there are issues we're not tracking, which means we can't track improvements.

alnikola commented 4 years ago

@jaredpar I reported the issue with the strange agent name ('localhost') to the engineering services team, and they said it was a known issue that had already been fixed, so I don't have a link. I will ping you offline with the details.