facebook / dns

Collection of Meta's DNS Libraries
Apache License 2.0

Observed inconsistencies with geo-specific responses with resolver IP vs ECS #77

Closed rjb-sophos closed 5 days ago

rjb-sophos commented 6 days ago

We run a network of DNS resolvers on AWS infrastructure. We have observed issues with all Meta domains, such as facebook.com, instagram.com, and whatsapp.com, where the IP address returned in response to a query with no ECS data appears to be in the wrong region. When we query with the IP address of the resolver in an ECS field, we get a geo-appropriate response.
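For reference, the ECS field mentioned above is the EDNS Client Subnet option from RFC 7871. The following is a minimal sketch of how that option is encoded on the wire, using only the Python standard library. The function name `build_ecs_option` and the example subnets are illustrative assumptions and are not taken from the facebook/dns code.

```python
import ipaddress
import struct

EDNS_OPTION_CLIENT_SUBNET = 8  # option code assigned to ECS in RFC 7871


def build_ecs_option(subnet: str) -> bytes:
    """Encode an EDNS Client Subnet option (code, length, payload).

    Hypothetical helper for illustration; `subnet` is e.g. "203.0.113.0/24".
    """
    # strict=False zeroes any host bits, as the RFC requires for the address field.
    network = ipaddress.ip_network(subnet, strict=False)
    family = 1 if network.version == 4 else 2  # IANA address family numbers
    prefix_len = network.prefixlen
    # Only the bytes needed to cover the source prefix are sent.
    addr_bytes = network.network_address.packed[: (prefix_len + 7) // 8]
    # Scope prefix-length is always 0 in a query.
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", EDNS_OPTION_CLIENT_SUBNET, len(payload)) + payload


if __name__ == "__main__":
    print(build_ecs_option("203.0.113.77/24").hex())
```

The resulting bytes go into the RDATA of the OPT pseudo-record in the query. For a quick manual comparison, `dig` can send the same option with `+subnet=203.0.113.0/24`, which makes it straightforward to repeat a query with and without ECS against the same name server.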

Investigations so far have shown that the issue is closely tied to the IP address of the resolver: switching IP addresses and repeating the queries can result in different responses. They have also shown that the responses for a given IP are consistent over time, although they do change occasionally. For example, an IP address located in Mumbai that on Friday was consistently receiving Singapore-based IP addresses for star-mini.c10r.facebook.com was, when tested again on Monday, receiving a Seattle-based IP address in response to the same query. Obviously, neither is ideal. At any given time, the same response is received from all four NS instances for facebook.com (a.ns.facebook.com through d.ns.facebook.com).

It's not clear whether this is an issue with the code or with the data. I notice from looking through the code that when an ECS field is present in a request, an attempt is first made to resolve for that subnet, and that the resolver IP is only used if the ECS field is not present or that lookup fails. I also noticed that the lookup for the resolver IP may select a different location map than an ECS query for the same name. Apologies if this is not the correct forum to raise this issue, but if you can point me at the right place I'd be very grateful.
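The lookup order described above can be sketched roughly as follows. This is only an illustration of the behaviour as read from the code, not the repository's implementation: the two maps, the location codes, and the names `lookup` and `pick_location` are all hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical location maps. As noted above, the ECS lookup and the
# resolver-IP lookup may consult different maps for the same name.
ECS_MAP = {ipaddress.ip_network("203.0.113.0/24"): "bom"}
RESOLVER_MAP = {ipaddress.ip_network("198.51.100.0/24"): "sin"}


def lookup(table, address: str) -> Optional[str]:
    """Return the location for the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in table if net.version == ip.version and ip in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def pick_location(resolver_ip: str, ecs_address: Optional[str] = None) -> Optional[str]:
    """Try the ECS subnet first; fall back to the resolver IP if absent or unmatched."""
    if ecs_address is not None:
        location = lookup(ECS_MAP, ecs_address)
        if location is not None:
            return location
    return lookup(RESOLVER_MAP, resolver_ip)


if __name__ == "__main__":
    print(pick_location("198.51.100.10"))                 # resolver-IP path
    print(pick_location("198.51.100.10", "203.0.113.5"))  # ECS path
```

With two separate maps, the same resolver can be steered to different locations depending on whether the query carries ECS, which would be consistent with the observations reported here if the resolver-IP map has a stale or coarse entry for the address.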

The impact of this issue right now is that customers using our resolvers are seeing traffic to Meta services directed from India to Singapore or even Seattle, which in some cases impacts peering arrangements and thus incurs significant costs for them, in addition to the performance issues.

deathowl commented 6 days ago

Hi, I've received your email and I forwarded the question to the broader internal DNS team. The recommendation from our targeting-focused members is to enable ECS on your resolvers.

rjb-sophos commented 6 days ago

Thanks for the response and for forwarding on the enquiry. We are investigating implementation of ECS although this isn’t seen universally as a Good Thing.

I’m happy to engage as necessary to provide more detail on our observations.


deathowl commented 5 days ago

I'll close this issue, as this repo is for our open source DNS-related code and not for prod infra issues. Please keep me posted by email about why you think enabling ECS is not universally a good thing.