elixir-mongo / mongodb

MongoDB driver for Elixir
Apache License 2.0

Unable to connect to Atlas due to DNS connectivity issues #358

Open JD-Robertson opened 3 years ago

JD-Robertson commented 3 years ago

I've run into an issue where connecting to Atlas using a mongodb+srv:// URL is failing on a specific set of AWS Windows 10 instances that my company uses for testing. We have a mix of C# and Elixir client code and only the Elixir code was failing.

I did some debugging to figure out what was going on, and found code in url_parser.ex that hardcodes 4.2.2.1 as the name server used to look up the records needed to decode the +srv URL. This AWS instance is pretty locked down and is not allowed to access that server. This can be tested on the command line using nslookup.
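For anyone who wants to reproduce the lookup from IEx rather than nslookup, here is a minimal sketch of an SRV query against an explicitly chosen name server. It is not the driver's actual code: the module name `SrvLookupSketch` and its functions are hypothetical, and the host passed in is whatever follows `mongodb+srv://` in your own URL. It relies only on `:inet.parse_address/1` and `:inet_res.lookup/4` from Erlang's standard library.

```elixir
defmodule SrvLookupSketch do
  # Hypothetical helper for illustration only; not part of the MongoDB driver.
  @dns_port 53

  # Turns ["4.2.2.1"] into [nameservers: [{{4, 2, 2, 1}, 53}]].
  # An empty list yields [] so :inet_res uses whatever name servers
  # Erlang already knows about.
  def lookup_opts([]), do: []

  def lookup_opts(servers) when is_list(servers) do
    addrs =
      Enum.map(servers, fn server ->
        {:ok, ip} = server |> String.to_charlist() |> :inet.parse_address()
        {ip, @dns_port}
      end)

    [nameservers: addrs]
  end

  # Looks up the SRV records behind a mongodb+srv:// host.
  # Returns a list of {priority, weight, port, target} tuples,
  # or [] when the lookup fails (for example, when the name server
  # is blocked by a firewall).
  def srv_records(host, servers \\ []) do
    name = String.to_charlist("_mongodb._tcp." <> host)
    :inet_res.lookup(name, :in, :srv, lookup_opts(servers))
  end
end
```

Calling `SrvLookupSketch.srv_records(host, ["4.2.2.1"])` on an affected machine and then again with an approved name server should show whether the blocked address is the cause: an empty list from the first call and a non-empty one from the second would point to it.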

A simpler reproducing case is to configure the Windows firewall to block connections to that specific address. Not a likely use case, but gives the same result.

I did some additional digging to figure out why the code was manually specifying a DNS server in the first place and found this issue, where apparently Erlang doesn't know how to properly retrieve the DNS configuration from the OS on Windows.
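A quick way to check what Erlang's resolver has picked up on a given machine is to ask `:inet_db` directly. This is only a diagnostic sketch; the module name `ResolverInfoSketch` is hypothetical, and the single call it wraps, `:inet_db.res_option/1`, is part of Erlang's kernel application.

```elixir
defmodule ResolverInfoSketch do
  # Hypothetical helper for illustration only.

  # Returns the name servers currently configured for Erlang's own
  # resolver as a list of {ip_tuple, port} pairs. An empty list means
  # Erlang has no name server configured for :inet_res lookups.
  def nameservers do
    :inet_db.res_option(:nameservers)
  end
end
```

If `ResolverInfoSketch.nameservers()` returns `[]` on a Windows host while nslookup works there, that would be consistent with Erlang not reading the OS DNS configuration.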

You can work around this issue by manually setting \HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\NameServer to an approved name server so that Erlang will actually see the configuration (not desirable). I've also verified that changing the hardcoded value in url_parser.ex resolves the issue, but that's also not a real fix.

I tried using :inet_db.add_ns() to add additional configuration in our code prior to calling into the MongoDB driver, but that doesn't work. It looks like the inet_db configuration is scoped to an individual Erlang process. Changing it in one process doesn't affect others.

I'd appreciate thoughts on how to address this. It's an internal issue for us at the moment, and not a showstopper, but business ownership is concerned that end users will start hitting this eventually.

wizardone commented 2 years ago

Just bumping this up: We are also experiencing problems connecting to Atlas. The actual error is:

[error] Mongo.Protocol (#PID<0.5272.0>) failed to connect: ** (Mongo.Error) tcp connect: non-existing domain - :nxdomain

Has there been an official solution for this problem?

scottmessinger commented 1 year ago

@wizardone We've been using Atlas for years now and haven't seen this, but we're also not using mongodb+srv as our connection string. When did it start?

@JD-Robertson Sorry no one got back to this 2 years ago when you posted it! Is this still happening?

JD-Robertson commented 1 year ago

We decommissioned the particular AWS deployment that had this issue and I haven't tried again in a while. I expect that any system that cannot access the hardcoded 4.2.2.1 name server address would have this issue. It's Windows-specific, though, and a very niche corner case, so I don't know how much effort is warranted here.