CESNET / rousette

RESTCONF server for sysrepo
https://gerrit.cesnet.cz/q/project:CzechLight/rousette
Apache License 2.0

DNS resolution for sysrepo backed by MongoDB on Docker #15

Open roc-ops opened 1 week ago

roc-ops commented 1 week ago

I am trying to set up rousette with docker-compose, with sysrepo backed by a MongoDB server in a separate container, but rousette does not seem to like DNS resolution on Docker.

When starting rousette:

(.venv) root@ee22530c790b:/build/rousette/yang# rousette
[2024-11-20 22:39:14.591] [rousette] [debug] NACM config validation: no rule-list entries
[2024-11-20 22:39:14.592] [rousette] [info] NACM config validation: Anonymous user access disabled
[2024-11-20 22:39:14.594] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found.
terminate called after throwing an instance of 'std::runtime_error'
  what(): Server error: Host not found (authoritative)
Aborted (core dumped)

If I change my DNS server to 8.8.8.8 (or any public one):

(.venv) root@ee22530c790b:/build/rousette/yang# rousette
2024/11/20 22:40:51.0414: [12045]: DEBUG: monitor: [mongo:27017] command or network error occurred: Failed to resolve mongo

mongo is the name of my MongoDB service in my Docker Compose file. It is reachable via ICMP:

(.venv) root@ee22530c790b:/build/rousette/yang# ping mongo
PING mongo (172.28.0.2) 56(84) bytes of data.
64 bytes from sysrepo-mongo.sysrepo-mongo_mongo (172.28.0.2): icmp_seq=1 ttl=64 time=0.150 ms

However, nslookup shows that Docker's DNS is giving a non-authoritative answer:

(.venv) root@ee22530c790b:/build/rousette/yang# nslookup mongo
Server: 127.0.0.11
Address: 127.0.0.11#53

Non-authoritative answer:
Name: mongo
Address: 172.28.0.2

Netopeer2 works on the same machine, so I know sysrepo is working properly.

It seems that DNS resolution for the sysrepo MongoDB instance, in rousette, only accepts authoritative DNS responses via Boost.

If I add mongo to /etc/hosts and set resolv.conf to 8.8.8.8, I can ping mongo, but I still get the same error when starting rousette:

(.venv) root@ee22530c790b:/build/rousette/yang# ping mongo
PING mongo (172.28.0.2) 56(84) bytes of data.
64 bytes from mongo (172.28.0.2): icmp_seq=1 ttl=64 time=0.156 ms
64 bytes from mongo (172.28.0.2): icmp_seq=2 ttl=64 time=0.114 ms
^C
--- mongo ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1013ms
rtt min/avg/max/mdev = 0.114/0.135/0.156/0.021 ms

(.venv) root@ee22530c790b:/build/rousette/yang# nslookup mongo
Server: 8.8.8.8
Address: 8.8.8.8#53

** server can't find mongo: NXDOMAIN

(.venv) root@ee22530c790b:/build/rousette/yang# rousette
[2024-11-20 22:48:02.452] [rousette] [debug] NACM config validation: no rule-list entries
[2024-11-20 22:48:02.452] [rousette] [info] NACM config validation: Anonymous user access disabled
[2024-11-20 22:48:02.457] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found.
terminate called after throwing an instance of 'std::runtime_error'
  what(): Server error: Host not found (authoritative)
Aborted (core dumped)

jktjkt commented 1 week ago

Could you please attach a full backtrace when that exception is thrown?

roc-ops commented 1 week ago

I'm not too familiar with generating backtraces, so I used the instructions from here: https://wiki.ubuntu.com/Backtrace
If you need something different, let me know.

gdb-rousette.txt

jktjkt commented 1 week ago

Thanks. Unfortunately, the backtrace only says that it's "somewhere from the constructor". There's a lot going on in there, and I'm missing some more detailed info. There's nothing directly in rousette which would attempt to resolve hostnames, but there are plenty of bug reports around the web which mention Docker's network setup, problems with DNS resolution and this very same error message. Also, the SW stack that we're using, which is based on Boost-ASIO, might be relevant here.

"Netopeer2 works on the same machine, so I know sysrepo is working properly."

We don't know that yet. Rousette is creating a different set of internal subscriptions. It is "possible" that some of these subscriptions refer to a module whose datastore uses the MongoDB plugin, which attempts to resolve stuff, which breaks under your specific network setup.

Can you please ensure that you've built rousette with -ggdb, run it under gdb interactively (e.g., gdb /path/to/rousette), set a breakpoint (via break __cxa_throw when in GDB), then run, then backtrace (or thread apply all backtrace)?

roc-ops commented 1 week ago

What I meant with the Netopeer2 comment was that sysrepo and Netopeer2 both seem to connect to the datastores backed by MongoDB in another container (running and operational, in my case), while startup is non-MongoDB (JSON).

The -ggdb flag did not work with cmake. After a little googling I found this, which is how I built it:

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

Here is the backtrace, both full and as requested: gdb-rousette2.txt

Since the error seems to occur in sysrepo-cpp, I also enabled debug symbols for that package and ran another gdb backtrace. Not sure if that helps or not: gdb-rousette3.txt

jktjkt commented 1 week ago

Thanks, and just confirming that the CMake flags are correct; that should get you going.

The error message that's shown now is a different one, you're getting a SR_ERR_OPERATION_FAILED when establishing the initial connection (or starting a sysrepo session; the C++ bindings use a wrong error message). Are you sure your environment has not changed? If you run it without gdb (but the exact same build, etc), what error message do you get now?

roc-ops commented 1 week ago

gdb-rousette4.txt

Sorry, when I restarted Docker, some preliminary commands did not get run. Now it is the same error.

jktjkt commented 1 week ago

Sorry for yet another round of instructions; there are some internal exceptions which are usually ignored. Do something like ignore 1 4, or just continue four times when the exceptions are thrown. The first four exceptions are "harmless" and are handled immediately.

roc-ops commented 1 week ago

gdb-rousette5.txt Ok here you go.

jktjkt commented 1 week ago

That was continue entered five times. You need four I'm afraid.

Also, consider picking this patch and running rousette with --sysrepo-log-level=3 for extra debug messages. If this is from a sysrepo plugin (which I suspect it is), this might help shed some light on the specific details.

roc-ops commented 1 week ago

Here are four continues before the patch (gdb-rousette6.txt) and four continues after the patch (gdb-rousette7.txt).

jktjkt commented 4 days ago

We looked into this with @peckato1 , and we were able to reproduce this even without MongoDB when the IPv6 network stack was completely disabled. You mentioned Docker and, unfortunately, it seems that they still disable IPv6 by default. Could you please check whether the daemon starts up properly when you enable IPv6? You don't need any particular settings, just a ::1/128 on the lo interface is sufficient.
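A quick way to test this from inside the container is to try binding a TCP socket to the IPv6 loopback address. The sketch below is only an illustration using plain POSIX sockets; the helper name canBindLoopback is hypothetical and is not part of rousette or sysrepo.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Hypothetical helper (not part of rousette): returns true when a TCP socket
// can be created and bound to the given numeric IPv4 or IPv6 address.
bool canBindLoopback(const char* address)
{
    assert(address != nullptr);

    sockaddr_in6 v6{};
    sockaddr_in v4{};
    const sockaddr* sa = nullptr;
    socklen_t len = 0;
    int family = AF_UNSPEC;

    if (inet_pton(AF_INET6, address, &v6.sin6_addr) == 1) {
        v6.sin6_family = AF_INET6;
        v6.sin6_port = 0; // port 0: let the kernel pick any free port
        sa = reinterpret_cast<const sockaddr*>(&v6);
        len = sizeof(v6);
        family = AF_INET6;
    } else if (inet_pton(AF_INET, address, &v4.sin_addr) == 1) {
        v4.sin_family = AF_INET;
        v4.sin_port = 0;
        sa = reinterpret_cast<const sockaddr*>(&v4);
        len = sizeof(v4);
        family = AF_INET;
    } else {
        return false; // not a numeric IPv4 or IPv6 address
    }

    const int fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0) {
        return false; // address family unavailable, e.g. IPv6 disabled
    }

    const bool ok = bind(fd, sa, len) == 0;
    close(fd);
    return ok;
}
```

If canBindLoopback("::1") returns false while canBindLoopback("127.0.0.1") returns true, that would be consistent with a container that has no IPv6 address on lo.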

Alternatively, you might try patching the server like this:

diff --git a/src/restconf/main.cpp b/src/restconf/main.cpp
index 0c31f08..b8c6e12 100644
--- a/src/restconf/main.cpp
+++ b/src/restconf/main.cpp
@@ -160,7 +160,7 @@ int main(int argc, char* argv [])
     }

     auto conn = sysrepo::Connection{};
-    auto server = rousette::restconf::Server{conn, "::1", "10080", timeout};
+    auto server = rousette::restconf::Server{conn, "127.0.0.1", "10080", timeout};
     signal(SIGTERM, [](int) {});
     signal(SIGINT, [](int) {});
     pause();

Please be advised that we do require IPv6; the service is designed to run behind a reverse proxy, and we might rely on IPv6 features for a proper, secure TLS setup in the future.

For those reading this bug report in the future: the suggestion for hardcoding 127.0.0.1 is meant only as a temporary step during debugging to verify the root cause of this particular issue.
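The thread does not say which call produced "Host not found (authoritative)" for a numeric listen address. One hypothesis, which is an assumption and not confirmed by the maintainers, is that the address string goes through a getaddrinfo()-style lookup with the AI_ADDRCONFIG flag, which can reject an IPv6 literal when no IPv6 address is configured. The sketch below, with the hypothetical helper lookupStatus, lets you compare lookups with and without that flag.

```cpp
#include <netdb.h>
#include <sys/socket.h>

#include <cassert>

// Hypothetical helper (not part of rousette or Boost): runs getaddrinfo() on
// `host` with the given flags and returns its status code (0 means success).
int lookupStatus(const char* host, int flags)
{
    assert(host != nullptr);

    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;     // accept both IPv4 and IPv6 results
    hints.ai_socktype = SOCK_STREAM; // TCP, as used by an HTTP server
    hints.ai_flags = flags;

    addrinfo* result = nullptr;
    const int status = getaddrinfo(host, nullptr, &hints, &result);
    if (status == 0) {
        freeaddrinfo(result); // only the status is of interest here
    }
    return status;
}
```

Comparing lookupStatus("::1", AI_NUMERICHOST) with lookupStatus("::1", AI_NUMERICHOST | AI_ADDRCONFIG) inside the affected container would show whether the flag changes the outcome; gai_strerror() turns a non-zero status into readable text.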