LDAP with 2 nodes

chrland commented 8 years ago

Hi,

I have a setup with 5 servers using the OVA 1.3.1: one web interface (reconfigure-as-webinterface), two graylog-server and MongoDB nodes (reconfigure-as-server), and two Elasticsearch nodes (reconfigure-as-datanode).

When I set up LDAP against AD, it works fine as long as I only have one graylog-server node in the setup. When I add the second graylog-server node, everything works fine except LDAP. If I go to http://"server ip"/system/ldap, the LDAP configuration comes and goes when I refresh the page. It is as if the LDAP config is not replicated correctly to the second node. If I disconnect or power down the second node, LDAP works fine again. Any ideas how to fix this error?

mariussturm commented 8 years ago

Hi, the LDAP settings should be replicated across both servers as long as they connect to the same MongoDB. Could you please verify that both servers use the same database? Take a look at /opt/graylog/conf/graylog.conf and search for mongodb_uri. The master server should use 127.0.0.1; the second server should use the external IP of the master server.
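A minimal sketch for checking this on each node, assuming graylog.conf uses one `key = value` setting per line. The helper name `conf_value` is hypothetical, not part of graylog-ctl:

```shell
# conf_value KEY [FILE]
# Print the value of KEY from a "key = value" config file.
# Defaults to the OVA path mentioned above. Commented-out lines
# (starting with "#") do not match the pattern and are ignored.
conf_value() {
  key=$1
  file=${2:-/opt/graylog/conf/graylog.conf}
  # Strip the "key =" prefix, then trailing whitespace; keep the first match.
  sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" \
    | sed 's/[[:space:]]*$//' \
    | head -n 1
}

# Usage on each graylog-server node:
#   conf_value mongodb_uri
```

On the master the printed URI should contain 127.0.0.1; on the second node it should contain the master's external IP.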

chrland commented 8 years ago

I can verify that the config is as you describe. The master node points at 127.0.0.1 and the second node points at the master's IP.

123dev commented 8 years ago

We have exactly the same problem as chrland, and we also have MongoDB properly set in the config files (the master uses 127.0.0.1, whereas the slave uses the master's IP). LDAP behavior is very sporadic: users have to try to log in several times until one attempt works. If I go to LDAP Group Mappings, most often it comes back with a message that LDAP is disabled; however, if I click on LDAP Settings from that same page, I do see that it is enabled. If I go back to LDAP Group Mappings, it might work, but it is totally unpredictable when it will and when it won't.

After seeing this ticket, I tried stopping server 2, and lo and behold, LDAP is working smoothly.

It is worth noting that whenever the master is down, for maintenance or otherwise, the web interface never uses the slave; it complains that it can't find a Graylog server. I'm not sure if this is related at all, just noting it in case it rings a bell.

kroepke commented 8 years ago

This sounds like a connectivity problem from the web interface to the second server. The web interface sends requests round-robin to all known Graylog servers; there's no such thing as a "fallback". The master merely runs certain housekeeping tasks (rotation, retention, etc.) that other nodes do not.
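To illustrate why one misbehaving node makes a page "come and go" on refresh, here is a toy model of round-robin dispatch with no fallback. It is not Graylog's actual code; the function name, node names and the "only server1 answers correctly" scenario are hypothetical:

```shell
# pick_node REQ NODE...
# Toy round-robin: request number REQ (0-based) goes to node (REQ mod N).
# There is no fallback: if the chosen node is unhealthy, that request fails.
pick_node() {
  req=$1
  shift
  [ "$#" -gt 0 ] || return 1   # avoid division by zero with no nodes
  idx=$((req % $#))
  i=0
  for node in "$@"; do
    if [ "$i" -eq "$idx" ]; then
      printf '%s\n' "$node"
      return 0
    fi
    i=$((i + 1))
  done
}

# Simulate four page refreshes against two servers, assuming only
# server1 can answer the LDAP request correctly.
for req in 0 1 2 3; do
  node=$(pick_node "$req" server1 server2)
  if [ "$node" = server1 ]; then result=ok; else result=missing; fi
  echo "refresh $req -> $node: LDAP settings $result"
done
```

In this model the result alternates between refreshes, which is consistent with the symptom described in this thread when one of two nodes misbehaves.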

The LDAP configuration will be exactly the same on both servers, because it comes from the same MongoDB collection. If a Graylog server cannot reach MongoDB, it will not be in the cluster either; MongoDB is currently the only point through which the servers communicate.

123dev commented 8 years ago

Thanks Kay for the follow up.

What are the chances that two unrelated users with similar setups have misconfigured connectivity to server 2 in exactly the same way, while properly configuring connectivity to server 1?

To rule out connectivity, I stopped server 1 (sudo graylog-ctl stop graylog-server) and configured server 2 to be the master. The web interface saw server 2, and I was able to browse all the menus without any issues, with the exception of LDAP settings (which came up blank). I think this test rules out connectivity as the culprit.

Furthermore, the web interface will not display anything if server 2 is not configured as master; it merely complains that it could not find a master. To get around this and have a fallback mechanism, I have to configure both servers as masters and accept the annoying error that shows up when multiple servers are configured as masters:

Multiple graylog2-server masters in the cluster (triggered a minute ago)
There were multiple graylog2-server instances configured as master in your Graylog cluster. The cluster handles this automatically by launching new nodes as slaves if there already is a master but you should still fix this. Check the graylog2.conf of every node and make sure that only one instance has is_master set to true. Close this notification if you think you resolved the problem. It will pop back up if you start a second master node again.
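A minimal sketch for the check the notification asks for, assuming copies of each node's config file have been gathered on one machine and use the line form `is_master = true`. The helper name `count_masters` and the file names are hypothetical:

```shell
# count_masters FILE...
# Count how many of the given config files set is_master = true.
count_masters() {
  n=0
  for f in "$@"; do
    if grep -Eq '^[[:space:]]*is_master[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$f"; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Usage (hypothetical copies of each node's config):
#   [ "$(count_masters node1.conf node2.conf)" -eq 1 ] || echo "expected exactly one master"
```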

Is this behavior intentional? It kind of defeats the purpose of having multiple servers if there is no failover. (I know there is failover for the inputs, but why stop there and not have it for the web interface as well?)

Note: I compared the two servers' config files and noticed that they had different password_secret values, so I changed server 2 to match server 1's password secret. Since the change, LDAP comes up OK with both servers running (though I can't be certain this will still be the case later on, as the LDAP behavior has always been sporadic for us). Could this have been the root issue? If so, why would everything else appear to work except LDAP?
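Whether or not it explains the LDAP symptom, comparing a setting across nodes is easy to script. A minimal sketch, assuming `key = value` lines and copies of both config files on one machine; the helper name `same_setting` is hypothetical. It compares values without printing them, since password_secret is sensitive:

```shell
# same_setting KEY FILE1 FILE2
# Exit 0 if KEY is present and has the same value in both files, 1 otherwise.
same_setting() {
  key=$1
  a=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1)
  b=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$3" | head -n 1)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (hypothetical copies of each node's config):
#   same_setting password_secret node1.conf node2.conf || echo "password_secret differs"
```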

Thanks again for looking into the issue.

123dev commented 8 years ago

Just as I feared, the LDAP problem is back. It sometimes works and sometimes doesn't, so it had nothing to do with the password secret. :(

chrland commented 8 years ago

It is easy to recreate the issue. Just download the Graylog OVA and deploy two servers.

First server, just boot it.
Second server, boot it and run:

sudo graylog-ctl set-cluster-master <ip-of-vm1>
sudo graylog-ctl reconfigure
sudo graylog-ctl restart

Log in to the Graylog web interface using the IP of the first server.
Set up LDAP and save the config.
Go back to the LDAP setup and refresh the page a few times. You will see that your LDAP config disappears from time to time.

mariussturm commented 8 years ago

Hi, could you please try the following? It is basically the procedure you described, but with a shared password_secret:

sudo graylog-ctl set-cluster-master <ip-of-vm1>
sudo graylog-ctl set-server-secret <token found on vm1 in /etc/graylog/graylog-secrets.json -> graylog_server->secret_token>
sudo graylog-ctl reconfigure
sudo graylog-ctl restart
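For the set-server-secret step, a minimal sketch for reading the token on vm1, assuming the JSON layout implied above (graylog_server -> secret_token) and that python3 is installed on the VM; reading the file may require sudo. The helper name `secret_token` is hypothetical:

```shell
# secret_token [FILE]
# Print graylog_server.secret_token from the OVA secrets file.
secret_token() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["graylog_server"]["secret_token"])' \
    "${1:-/etc/graylog/graylog-secrets.json}"
}

# Run secret_token on vm1, then on vm2 (TOKEN holds the value printed on vm1):
#   sudo graylog-ctl set-cluster-master <ip-of-vm1>
#   sudo graylog-ctl set-server-secret "$TOKEN"
#   sudo graylog-ctl reconfigure
#   sudo graylog-ctl restart
```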

Graylog2 / graylog2-images #107