wccropper opened 4 days ago
I have found two issues so far. The first was an ACL on the network switch, which has been fixed to allow port 8080. The second was a typo I had made in the external FQDN. I have fixed this in /etc/httpd/conf.d/ood-portal.conf and restarted httpd. I now get the following redirect URL and error:
https://titan-master1.global.internal:8080/pun/sys/dashboard
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator at root@localhost to inform them of the time this error occurred, and the actions you performed just before this error.
More information about this error may be available in the server error log.
I have searched all files in /etc, /trinity and /opt and fixed the gloabl-to-global typo. How do I recreate the SSL certificates? I re-ran ansible-playbook controller.yml (with the typo corrected), but this did not update the certificates.
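The search described above can be scripted so it is easy to repeat after each change. This is a minimal sketch, assuming Python 3 is available on the controller. find_files_containing is a hypothetical helper, not part of TrinityX or luna. From the shell, grep -rl gloabl /etc /trinity /opt does much the same thing.

```python
import os
from pathlib import Path


def find_files_containing(roots, needle):
    """Return sorted paths under `roots` whose contents include `needle`.

    Unreadable entries (permission errors, sockets, FIFOs, dangling
    symlinks) are skipped rather than aborting the whole scan.
    """
    needle_bytes = needle.encode()
    hits = []
    for root in roots:
        # os.walk yields nothing for a missing root and skips unlistable dirs
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = Path(dirpath) / name
                if not path.is_file():
                    continue  # not a regular file (or a broken symlink)
                try:
                    data = path.read_bytes()  # whole file in memory: fine for config trees
                except OSError:
                    continue
                if needle_bytes in data:
                    hits.append(str(path))
    return sorted(hits)
```

For example, find_files_containing(["/etc", "/trinity", "/opt"], "gloabl") should return an empty list once every occurrence of the misspelling has been fixed. Note that this only finds text files containing the string. A certificate issued for the misspelled name is not fixed by editing files.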
I ended up reinstalling completely (without the typo), and I now have a semi-working cluster and can access the dashboard. However, I am now unable to get a node to PXE boot. I receive this error on the node:
I was able to fix this by adding 192.168.213.0/24 to the sources of the trusted zone. The node was then set up to use BOOTIF 192.168.213.71 and BMC 192.168.213.71, but it changed the IP to 169.254.0.2. I usually use a dedicated iDRAC port rather than a shared one, and I configure each node so I can access it if there is an issue. Doesn't this setting control the IP that gets assigned?
[root@titan-master1 trinityX]# luna network list
+--------------------------------------------------------------------------------------------+
| << Network >> |
+---+---------+--------------------+------------+-------+------------------+-----------------+
| # | name | network | type | dhcp | dhcp_range_begin | dhcp_range_end |
+---+---------+--------------------+------------+-------+------------------+-----------------+
| 1 | cluster | 192.168.213.128/25 | ethernet | True | 192.168.213.129 | 192.168.213.169 |
| | | | | | | |
| 2 | ipmi | 192.168.213.0/25 | ethernet | False | --NA-- | --NA-- |
| | | | | | | |
| 3 | ib | 10.149.0.0/16 | infiniband | False | --NA-- | --NA-- |
+---+---------+--------------------+------------+-------+------------------+-----------------+
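The addresses mentioned above can be checked against the networks in this output. Python's standard ipaddress module is enough for this. The sketch below hard-codes the three networks from the table and is illustrative only; classify is a hypothetical helper, not a luna command. It shows that 192.168.213.71 falls inside the ipmi /25, not the cluster /25 whose DHCP range starts at .129. It also shows that 169.254.0.2 lies in the IPv4 link-local range, so it is not an address from any of these networks.

```python
import ipaddress

# Networks as reported by `luna network list` in the output above.
NETWORKS = {
    "cluster": ipaddress.ip_network("192.168.213.128/25"),
    "ipmi": ipaddress.ip_network("192.168.213.0/25"),
    "ib": ipaddress.ip_network("10.149.0.0/16"),
}


def classify(address):
    """Name the luna network(s) containing `address`, or say why none do."""
    ip = ipaddress.ip_address(address)
    if ip.is_link_local:
        # 169.254.0.0/16 is the IPv4 link-local range (RFC 3927)
        return "link-local"
    matches = [name for name, net in NETWORKS.items() if ip in net]
    return ",".join(matches) if matches else "none"


# The trusted-zone source added earlier (192.168.213.0/24) spans both /25s.
TRUSTED_SOURCE = ipaddress.ip_network("192.168.213.0/24")
COVERED = {name: net.subnet_of(TRUSTED_SOURCE)
           for name, net in NETWORKS.items() if net.version == 4}
```

classify("192.168.213.71") returns "ipmi" and classify("169.254.0.2") returns "link-local". In COVERED, cluster and ipmi are True and ib is False. If BOOTIF is meant to come from the cluster network, an address in the ipmi /25 is worth double-checking.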
I was able to regain iDRAC access using the racadm tool. The gateway had been wiped. luna/lpower are unable to communicate with the iDRAC. For now I have disabled BMC management on the nodes, but left it enabled on the group. Any assistance here is appreciated.
Hi wccropper.
As you have found out (and as mentioned in other places), trix_external_fqdn must match how the controller is resolved from the outside, i.e. the name you typically connect to in order to open the dashboard on port 8080. The internal server error is a strong indication that the certificate(s) don't match how the server was reached. Since we use the certificate in several places, simply re-running the playbook won't solve this. We can tell you how to recreate the certificates; this involves a few steps to make sure other things, such as OpenLDAP, won't break. However, based on your message above, I assume you have sorted this out?
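One way to confirm a name mismatch like this is to compare the external FQDN against the certificate's Subject Alternative Names. Running openssl x509 -noout -text -in <cert> prints these names under "X509v3 Subject Alternative Name". The certificate path depends on the installation and is not given here. The sketch below parses that line and applies a simplified version of the RFC 6125 matching rules: exact match, or a "*" wildcard covering exactly one leftmost label. It is not the check httpd itself performs. Both helper names are hypothetical, and the example SAN string is made up for illustration.

```python
def parse_san(text):
    """Extract DNS names from an openssl 'Subject Alternative Name' line.

    Expects the comma-separated form openssl prints, e.g.
    "DNS:host.example, DNS:host, IP Address:10.0.0.1".
    """
    names = []
    for entry in text.split(","):
        kind, _, value = entry.strip().partition(":")
        if kind == "DNS" and value:
            names.append(value.lower())
    return names


def hostname_matches(fqdn, san_names):
    """Simplified RFC 6125 check: exact match, or '*' as the whole leftmost label."""
    host = fqdn.lower().rstrip(".")
    for name in san_names:
        if name == host:
            return True
        if name.startswith("*."):
            # a wildcard covers exactly one label: *.b.c matches a.b.c, not x.a.b.c
            _, _, host_rest = host.partition(".")
            if host_rest and host_rest == name[2:]:
                return True
    return False
```

For example, a certificate issued while the typo was still present, with a hypothetical SAN line "DNS:titan-master1.gloabl.internal, DNS:titan-master1", would not match titan-master1.global.internal. That is why the certificates need to be recreated, not just the configuration files edited.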
It's not clear why your machines don't PXE boot properly. The screenshot shows the error, but we may need logs (e.g. /var/log/luna/luna2-daemon.log and group_vars/all.yml) to get a bit more insight. The output of 'lexport -c -e' is also helpful, as it tells us how the IPs were assigned, how the networks are configured, and so on.
Lastly, the BMC plugin should work with IPMI-compliant machines, so iDRAC shouldn't be a problem. If the gateway was lost, then most likely no gateway was defined for the ipmi network?
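When defining a gateway for the ipmi network, a quick addressing sanity check can catch an obviously wrong value before it is pushed to the BMCs. This is a minimal sketch using Python's ipaddress module. It uses the ipmi subnet from the luna network list output earlier in the thread. check_gateway is a hypothetical helper that does not read or modify luna's configuration, and the gateway addresses in the usage note are invented for illustration. It only checks that a gateway exists and lies inside the subnet, not that it actually routes traffic.

```python
import ipaddress


def check_gateway(network, gateway):
    """Return a list of addressing problems for `gateway` on `network` (empty if none)."""
    net = ipaddress.ip_network(network)
    if not gateway:
        return ["no gateway defined"]
    gw = ipaddress.ip_address(gateway)
    problems = []
    if gw not in net:
        problems.append(f"{gw} is outside {net}")
    elif net.num_addresses > 2 and gw in (net.network_address, net.broadcast_address):
        # /31 and /32 have no separate network/broadcast addresses, so skip them
        problems.append(f"{gw} is the network or broadcast address of {net}")
    return problems
```

For the ipmi network, check_gateway("192.168.213.0/25", None) reports "no gateway defined", while a hypothetical gateway of 192.168.213.129 is rejected because it sits in the cluster /25 rather than the ipmi /25.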
In short, I see quite a few messages and am trying to understand what's happening. Can you share logs/data here or by other means?
With kind regards, Antoine
I have just installed a new controller node and I am unable to access the dashboard on port 8080. I can access the nginx page when I browse to plain https without a port. When I curl port 8080, it returns
Below are my current configs. I used the main branch and the INSTALL.sh script.