Open markofsuccess opened 7 months ago
I have gone through the troubleshooting already https://github.com/cisagov/LME/blob/main/docs/markdown/reference/troubleshooting.md
Thank you for going through the troubleshooting guide and submitting this ticket. We'll review and get back to you shortly.
Another thing I noticed: when I go to Event Viewer > Subscriptions > lme > right-click Runtime Status, two computer names are shown, win10-1.ncp-production.local and win10-2.ncp-production.local, and both have the status Inactive. Could this be the cause? Those two computers are virtual machines and both are turned on. Also, in Event Viewer > Applications and Services Logs > Microsoft > Windows > EventCollector > Operational there are 0 events. Should it be like that?
Can you share the status of the following commands:
sudo docker ps
docker-compose logs elasticsearch
docker-compose logs kibana
You can also do:
docker-compose logs -f elasticsearch kibana
and then attempt to log in and watch the logs in real time
@aarz-snl thank you for your answer. I ran sudo docker ps and the containers seem to be up and running; the status says healthy (screenshot attached). I tried the docker-compose commands in the /opt/lme/Chapter 3 Files directory and it says it can't find a suitable configuration file (screenshot attached as well). Which directory should I run the docker-compose commands from?
Have you tried running "sudo su" before running all of the docker commands? It looks like you're in the right directory, but you might have to have admin privileges for everything.
Sorry, run docker logs <containername> instead.
So: docker logs lme_elasticsearch and docker logs lme_kibana
You'll need the full name, so copy it from the NAMES column in the docker ps output.
also you said here you're using ELK 5?
Software Versions: ELK: 5.15.0-101-generic
if you do a cat docker-compose-stack-live.yml can you confirm the versions are actually 8.11.1?
also for further troubleshooting -- from your windows machine do the following
ssh -L 443:localhost:443 linuxmachineusername@linuxmachineipaddress
After you have successfully logged into the linux machine - in your browser on your windows machine type https://localhost -- then login
From your "docker ps" output, it does look like you're running 8.11.1 of the ELK stack. Not sure where you got 5.15.0 from.
@llwaterhouse @aarz-snl yes the ELK version is 8.11.1 confirmed by doing sudo cat docker-compose-stack-live.yml command
ignore the 5.15.0 I put in, it is something else.
I tried docker logs lme_elasticsearch and docker logs lme_kibana and got: Error response from daemon: no such container.
I then ran sudo docker logs for lme_elasticsearch.1, lme_kibana.1 and lme_logstash.1.
See the screenshot of the commands. Also, is this how it is supposed to look when running sudo docker stack ps lme? I have 3 running containers and a lot of containers that have failed.
I set up the tunnel with ssh -L 443:localhost:443 linuxmachineusername@linuxmachineipaddress, opened https://localhost in the browser on my Windows machine and managed to log in with my credentials. But the dashboard shows no results. I changed the dashboard's date range to last 7 days > now, so logon results should show up; it seems no data is being collected.
You need the entire container name.
from when you do 'docker ps' you need to copy and paste the entire container name
it will look like lme_elasticsearch then lots of random characters.
you want to look at docker ps not docker stack ps
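A minimal sketch of the lookup described above, assuming a POSIX shell on the Linux server. The helper names match_container and lme_logs are hypothetical and not part of LME; they only wrap docker ps and docker logs so the full generated name does not have to be copied by hand.

```shell
# Print the first name on stdin that starts with the given literal prefix.
match_container() {
  awk -v p="$1" 'index($0, p) == 1 { print; exit }'
}

# Show the last N log lines (default 50) of the running container whose
# name starts with PREFIX, e.g. lme_elasticsearch or lme_kibana.
lme_logs() {
  local prefix="$1" lines="${2:-50}" name
  name="$(sudo docker ps --format '{{.Names}}' | match_container "$prefix")"
  if [ -z "$name" ]; then
    echo "no running container matches ${prefix}" >&2
    return 1
  fi
  # docker logs relays the container's stderr on stderr, so merge both streams.
  sudo docker logs --tail "$lines" "$name" 2>&1
}
```

For example, lme_logs lme_kibana 100 would print the last 100 Kibana log lines. To watch a login attempt live, run sudo docker logs -f with the name that match_container returns.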
it looks like the app is up and running so this is some kind of issue with the domain name -- I would like to see those logs when you login to verify
OK, sudo docker ps gave me the container names. I ran the logs on them and will attach the results as txt files, since there were a lot of logs and that is easier to read than screenshots. In the elasticsearch log, which was the longest, the only error I could see was an SSLHandshakeException about a bad certificate:
[2024-04-04T11:18:32,615][WARN ][io.netty.channel.DefaultChannelPipeline][main][aef10fe8420dd8309a37d0c2c0459cc237092457a3456d777371eebd64819968] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
Attachments: sudo docker logs kibana.txt, sudo docker logs elastic search.txt
so basically what we want to see is what logs are generated when we attempt to login
So you can do sudo docker logs -f {containername} and monitor live -- while that is running attempt to login to elastic.
When you browse to elastic, do you get any certificate errors in your browser (does it tell you it's an unsafe website and you have to click continue)? I mean when you go to https://linux-server, not when you tunnel in and go to https://localhost.
We essentially need to know from the logs precisely what happens after you input elastic:password and click login. Try to watch these logs live in a separate window so you can see the entries that come in at the moment of logon. After you initiate the login and the logs roll by, press Ctrl+C in the Linux terminal to stop the logs so you can capture them.
Yes, when I go to https://linux-server/ the browser says:
Your connection is not private
Attackers might be trying to steal your information from 192.168.1.224 (for example, passwords, messages, or credit cards). Learn more NET::ERR_CERT_AUTHORITY_INVALID
I ran sudo docker logs -f {containername} on all 3 containers, but the only one that logged something after I logged in with my elastic credentials was the kibana container, and it only gave this:
[2024-04-05T09:17:43.200+00:00][INFO ][plugins.security.routes] Logging in with provider "basic" (basic)
@aarz-snl any suggestions? Thank you for your time
@markofsuccess, please follow the steps in the section "Trusting the certs that secure LME's services" near the end of chapter 3.
Please let us know if this fixes your issue.
@markofsuccess Hey, I think I had the same problem; I was not getting any logs either.
I solved it by removing my old certificates and generating new self-signed certificates. Maybe you can try that.
Now I have another problem. @llwaterhouse, I am only getting logs from my Windows Event Collector. When I log in on my clients, it should show logon attempts from the clients, right? It is not showing any logs from them, only from the Event Collector.
Do I have to trust the certificates on my clients as well?
wesliix
@wesliix @llwaterhouse thanks for the answer. I tried the steps in chapter 3, "Trusting the certs that secure LME's services", but no luck; it still seems to complain about a bad certificate. I even removed the certificate as @wesliix suggested, using the command Get-ChildItem Cert:\LocalMachine\Root\53902218B88D103F7D84A4E2F647AE6CD6592632 | Remove-Item, and then imported the certificate again. Could the firewall be blocking something?
@markofsuccess
Do you have any kind of network security like a firewall / proxy etc that does ssl termination / inspection of your network traffic? If that doesn't have this cert that may cause problems...
If not this points to an issue with certs. Have you uninstalled / reinstalled at all? done anything else other than deploy.sh install? This could cause cert issues if you missed a step on a reinstall
After you run:
Import-Certificate -FilePath 'C:\Program Files\lme\root-ca.crt' -CertStoreLocation "Cert:\LocalMachine\Root"
Does it successfully import... check using:
Open Microsoft Management Console (mmc):
Press Win + R, type mmc, and press Enter.
In the MMC, go to File > Add/Remove Snap-in...
Select Certificates, click Add >, choose Computer account, and navigate through the wizard.
Under Certificates (Local Computer), navigate to Trusted Root Certification Authorities > Certificates and verify if your imported certificate is listed.
When you ran deploy.sh install and it asked for the linux server domain name is that the same one you're using to navigate to it?
I'd almost recommend just doing a deploy.sh uninstall -- say yes to deleting your certs, delete your volumes, and then do another deploy.sh install and reimport the new generated certs back to your windows server replacing the old ones.
If there's no type of network inspection or something similar happening here, it's just pointing to the certs not being correct.
Also run these:
openssl x509 -in certs/root-ca.crt -text -noout | grep -E 'Issuer:|Subject:'
openssl x509 -in certs/elasticsearch.crt -text -noout | grep -E 'Subject:|DNS:|IP Address:'
included some outputs of what it looks like... domain and ip should match yours
root@ubuntu:/opt/lme/Chapter 3 Files# openssl x509 -in certs/root-ca.crt -text -noout | grep -E 'Issuer:|Subject:'
Issuer: C = US, ST = DC, L = Washington, O = CISA, CN = Swarm
Subject: C = US, ST = DC, L = Washington, O = CISA, CN = Swarm
root@ubuntu:/opt/lme/Chapter 3 Files# openssl x509 -in certs/logstash.crt -text -noout | grep -E 'Subject:|DNS:|IP Address:'
Subject: C = US, ST = DC, L = Washington, O = CISA, CN = ls1.lme.local
DNS:ls1.lme.local, IP Address:10.0.2.15
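To make the "domain and IP should match yours" check less error-prone, here is a small sketch that tests whether a given name appears in the certificate's DNS: or IP Address: entries. The function name san_covers is hypothetical; it only parses the text that openssl x509 -text prints, in the same layout as the sample output above.

```shell
# Succeeds if NAME appears as a "DNS:" or "IP Address:" entry in the
# `openssl x509 -text -noout` output read from stdin.
san_covers() {
  local name="$1"
  # Entries are comma-separated on one line; split them and trim whitespace,
  # then require an exact, literal whole-line match.
  tr ',' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -xF -e "DNS:${name}" -e "IP Address:${name}" >/dev/null
}
```

Usage on the Linux server, from the directory that contains certs/ (the hostname is the one from this thread; substitute your own):
sudo openssl x509 -in certs/elasticsearch.crt -text -noout | san_covers ubuntuelk.ncp-production.local && echo "name is covered"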
@wesliix no ... logs from clients are sent via the built-in Windows Event Forwarding and encrypted with Kerberos. If your subscription is active as per the instructions and shows your clients connected, it should be forwarding logs. You should see entries under "Forwarded Events" in Event Viewer on the WEC server. If nothing is arriving there, then your GPOs or subscriptions are not set up correctly, or traffic is being blocked.
If there ARE logs in there, then your winlogbeat isn't properly set up / isn't shipping them to the Linux server.
also run:
openssl verify -CAfile certs/root-ca.crt certs/elasticsearch.crt
openssl verify -CAfile certs/root-ca.crt certs/kibana.crt
Both should come back OK
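As a rough sketch, the two checks can be combined into one pass/fail result. The helper all_certs_ok is hypothetical. It assumes openssl verify prints "<file>: OK" on success and a line that names the file but does not end in ": OK" on failure, and it ignores unrelated lines such as sudo warnings.

```shell
# Succeeds only if stdin contains at least one "<file>.crt: ..." line and
# every such line ends in ": OK".
all_certs_ok() {
  awk '/\.crt: / { seen = 1; if ($0 !~ /: OK$/) bad = 1 }
       END { if (seen && !bad) exit 0; exit 1 }'
}

# Intended use on the Linux server, from the directory containing certs/:
#   for c in elasticsearch kibana; do
#     sudo openssl verify -CAfile certs/root-ca.crt "certs/$c.crt"
#   done 2>&1 | all_certs_ok && echo "all certs chain to root-ca.crt"
```

The loop is left as a comment so that loading the helper has no side effects.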
Also ensure the cert you're importing into the trusted CA store on your Windows machine matches the one actually in use on the Linux machine.
From the Windows machine you can also use:
CertUtil -verify /path/to/root-ca.crt
to check whether it's trusted in the store.
Invoke-WebRequest -Uri https://(elasticsearch_url) -Method Get
curl -v https://(elasticserverurl) (this gives you verbose output on what happens when you navigate to the elastic server)
Also, in your browser click where it says "Not secure" in the URL bar and then click "Certificate is not valid". This should show the information from your generated cert, like so:
Issued To
Common Name (CN) kibana
Organization (O) CISA
Organizational Unit (OU) <Not Part Of Certificate>
Issued By
Common Name (CN) Swarm
Organization (O) CISA
Organizational Unit (OU) <Not Part Of Certificate>
Validity Period
Issued On Thursday, April 11, 2024 at 1:47:07 PM
Expires On Friday, May 1, 2026 at 1:47:07 PM
That said, you should still be able to continue to the website (which I assume you have done) and actually log in. I do that on my demo server all the time when I skip the trusted store part, and I log in just fine. So it's still important to watch your logs live to determine why the 404 happens.
But the fact that you're importing the cert into the cert store and it's still giving you an error leads me to believe that either the certs don't match up or some security tool on the network is taking over TLS termination, because tunneling into the server appears to work fine.
@aarz-snl thank you for your detailed answers. I checked in mmc and found the certificate named Swarm. I even found two of them, so I deleted the old one and kept the newly created one. I have uninstalled and reinstalled multiple times. I'm not sure whether I have a firewall/proxy setting that does SSL termination. Do you know how to check that?
Answer to this: When you ran deploy.sh install and it asked for the linux server domain name is that the same one you're using to navigate to it?
Certs seem to be OK
Also, the certificate information from the browser does show the details.
Commands on Windows Machine
curl -v https://(elasticserverurl) this command gave:
CertUtil -verify C:\Program Files\lme\root-ca.crt gave me file_not_found, even though pasting the path C:\Program Files\lme\root-ca.crt into the Windows search bar opens the file. (I added this to the screenshot. The dialog offers to install the certificate; is that needed?)
Invoke-WebRequest -Uri https://(elasticsearch_url) -Method Get
Thank you for your time helping me with this, as it is the first time I'm setting up something like this. This is on a domain controller, and perhaps name resolution or the IP address is the issue. Or I could just start over from the beginning and redo all the steps.
@aarz-snl I did an uninstall and reinstall and tried logging in again, but it's a 404 again. When I tried the commands for checking SSL I got this back (see screenshot): 40774A7B337F0000:error:8000000D:system library:BIO_new_file:Permission denied:../crypto/bio/bss_file.c:67:calling fopen(certs/root-ca.crt, r)
sudo docker logs for logstash doesn't complain about the SSL handshake anymore; it looks like this:
you forgot 'sudo' before running the ssl checks as they're in a protected folder now.
For this command try putting the location in quotes:
CertUtil -verify "C:\Program Files\lme\root-ca.crt"
For the other command, (elasticurl) was just a placeholder. It needs to be the URL you go to when logging on to elastic. The command also has to be run in PowerShell. I'd right-click and run as administrator if you have the ability. The I in Invoke should be capital; press Tab as you type to make sure it's picking it up.
The logs we really want to see from docker logs are elasticsearch and kibana. We really need to see those live as the login happens. You can add -f to docker logs and it will live scroll. Then press Ctrl+C to exit once you have captured what you need. Do this for both, so you will have to log in twice.
If -f isn't working out for you you can also do docker logs
Just to clarify this is your current path:
You type in https://domainname (what is the domain name you're typing in?) from your windows server. The elastic splash screen shows up. You type in your username and password and hit login. Then you get a 404? If thats the case we have to see the logs when you hit login to see whats happening.
Your certs appear to be fine -- the dns error is weird, and i suspect it may have something to do with resolution going on. We wont know for sure until we see whats happening when you attempt to login
Are you browsing to Logging Made Easy from the SERVER or from the Windows client? Where are you installing the certificates to? Is that the same machine you're browsing from? I noticed in your description you listed:
Desktop: OS: Windows 10
But what about your server?
@aarz-snl output of: CertUtil -verify "C:\Program Files\lme\root-ca.crt"
Issuer: CN=Swarm O=CISA L=Washington S=DC C=US
Name Hash(sha1): 5e4ca8e6bebcf0c6bf358abbb33cc510d04393f3
Name Hash(md5): b7d4e381415897c34bf07bf0054bbbf2
Subject: CN=Swarm O=CISA L=Washington S=DC C=US
Name Hash(sha1): 5e4ca8e6bebcf0c6bf358abbb33cc510d04393f3
Name Hash(md5): b7d4e381415897c34bf07bf0054bbbf2
Cert Serial Number: 3f575738b13eb84936b38d40d4733f0a63a700e5
dwFlags = CA_VERIFY_FLAGS_CONSOLE_TRACE (0x20000000)
dwFlags = CA_VERIFY_FLAGS_DUMP_CHAIN (0x40000000)
ChainFlags = CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT (0x40000000)
HCCE_LOCAL_MACHINE
CERT_CHAIN_POLICY_BASE
-------- CERT_CHAIN_CONTEXT --------
ChainContext.dwInfoStatus = CERT_TRUST_HAS_PREFERRED_ISSUER (0x100)
SimpleChain.dwInfoStatus = CERT_TRUST_HAS_PREFERRED_ISSUER (0x100)
CertContext[0][0]: dwInfoStatus=10c dwErrorStatus=0
Issuer: CN=Swarm, O=CISA, L=Washington, S=DC, C=US
NotBefore: 2024-03-15 16:46
NotAfter: 2034-03-13 16:46
Subject: CN=Swarm, O=CISA, L=Washington, S=DC, C=US
Serial: 3f575738b13eb84936b38d40d4733f0a63a700e5
Cert: 53902218b88d103f7d84a4e2f647ae6cd6592632
Element.dwInfoStatus = CERT_TRUST_HAS_NAME_MATCH_ISSUER (0x4)
Element.dwInfoStatus = CERT_TRUST_IS_SELF_SIGNED (0x8)
Element.dwInfoStatus = CERT_TRUST_HAS_PREFERRED_ISSUER (0x100)
Verified Issuance Policies: All
Verified Application Policies: All
Cert is a CA certificate
Cannot check leaf certificate revocation status
CertUtil: -verify command completed successfully.
@aarz-snl curl -v https://192.168.1.224/ubuntuelk.ncp-production.local
VERBOSE: GET https://192.168.1.224/ubuntuelk.ncp-production.local with 0-byte payload
curl : The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
At line:1 char:1
+ CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
+ FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
Try using the domain name rather than the ip. The cert may be applied to the domain you used rather than the ip.
I type in https://192.168.1.224/ubuntuelk.ncp-production.local, which then shows:
Your connection is not private
Attackers might be trying to steal your information from 192.168.1.224 (for example, passwords, messages, or credit cards). Learn more NET::ERR_CERT_AUTHORITY_INVALID
and there is a button called (Advanced) and I press it and then it says This server could not prove that it is 192.168.1.224; its security certificate is not trusted by your computer's operating system. This may be caused by a misconfiguration or an attacker intercepting your connection.
Proceed to 192.168.1.224 (unsafe)
When I press on proceed it brings me to the elastic splash screen where I put in my login credentials and press enter, and then I get a 404
ubuntuelk is the name of the Linux machine, and ncp-production.local is the name of the domain that both the Windows client and the Linux machine are on. On the domain I have 3 Prod Servers. I'm logged in on Prod-Server 1, and my Linux machine is a Hyper-V virtual machine that I start on the same server as the Windows client.
Server is Windows Server 2022 Version 21H2(OS Build 20348.240)
Re: "you forgot 'sudo' before running the ssl checks as they're in a protected folder now." See the output below with sudo added. Invoke-WebRequest gives "command not found" for:
Invoke-WebRequest -Uri https://192.168.1.224/ubuntuelk.ncp-production.local -Method Get
markus@ubuntuelk:/opt/lme/Chapter 3 Files$ sudo openssl verify -CAfile certs/root-ca.crt certs/elasticsearch.crt
sudo: unable to resolve host ubuntuelk.ncp-production.local: Temporary failure in name resolution
certs/elasticsearch.crt: OK
markus@ubuntuelk:/opt/lme/Chapter 3 Files$ sudo openssl verify -CAfile certs/root-ca.crt certs/kibana.crt
sudo: unable to resolve host ubuntuelk.ncp-production.local: Temporary failure in name resolution
certs/kibana.crt: OK
markus@ubuntuelk:/opt/lme/Chapter 3 Files$ sudo openssl verify -CAfile certs/root-ca.crt certs/kibana.crt
sudo: unable to resolve host ubuntuelk.ncp-production.local: Temporary failure in name resolution
certs/kibana.crt: OK
markus@ubuntuelk:/opt/lme/Chapter 3 Files$ sudo openssl verify -CAfile certs/root-ca.crt certs/elasticsearch.crt
sudo: unable to resolve host ubuntuelk.ncp-production.local: Temporary failure in name resolution
certs/elasticsearch.crt: OK
markus@ubuntuelk:/opt/lme/Chapter 3 Files$ Invoke-WebRequest -Uri https://192.168.1.224/ubuntuelk.ncp-production.local -Method Get
Invoke-WebRequest: command not found
As clint alluded to -- typically you dont include an IP address AND a domain.
You would do one or the other... https://192.168.1.224
or
https://ubuntuelk.ncp-production.local
this is why none of your commands are properly resolving
When i do this in my lab i name my linux machine ls1.lme.local
so when I navigate to it from my windows server i do https://ls1 or https://ls1.lme.local
You're basically saying "go to https://mylinuxserver/mylinuxserver", which isn't a proper address. Also, because you start with the IP address it may actually be bringing you to the login screen, but I'm not sure logging on with just the IP address will be successful from a remote machine. I believe you have to use the domain name. The certificate uses your domain name (ubuntuelk.ncp-production), not your IP address, so you would get certificate errors if you just use the IP address.
So just try https://ubuntuelk.ncp-production.local or https://ubuntuelk.ncp-production. If everything has been set up properly in your DC and your Linux machine is properly a part of your domain tree, it should resolve and bring you to the elastic login.
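The address rule above (either the IP or the domain name, never one appended to the other as a path) can be sketched as a small shell check. The function name is_host_only_url is hypothetical and only inspects the string; it does not contact the server.

```shell
# Return 0 if the address is scheme + host only (a single trailing slash is
# tolerated), 1 if it carries a path such as
# https://192.168.1.224/ubuntuelk.ncp-production.local.
is_host_only_url() {
  local rest="${1#*://}"   # strip the scheme, if any
  rest="${rest%/}"         # tolerate one trailing slash
  case "$rest" in
    */*|"") return 1 ;;    # a remaining slash means a path was appended
    *)      return 0 ;;
  esac
}
```

For example, is_host_only_url https://ubuntuelk.ncp-production.local succeeds, while the combined IP-plus-hostname address used earlier in this thread fails the check.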
@aarz-snl hey, thanks, I finally managed to log in without getting a 404 error :D by going to https://192.168.1.224 and not the domain name. However, https doesn't work, only http, but perhaps that is not an issue for accessing the tool. Now I will continue with step 4 in the installation process. I have looked at the dashboard and set the date interval to the last 30 days, but nothing is shown. Is that normal, and do I perhaps need to complete the steps in the Chapter 4 installation for logs to show? Thank you so much for the support, much appreciated.
You really want to be logging in with the domain name and not the IP address. If the domain name isn't working, there are deeper-rooted issues with the domain name and your DC / domain in general. That means nothing else will probably work either, as it's all mostly domain name based: GPOs, subscriptions to the WEC, etc.
ncp-production.local has to be able to work for all devices involved
Is this completed? If so, let's close it as Done.
Describe the issue
I try to log in to my Kibana interface at https://<LINUX_SERVER_IP/HOSTNAME> in my web browser, entering elastic as the user followed by the password generated in earlier steps of the installation. When I submit my credentials I get forwarded to a 404 page that says: {"statusCode":404,"error":"Not Found","message":"Not Found"} I have tried incognito mode as well as different browsers. Another thing to add, though I'm not sure it is relevant: I can't access the Kibana interface in the browser with https://, only http://
To Reproduce
Open Brave or Microsoft Edge in a new private window, go to my Kibana login at http://<LINUX_SERVER_IP/HOSTNAME> and enter elastic as the user followed by the password. When submitted, I get forwarded to the {"statusCode":404,"error":"Not Found","message":"Not Found"} page.
Please complete the following information
Desktop:
Server:
OPTIONAL:
Result of these commands: free -h, df -h, uname -a, lsb_release -a
for name in $(sudo docker ps -a --format '{{.Names}}'); do echo -e "\n\n\n-----------$name----------"; sudo docker logs "$name" 2>&1 | tail -n 20; done