elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

Elastic Agent should accept CA fingerprint containing colons #5032

Open AndersonQ opened 3 months ago

AndersonQ commented 3 months ago

For confirmed bugs, please report:

Version: 8.14.1, main

Operating System: all

Discuss Forum URL: N/A

Steps to Reproduce:

elasticmachine commented 3 months ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine commented 3 months ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

belimawr commented 3 months ago

@AndersonQ which string is being used in this case? If I run the command you mentioned, I get:

openssl x509 -noout -fingerprint -sha256 -in ca-cert.pem 
sha256 Fingerprint=CC:69:1A:47:A0:43:78:3A:1A:E0:E4:22:4D:BF:54:D3:45:84:99:5D:C7:6D:B9:96:90:03:6E:70:16:37:18:65

This contains more than the actual fingerprint: it also includes the "sha256 Fingerprint=" label and the ":" separators.

Looking at the error, it seems the Elastic Agent passed the value through as is; Fleet Server then tried to decode it when connecting to Elasticsearch and failed because the value was not in the expected encoding.

The quickest mitigation seems to be to better document how to obtain the fingerprint and to add examples of valid inputs. In the Beats SSL configuration documentation we have a shell command that turns the output from openssl into the format we expect. You can look at it here.
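For illustration, a minimal sketch of such a conversion. This is not necessarily the exact command from the Beats docs, and the helper name fingerprint_to_hex is hypothetical. It assumes the input looks like the openssl output shown above, and it only strips the label before "=" and the ":" separators.

```shell
# Hypothetical helper (not part of Elastic Agent or Beats): reads the output of
# `openssl x509 -noout -fingerprint -sha256` on stdin and prints only the hex
# digits of the fingerprint.
fingerprint_to_hex() {
  # First expression drops everything up to and including "=",
  # second expression drops the ":" separators.
  sed -e 's/^.*=//' -e 's/://g'
}

# Usage, assuming the CA certificate is stored in ca-cert.pem:
#   openssl x509 -noout -fingerprint -sha256 -in ca-cert.pem | fingerprint_to_hex

# Same conversion applied to the output quoted earlier in this thread:
printf '%s\n' 'sha256 Fingerprint=CC:69:1A:47:A0:43:78:3A:1A:E0:E4:22:4D:BF:54:D3:45:84:99:5D:C7:6D:B9:96:90:03:6E:70:16:37:18:65' | fingerprint_to_hex
```

The result is a 64-character hex string with no label and no separators, which is the shape the current code appears to expect.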

AndersonQ commented 3 months ago

I updated the issue with a more complete example. But that is exactly it: the Elastic Agent docs, or at least the help text of the enroll command, do not say that the sha256 fingerprint cannot contain the : separator. Also, even though the issue actually happens on the Fleet Server side, users interact with the Elastic Agent and not with Fleet Server. Therefore, as we discussed in the team meeting, I believe we should do our best to make things work as long as the user provides a valid input. Most likely the agent itself has the same problem: even when it isn't starting fleet-server, the agent will likely show the same behaviour. Therefore we can fix it all in a single place, the Elastic Agent.
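To make the proposal concrete, here is a rough sketch of the normalization being suggested, written as a shell function only for illustration. It is not the agent's actual code, and the function name normalize_ca_fingerprint is hypothetical. It assumes a valid input is a SHA-256 fingerprint written as 64 hex characters, with or without ":" separators.

```shell
# Hypothetical sketch, not Elastic Agent code: accept a SHA-256 fingerprint
# with or without ":" separators and print it without them.
normalize_ca_fingerprint() {
  # Drop the ":" separators, if any.
  fp=$(printf '%s' "$1" | tr -d ':')

  # Reject anything that is not a hex digit (for example a leftover
  # "sha256 Fingerprint=" label).
  case "$fp" in
    *[!0-9A-Fa-f]*)
      echo "invalid fingerprint: non-hex character found" >&2
      return 1
      ;;
  esac

  # A SHA-256 digest is 32 bytes, so exactly 64 hex characters.
  if [ "${#fp}" -ne 64 ]; then
    echo "invalid fingerprint: expected 64 hex characters, got ${#fp}" >&2
    return 1
  fi

  printf '%s\n' "$fp"
}

# Both forms print the same separator-free fingerprint:
normalize_ca_fingerprint 'CC:69:1A:47:A0:43:78:3A:1A:E0:E4:22:4D:BF:54:D3:45:84:99:5D:C7:6D:B9:96:90:03:6E:70:16:37:18:65'
normalize_ca_fingerprint 'CC691A47A043783A1AE0E4224DBF54D34584995DC76DB99690036E7016371865'
```

Doing this once, where the agent reads the value, would mean the same separator-free string reaches Fleet Server regardless of which form the user typed.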