fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

HTTP 5XX on PUT /mdm/apple/mdm #11455

Closed rfairburn closed 1 year ago

rfairburn commented 1 year ago

Fleet version: v4.30.1

Operating system: The MDM client was macOS, with user agent MDM-OSX/1.0 mdmclient/1640

Web browser: N/A


🧑‍💻  Expected behavior

We shouldn't get MDM mismatches in the environment. Even so, a certificate mismatch should probably return an HTTP 4XX error, not a 500.

💥  Actual behavior

[screenshot]

level=info ts=2023-05-01T14:52:17.105457551Z component=http-mdm-apple-mdm handler=cert-verify msg="error verifying MDM certificate" err="missing MDM certificate"

👣 Reproduction steps

More info

roperzh commented 1 year ago

I did some preliminary research on this and couldn't reproduce it right away. I set up my env by running git checkout fleet-v4.30.1 and make. If I then send a request with an empty certificate, I get a 400, which is ideal:

~/fleet $ curl -XPUT --verbose -A "MDM-OSX/1.0 mdmclient/1640"  https://localhost:8080/mdm/apple/mdm
> PUT /mdm/apple/mdm HTTP/2
> Host: localhost:8080
> user-agent: MDM-OSX/1.0 mdmclient/1640
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 400
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< content-length: 12
< date: Tue, 02 May 2023 14:34:20 GMT
<
Bad Request
* Connection #0 to host localhost left intact
*

I can also see the exact same log:

level=info ts=2023-05-02T14:37:09.406038Z component=http-mdm-apple-mdm handler=cert-verify msg="error verifying MDM certificate" err="missing MDM certificate"

@rfairburn:

  1. Do you know if it's possible to get more info about the request from our APM setup? More headers, query parameters, or anything in the request body might help.
  2. Do you see anything remotely suspicious in the logs around that time?

I will keep trying, though. My current suspicion is that a certain combination of request parameters causes the 500 error.

rfairburn commented 1 year ago

@roperzh Sent you a DM on Slack with credentials for a viewer account to elastic/apm and a link to the related request. I didn't see anything that stood out log-wise around this request when I originally investigated.

xpkoala commented 1 year ago

@roperzh Could you update this with any new information when you get a chance, thanks!

mna commented 1 year ago

I don't think I have access to the APM dashboard for this case (EDIT: I do have access, but that early May data has been purged by now). Could it be that the provided log line does not correspond to the 500 request shown in the screenshot? The screenshot shows some queries to the nano tables, but if the certificate was in error (as is the case in the log line), I don't think the code would access any DB tables: the middleware fails and prevents access to the handler that executes those queries.

My guess is that the 500 was raised by a later issue in the checkin or command handler.
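A minimal sketch of that reasoning, using only Go's standard library: a certificate-checking middleware that rejects the request never invokes the wrapped handler, so a 500 can only come from a request that passed the check. The `certVerify` middleware, the `X-Client-Cert` header, and the `probe` helper are all hypothetical stand-ins, not Fleet's or nanomdm's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// certVerify is a hypothetical middleware. If the (assumed) certificate
// header is empty it answers 400 and returns without calling next.
func certVerify(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Client-Cert") == "" { // header name is an assumption
			http.Error(w, "Bad Request", http.StatusBadRequest)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe sends a PUT through the middleware and reports the response
// status plus whether the inner handler was reached.
func probe(cert string) (status int, handlerRan bool) {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Stands in for the check-in/command handler and its DB queries.
		handlerRan = true
		// Simulate a failure that happens after certificate verification.
		w.WriteHeader(http.StatusInternalServerError)
	})

	req := httptest.NewRequest(http.MethodPut, "/mdm/apple/mdm", nil)
	if cert != "" {
		req.Header.Set("X-Client-Cert", cert)
	}
	rec := httptest.NewRecorder()
	certVerify(inner).ServeHTTP(rec, req)
	return rec.Code, handlerRan
}

func main() {
	status, ran := probe("")
	fmt.Println(status, ran) // 400 false: rejected before the handler

	status, ran = probe("dummy-cert")
	fmt.Println(status, ran) // 500 true: failure came from the handler
}
```

In this sketch a request that logs a certificate error can never produce handler-level database activity, which is why the log line and the 500 trace would have to belong to different requests.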

roperzh commented 1 year ago

@georgekarrv AFAIK we haven't seen this again (Robert, please correct me if I'm wrong). Should we close this and re-evaluate if we get another report with a trace?

georgekarrv commented 1 year ago

I agree. @rfairburn, please re-open if we have more info.

fleet-release commented 1 year ago

Certificate misplaced, Fleet now finds harmony, Errors are erased.