dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.46k stars 10.03k forks source link

Support certificate auto-rotation in Kestrel #32351

Closed aelij closed 1 year ago

aelij commented 3 years ago

Is your feature request related to a problem? Please describe.

In Kubernetes, certificates are mounted as secret volumes, which can be configured to update automatically when the cert is rotated (e.g. from Key Vault). To achieve auto-rotation in Kestrel today, we need to hook up ServerCertificateSelector and listen to file changes (e.g. using IFileProvider.Watch).

Describe the solution you'd like

Add a HttpsConnectionAdapterOptions.ServerCertificatePath property that would watch for file changes.

amcasey commented 1 year ago

No exciting updates today. I believe I've found a way to get events for folders (you have to move the watcher up a level and watch the folder from there), which it sounds like we'll need to make the k8s scenario work. I hope to get that implemented tomorrow.

amcasey commented 1 year ago

Interesting, it's still not working with DOTNET_USE_POLLING_FILE_WATCHER, however there's a new log line that hints it could be easy to fix:

trce: Microsoft.AspNetCore.Server.Kestrel.Core.Internal.CertificatePathWatcher[17]
      Ignored redundant event for '/certs/cert1.crt'.

This is interesting because it suggests that the file watcher concluded things had changed enough to merit an event, even though the last modified time was the same. In simple local (non-k8s) experiments, I was unable to repro this, but my guess would be that it's checking the size as well as the mtime (hashing seems too expensive). Conceivably, I could remove the restriction on same-mtime changes for symlinks, specifically, but I'd like to understand what's happening before I change that.

aelij commented 1 year ago

checking the size as well as the mtime

I'm not sure size would work for detecting PEM certificate changes; these are base64 encoded certificates with the same number of bytes.

hashing seems too expensive

These should be very rare operations, so IMO it's not a big issue. You can load the cert and compare thumbprints (which are hashes).

aelij commented 1 year ago

Also regarding DOTNET_USE_POLLING_FILE_WATCHER not being the default, perhaps explicitly set UsePollingFileWatcher and UseActivePolling when creating PhysicalFileProvider in CertificatePathWatcher?

amcasey commented 1 year ago

hashing seems too expensive

These should be very rare operations, so IMO it's not a big issue. You can load the cert and compare thumbprints (which are hashes).

I meant that I didn't think the polling file watcher we were building on would be willing to hash in the general case (without exposing any apparent knobs for controlling it).

Also regarding DOTNET_USE_POLLING_FILE_WATCHER not being the default, perhaps explicitly set UsePollingFileWatcher and UseActivePolling when creating PhysicalFileProvider in CertificatePathWatcher?

That's certainly on the table if we can't get the file system watcher implementation working, but (a) I'd like to understand why it's not working and (b) I'd like to honor DOTNET_USE_POLLING_FILE_WATCHER if possible (i.e. let the user decide whether they want polling file watching).

amcasey commented 1 year ago

I spent basically the whole day finessing the code that keeps the symlink graph up-to-date as file and endpoint changes come in. I'm pretty sure it works now (modulo expected test failures I haven't bothered to clean up), but the change is becoming increasingly unwieldy.

Tomorrow, I'll explore the possibility of forcing polling and ditching timestamps in favor of SHAs. That may be more palatable, risk-wise, for 8.0 and then we can flesh out the fancy one in 9.0 (assuming there's still demand). In any case, we'll at least be able to compare them.

amcasey commented 1 year ago

Interesting, it's still not working with DOTNET_USE_POLLING_FILE_WATCHER, however there's a new log line that hints it could be easy to fix:

trce: Microsoft.AspNetCore.Server.Kestrel.Core.Internal.CertificatePathWatcher[17]
      Ignored redundant event for '/certs/cert1.crt'.

I'm wondering if this was for an actually redundant change event. Looking at the polling source, it looks like it makes its has-changed decision based solely on the last write time.

amcasey commented 1 year ago

I don't like it, but the polling/checksum version is here: https://github.com/dotnet/aspnetcore/pull/50172

I'll do more validation tomorrow to better understand the limits of each approach.

jozkee commented 1 year ago

Expose symlink information from, e.g., IFileInfo.

@amcasey what kind of symlink information do you need? Getting the target of a symlink in order to query its metadata should be possible with FileSystemInfo.ResolveLinkTarget.

Looking at the polling source, it looks like it makes its has-changed decision based solely on the last write time.

Yes, I assume that's probably to make polling less costly. Were you expecting it to work in a different way like comparing the contents?

Interesting, it's still not working with DOTNET_USE_POLLING_FILE_WATCHER.

@aelij I tried to look at your repro https://github.com/aelij/kestrel-aks-kv-rotation but it is too minimal to see how the files are polled and I'm not very familiar with the internals of ASP.NET Core. Am I understanding correctly that there should be PhysicalFileWatchers for /certs/cert1.key and /certs/cert1.crt? Can you point me to the source where they are used? I'm trying to understand if this is really a bug in the dotnet/runtime.

amcasey commented 1 year ago

Expose symlink information from, e.g., IFileInfo.

@amcasey what kind of symlink information do you need? Getting the target of a symlink in order to query its metadata should be possible with FileSystemInfo.ResolveLinkTarget.

That's at the System.IO layer, but we're working at the Microsoft.Extensions.FileProviders layer. We don't want to dig down in case the backing implementation isn't a disk.

Looking at the polling source, it looks like it makes its has-changed decision based solely on the last write time.

Yes, I assume that's probably to make polling less costly. Were you expecting it to work in a different way like comparing the contents?

No, that's what I expected. Seeing the redundant event message (which indicates that the new and old mtimes were the same) made me think there might be some other criteria.

Interesting, it's still not working with DOTNET_USE_POLLING_FILE_WATCHER.

@aelij I tried to look at your repro https://github.com/aelij/kestrel-aks-kv-rotation but it is too minimal to see how the files are polled and I'm not very familiar with the internals of ASP.NET Core. Am I understanding correctly that there should be PhysicalFileWatchers for /certs/cert1.key and /certs/cert1.crt? Can you point me to the source where they are used? I'm trying to understand if this is really a bug in the dotnet/runtime.

Sorry, it's a long thread, but there are some notes above. The file layout is described here and the watch is on (in that example), tls.key.

(Note that @aelij, who has been exceptionally helpful, is on European time and is, hopefully, enjoying the weekend.)

davidfowl commented 1 year ago

What we need is watching files that are actually sym linked to trigger the change token. It needs to work like a normal file change. I'm not sure we need to know if a file is a sym link or not, that might be useful in general but for this, it should just work if I watch a file and that file changes and it happens to be a link. If that behavior needs to be opt-in, it should be opt-in, but the complexity should be hidden in the watch implementation.

jozkee commented 1 year ago

Seeing the redundant event message (which indicates that the new and old mtimes were the same) made me think there might be some other criteria.

That could be https://github.com/dotnet/runtime/issues/55951 if it is executing fast enough, i.e: in milliseconds. But if this was tested manually then is unlikely.

> What we need is watching files that are actually sym linked to trigger the change token

Well that was addressed already on https://github.com/dotnet/runtime/pull/55664 unless its buggy. EDIT: I think you are to referring to what you want in aspnetcore 😅.

pinkfloydx33 commented 1 year ago

Why does the reload work for normal AppSettings files mounted into a container in the same manner, but not in this case?

jozkee commented 1 year ago

@pinkfloydx33 that's what I'm trying to understand too.

amcasey commented 1 year ago

I'm wondering if this was for an actually redundant change event. Looking at the polling source, it looks like it makes its has-changed decision based solely on the last write time.

Figured this out over the weekend: the change event is for the target of the link and the timestamp check is on the link itself. (If there's some way to pull the path out of the event, I haven't found it, so I'm associating a path with each change token manually.)

I'd like to update the mtime check to consider the target-of-link mtime as well, but (AFAICT) IFileInfo doesn't expose ResolveLinkTarget so that's not possible. However, if we force it to use polling, we should be able to drop the mtime check entirely, since it doesn't send redundant events like FileSystemWatcher.

amcasey commented 1 year ago

Why does the reload work for normal AppSettings files mounted into a container in the same manner, but not in this case?

Where do those files sit in the directory hierarchy you listed above? Is there an appsettings.json next to each tls.key?

pinkfloydx33 commented 1 year ago

In that particular example no, it's just the key.

But generally that's exactly how I mount AppSettings files in a container (at /app/settings but same concepts apply). If I terminal into the pod and edit one of those files and look at the logs I can see my yarp gateway refresh within seconds.

amcasey commented 1 year ago

If the app is finding its appsettings.json using a path that doesn't require symlink resolution, I'd expect it to just work. Alternatively, have you enabled polling file watching?

amcasey commented 1 year ago

This thread is getting fairly long, so I'll attempt to summarize our current status.

Problem

The appsettings.json file for an aspnetcore app contains the path to a certificate file. We would like to reload endpoints using that certificate file if it changes on disk, even if there's no corresponding change to the appsettings file. The reload behavior will be the same as if the appsettings file had been updated with a new certificate path.

File Layout

app/
    tls.key -> ..data/tls.key
    ..data/ -> 2023_08_14/
    2023_08_14/
        tls.key
    2023_08_15/
        tls.key

That is, tls.key is a symlink to ..data/tls.key, which ultimately resolves to 2023_08_14/tls.key. (The actual timestamped directory is more precise - this is just an example.)

The file change we expect to see in practice is an update of ..data/ to point at 2023_08_15/, rather than 2023_08_14/.

Status

At present, aspnetcore adds a file watcher (either FileSystemWatcher or polling, according to the user's environment variables) to tls.key. There's no globbing - it's just that single file.

When using FileSystemWatcher, symlinks are not considered at all. When the ..data/ symlink is updated, the watcher produces no events and aspnetcore does not reload.

When using polling, the mtime of the resolved symlink target (i.e. 2023_08_14/tls.key before the change and 2023_08_15/tls.key afterward) is used. However, to compensate for duplicate events in the FileSystemWatcher case, aspnetcore drops events that don't update the mtime of the watched file. Unfortunately, as far as aspnetcore knows, it is only watching tls.key so that is the only mtime it checks. IFileSystem does not expose ResolveLinkTarget, so there's no way to consider symlinks.

As a result, aspnetcore does not reload the affected endpoints in either mode.

Ways Forward

Regardless of which path we take, I'll need to update the code to stop instantiating PhysicalFileWatcher directly since, as @davidfowl pointed out, we should be retrieving file watchers from the IHostEnvironment.

Option: Do Nothing

If we make no further changes (beyond the fix mentioned above), things won't work on k8s, which was the motivating scenario for this work. It will, however, result in a low-risk change with intelligible behavior.

Option: Adjust the mtime checks

We could detect that the user has opted for polling file watching and skip the mtime check in that case. That would be enough to unblock the k8s scenario, but would likely increase CPU usage by making all file watching poll-based.

Option: Remove the mtime checks

If we remove the mtime checks but allow and/or default to FileSystemWatcher, we will see redundant change events that cause unnecessary reloads. It's possible that debouncing at some other layer will mitigate this.

Option: Switch from mtime checks to SHAs

Obviously, this would use a lot more CPU. We can limit this by combining it with polling (i.e. so we hash at most every 4 seconds). Given that IFileProvider didn't feel the need to do this, I'm assuming it's overkill.

Draft: https://github.com/dotnet/aspnetcore/pull/50172

Option: Implement symlink support in kestrel

We have to (a) move watchers higher up the file hierarchy to detect changes to directories earlier in the path and write a bunch of abstraction-violating and error-prone code. It's doable but ugly and, as @davidfowl points out, at the wrong layer.

Draft: https://github.com/dotnet/aspnetcore/pull/50074

Option: Implement symlink support in the runtime

As complicated as the kestrel code is, it gets to make a bunch of simplifying assumptions. In the general case, we'd have to think about globbing and how to get a new IFileProvider to pointed-to files outside the scope of the current IFileProvider.

The biggest risk here is that we're quite late in the 8.0 cycle, so there wouldn't be a lot of time to let this bake. With so many file systems to consider, there's quite a large test matrix and there's likely to be a bug tail as we discover special cases.

Proposal

Personally, I think the most pragmatic solution would be to force polling for precisely certificate files (i.e. continuing to respect the user's environment variable for other files, including appsettings.json) and dropping the mtime check. In fairness, I should point out that @aelij suggested this long ago.

As mentioned above, we probably want to start pulling from IFileProvider regardless of which approach we select and I think this would be worthwhile. Frankly, the biggest change is likely to be to the tests.

Thoughts?

adityamandaleeka commented 1 year ago

Thanks for the clear summary @amcasey. I support your proposal for .NET 8.

Longer term, improving symlink support in the framework is likely the best option but it's too late in this release for that as you noted.

I'd also be interested in whether removing the mtime checks actually works well, but would want us to understand all the debouncing logic fully if we want to go that route with confidence.

amcasey commented 1 year ago

There's little advantage to dropping the mtime check without also forcing polling since FileSystemWatcher won't resolve symlinks. I guess the chief advantage would be being able to respect the polling environment variables.

While we're talking about runtime improvements, it would be nice if the non-polling watcher just didn't send duplicate events. :wink:

jozkee commented 1 year ago

it would be nice if the non-polling watcher just didn't send duplicate events. 😉

https://github.com/dotnet/runtime/issues/24079

amcasey commented 1 year ago

Regardless of which path we take, I'll need to update the code to stop instantiating PhysicalFileWatcher directly

I made this change here. Unfortunately, I don't think we'll be able to take it for 8.0 because it will prevent us from using polling for specific files (the file watcher is shared by several consumers). Not having it is an abstraction violation, but fixing it doesn't fix the abstraction violation because we still pass the path directly to X509Certificate, rather than retrieving a stream from the file provider.

amcasey commented 1 year ago

PR for polling and ignoring mtime: https://github.com/dotnet/aspnetcore/pull/50251

amcasey commented 1 year ago

It took a bit of massaging, but I got @aelij's repro script working (he was naively taking for granted that I would know when and how to log in to azure 😆). With #50251, I'm seeing

trce: Microsoft.AspNetCore.Server.Kestrel.Core.Internal.CertificatePathWatcher[15]
      Flagged 1 observers of '/certs/cert1.crt' as changed.
...
info: Microsoft.AspNetCore.Server.Kestrel[0]
      Config changed. Stopping the following endpoints: 'https://*:5001'
info: Microsoft.AspNetCore.Server.Kestrel[0]
      Config changed. Starting the following endpoints: 'https://*:5001'

🥳

amcasey commented 1 year ago

@aelij I notice that your repro has the certs in /certs, which isn't under the /app, the content root. Is that the way things are normally laid out or was that just simpler for a toy repro? I don't believe it will cause a problem, but it suggests that a "fix" like #50246 would break people.

pinkfloydx33 commented 1 year ago

I don't think there's "a usual". For example at my job by convention we put things into /app/config, /app/secrets and /app/certs. But it could've been anywhere. I know some folks who just mount to /config while the program still runs from /app.

IOW mounting k8s certs within the content root is certainly possible but is no way guaranteed. The mount point may be dictated by devops, and/or those responsible for managing configuration and might not even be a developer concern.

aelij commented 1 year ago

Yes, there should be no relation to the app root.

amcasey commented 1 year ago

It's in! @aelij I've already validated using your sample project (thanks again!), but it would be great if you could confirm that an upcoming nightly works for you.

amcasey commented 1 year ago

Not sure why merging #50251 didn't close this...

aelij commented 1 year ago

I can confirm it's working. Thanks @amcasey!