Have the file system signer read the cert files when it needs to in case they have changed

krmichelos commented 8 months ago

Issue #64 Fixes #64

Added file watching to update certificate contents when they change on disk so that subsequent requests after the original certificate has expired will still succeed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

krmichelos commented 8 months ago

@13ajay, @liwadman, or @dwmw2 are any of you an owner for this repo or know who is?

13ajay commented 8 months ago

Sorry for the late response - I've added some comments on the PR now. You can feel free to ignore the minor comments - I don't consider them to be blocking.

krmichelos commented 8 months ago

Overall, this looks good - thanks for the PR!

Not to discredit your work, but I'm wondering how useful making this change will be for customers. If the credential helper is used as a long-running process (update or serve), the shortest amount of time between credentials updates is about 10 minutes (shortest temporary credential validity is 15 minutes, and the credential helper will perform rotations five minutes before credential expiry). Since CreateSession calls don't need to be made very frequently, would it make more sense just to read the certificate and private key information right before making the CreateSession call as opposed to tracking changes in file data?

A little background for what led me here. I plan on using this as a sidecar container in some pods that are running outside of AWS but need to interact with some AWS services. We plan on using very short lived certificates, so I was testing with certs that were only valid for an hour and were reissued at 30 minutes to exaggerate what we intend and do some testing. That is what helped me notice what was happening. It certainly could be made to read certs before calling the CreateSession. I looked at doing that initially, the conclusion that I came to was that it would make a weird interaction between the credential management process and the Signer interface. The Signer interface was meant to be generic and there are implementations that don't read from files on disk (and certainly more implementations could be added) and the interface doesn't currently have the idea of refreshing the cert(s). I didn't feel comfortable modifying the interface and thus the implementations without input from maintainers or the ability to test the changes for windows or darwin. This coupled with the potential for an indeterminate number of extra reads for all signers, for users with certs that are valid for a year or two, that might cause occasional performance issues; I decided to implement completely inside the FileSystemSigner implementation. I chose to watch the files instead of just reading every time to avoid a significant number of unneeded disk hits.

That being said if you want to shift this to modifying the interface to have a read/refresh sort of function I can certainly go down that path. I would either need someone to fix the implementations for windows/darwin cert stores or at very least test and verify for me if I take a stab at changing those.

I will take a look at and address the other comments. Let me know how you want to proceed with the larger replaced certificates issue discussed above.

krmichelos commented 8 months ago

Is there any consideration of having an official image available for this helper?

13ajay commented 8 months ago

Thanks for explaining what led you to make this change. Regarding the issue of file watching vs. reading certificate and private key data right before calling CreateSession, I don't think I can make this decision myself, so I will speak with the rest of my team and get back to you within the next few days.

Is there any consideration of having an official image available for this helper?

I'll have to discuss this with my team as well. But in the meantime, to help with prioritization, could you +1 #51?

13ajay commented 8 months ago

So it could be that there's something I'm missing, but why does there have to be a modification to the Signer interface in order to support the refreshing of certificates or private keys specifically for the FileSystemSigner? Can't all of that logic just be encapsulated within the Sign (or Certificate, or CertificateChain) method(s)? Basically, just store the path to the private key and certificate files as members of the FileSystemSigner, and in the implementation of Sign, read whatever you need to read from the file system. You could make a similar modification to the Certificate and CertificateChain methods.

This client application also integrates with PKCS#11 and OS-specific certificate stores, and in each of those cases, the private key data is never cached by the credential helper process (it can't directly access the private key data). Instead, the application will search the secure store for the right private key each time the Sign function is called and delegate the signing operation to the secure store. So, if there is an update in private key data, that will be reflected in the call to CreateSession. Maybe, in a sense, you could view that as an inconsistency between the FileSystemSigner and the other Signers (since for the FileSystemSigner, the private key data is effectively cached for the lifetime of the process).

In the certificate rotation scenario, though, you would want the long-running application to call CreateSession with the right certificate (even if it has been updated after the start of the process) without having to restart the process. I think from a consistency perspective, what may make the most sense is to read the certificate from wherever it's supposed to be read from each time, right before calling CreateSession, regardless of the Signer that was used to sign the request. And maybe that means making the update that you suggested to the Signer interface. I suppose the other option is to do the refresh directly within the Sign, Certificate, and CertificateChain methods (as I mentioned for the FileSystemSigner in the first paragraph above), but these methods (at least in this codebase) are called in quick succession (right before the CreateSession call), and if implemented in this way, would probably need to do repeated work (maybe not for the FileSystemSigner though since there is a separate file for each of the private key, certificate, and certificate chain). For example, for the DarwinCertStoreSigner, certificates and private keys are stored together as identities. So, if you search for one of them, you effectively just get the other for free. But if you implement the searching logic directly within the Sign or Certificate methods, you'd be performing some wasted work when you call Certificate and Sign in quick succession. It may make more sense to implement a Refresh method that caches a bunch of information, so that the other methods that are implemented as a part of the Signer interface can use those cached results.

What are your thoughts?

krmichelos commented 7 months ago

I obviously don't know much about the OS cert store implementations. They looked to me like they were getting a reference to a cert once, but I will take your word for those references returning updated certs when necessary. So, if the OS cert stores don't have this issue when used in a long running process then we can just focus on the FileSystemSigner. I didn't read the code closely enough to realize that Sign and the other FileSystemSigner functions were only used when preparing to call CreateSession. If that is the case then the functions in FileSystemSigner could be made to read the files on request. That would be an extra 26-144 reads of 2-3 files a day for those with longer expiration periods on their certs. If that isn't a concern and you want this switched to have the FileSystemSigner read the files every time it is used let me know and I can switch it. If that is the case, unless I am missing something else, it seems like all the cached file contents should be removed from the FileSystemSigner and its internal state should only track the paths to the certs/chain since it will be changed to read them when requested. Just curious what is the concern with the file watching?

Let me know what you think.

13ajay commented 7 months ago

Sorry, I may not have been clear in my earlier message, but I believe the certificate data is cached for the lifetime of the process in some cases, even when using the OS certificate store implementations. But the private key data necessarily can't be cached (since that data isn't directly accessible by the application). I have to refresh my memory on how it works for the OS certificate store implementations, but I know that at least for the PKCS#11 signer, the private key is searched for again, each time it needs to be used to sign.

If, for other signers, the behavior is, "search for the private key data right before signing," I was thinking that the FileSystemSigner should do the same.

That would be an extra 26-144 reads of 2-3 files a day for those with longer expiration periods on their certs.

I think that's acceptable. You could have cases in which the file changes multiple times between CreateSession calls, and in that case, you would only care about the last update that was made to the certificate data. It's a design tradeoff I guess. In some scenarios, reading the files right before calling CreateSession is better, and in other cases, watching for file updates is better. I'm guessing that usually though, certificates won't need to be rotated that frequently, thus making the file watching approach more efficient.

Just curious what is the concern with the file watching?

One of my concerns includes error handling (what happens if the certificate file is deleted at some point but then reintroduced before the next CreateSession call). I don't remember how your code handles these issues, but I think in general, there's a bit more to reason about in a file watching implementation. In a sense, this can be considered an issue of consistency between the FileSystemSigner and the other Signers (but I do acknowledge that there isn't as much consistency between the implementations right now as there should be). And my other concern is maybe just an extension of the first that I raised. The code complexity becomes greater if we have to support file watching, as compared to reading the necessary files right before calling CreateSession.

krmichelos commented 7 months ago

@13ajay I have pushed a change to have the file system signer read certs on demand.

krmichelos commented 7 months ago

Thanks for implementing this!

Would you also be able to run go fmt on the files that you changed as a part of this PR? And if you have the time, could you also add unit testing?

Is there something specific in the formatting that you think is off? My IDE runs go fmt on every save. I am adding a .pre-commit-config.yaml just to be sure, but when I run it for all files the only formatting changes are in files I didn't change. I just want to make sure I address the concern. Also, I committed the files that it formatted.

What test case are you wanting. The only test that I see that is using a FileSystemSigner is TestBuildAuthorizationHeader and it isn't testing a long running serve type scenario. Are you wanting me to replace the contents of the cert and private key on disk and then duplicate lines 142-153 to test again?

13ajay commented 7 months ago

Is there something specific in the formatting that you think is off?

I thought that maybe you hadn't since I saw a diff that looked like this:

- func (fileSystemSigner FileSystemSigner) Close() {}
+ func (fileSystemSigner *FileSystemSigner) Close() {
+ }

But it's possible that your IDE formats in a slightly different way. Either way, I don't think it really matters - no need to worry about it.

What test case are you wanting.

There isn't anything specific that I had in mind, but I did want to make sure that this code gets tested in some way prior to us merging it. Would you be able to describe how you tested it? I suppose my team will need some way of doing integration testing prior to merging this change, which I can discuss with my colleagues. But I think it would also help to learn how you did testing for these changes yourself.

krmichelos commented 7 months ago

I thought that maybe you hadn't since I saw a diff that looked like this:

go fmt doesn't seem to care about the extra linefeed(s). It was just there because I added code to that method in the previous iteration and then removed it. I removed the linefeed.

There isn't anything specific that I had in mind, but I did want to make sure that this code gets tested in some way prior to us merging it. Would you be able to describe how you tested it? I suppose my team will need some way of doing integration testing prior to merging this change, which I can discuss with my colleagues. But I think it would also help to learn how you did testing for these changes yourself.

The testing I did is the same as my initial testing that allowed me to notice the issue. I built an image and added a sidecar to an external-dns pod to allow external-dns to interact with Route53. The pod running the code currently in the PR minus the removed linefeed and unneeded variable has been running for 3 days without crashing and continuing to allow its paired external-dns to function properly while the certificate is replaced every 30 minutes. I also modified the test we discussed to verify the certificate returned by the signer matches the original read from the file and then again after the file has changed.

michaelsauter commented 7 months ago

FWIW, I had the same need and quickly put together https://github.com/michaelsauter/rolesanywhere-credential-helper/commit/5bf1b5328630e690738dc7f804c94ccd2cc80fb3 before finding this PR. It differs slightly in that I did not make any change to any signer, instead just added a flag to refresh the signer before use. In general I think I prefer the change done by @krmichelos, I just didn't want to make so many changes :) Looking forward to have this PR merged!

13ajay commented 7 months ago

Sorry for the late reply.

I think this looks good, but we will have to do an internal review before this gets merged. I'll let you know when I have an update.