aws / aws-ec2-instance-connect-config

This is the ssh daemon configuration and necessary EC2 instance scripting to enable EC2 Instance Connect. Also included are package manager configurations for packaging on various Linux distributions.
Apache License 2.0

Please cache results #15

Open cpaelzer opened 4 years ago

cpaelzer commented 4 years ago

Hi, I was reviewing the general concepts of EIC a while ago and wanted to file this report for discussion here. One thing I started to wonder about was what would happen if you have, e.g., remotely driven automation that might make hundreds of ssh calls per second.

Obviously one could say "push a script to the system and execute that", but many automation solutions just don't work that way. In that context EIC works like an amplifier: every one of those ssh logins triggers a multitude of curl calls, each adding latency and overhead.

I was wondering if it would seem reasonable to you to rate-limit this. You could use timestamps and only re-check everything once every x seconds.

The first login won't find a timestamp and has to work it out, but every later login for some time doesn't need to do the same work over and over again. That could help scalability and drop overhead a lot at almost no loss IMHO.
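The timestamp idea above could be sketched roughly as follows. This is only an illustration of the logic, not code from this repository: `TimedKeyCache`, `fetch`, and `ttl_seconds` are hypothetical names, and `fetch` stands in for whatever expensive lookup a login currently triggers. The cache lives in process memory, so it assumes a long-running process; a helper that is started fresh for every login would need some persistent store instead.

```python
import time
from typing import Callable, Dict, List, Tuple


class TimedKeyCache:
    """Re-run an expensive key lookup at most once per `ttl_seconds` per user.

    Hypothetical sketch: `fetch` is a placeholder for the real lookup.
    """

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # user -> (time of last lookup, keys returned by that lookup)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(user)
        # Fast path: a recent timestamp exists, so skip the lookup.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Slow path: no timestamp yet, or it is too old. Do the work once
        # and record when it happened.
        keys = self._fetch(user)
        self._entries[user] = (now, keys)
        return keys
```

With a small `ttl_seconds`, a burst of logins for the same user costs one lookup instead of one per login, at the price of serving results that may be up to `ttl_seconds` old.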

I have not found an "I already got my auth data, fast-path skip" in the code. If there is one that I missed, please just let me know and consider this almost resolved :-)

LordAlfredo commented 4 years ago

Thank you for the request. I will bring this up with our product management, but I would not get my hopes up. There are two angles to consider why not: security and product goals.

I will avoid getting too deep into the system threat model, but as part of key verification at some point the active key list must be processed. By doing all of this within the scope of the ssh daemon's memory, there is practically nothing for malicious software to manipulate - it would have to crack the daemon process memory to add an undesired key, which would mean your system would have to already be totally compromised. On the other hand, a cache introduces a new potential attack surface.

As for product goals, the main focus of EC2 Instance Connect is

In an absolutely perfect world, we would not be doing the key timestamp piece that you've noted. Instead, a key would be trusted by the ssh daemon once and then never again (unless it was published through EIC a second time). The problem is, it turns out doing this is incredibly complex - if you check the instance's auth logs, you can even see that the ssh daemon pulls the set of available ssh keys multiple times. It's much more nuanced than just "trust this specific request ID once" and would either require a full-featured sibling daemon for sshd to hook into or would require deep changes to sshd itself. The 60 second expiration is an approximation for single-session scoping without needing to make these deeper, riskier changes to the ssh daemon (60 seconds in particular was chosen as sufficient time for all parts of the ssh handshake to complete in all testing).
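As a rough illustration of the 60 second expiration described above, the check amounts to filtering the published keys by age. The sketch below is not the project's actual code: `PublishedKey`, `published_at`, and `unexpired_keys` are hypothetical names, and the real scripts may represent keys and timestamps differently.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional

# The 60 second window mentioned in the thread.
KEY_LIFETIME_SECONDS = 60.0


@dataclass(frozen=True)
class PublishedKey:
    """Hypothetical record: a public key plus when it was published."""

    public_key: str
    published_at: float  # Unix time in seconds


def unexpired_keys(
    keys: Iterable[PublishedKey],
    now: Optional[float] = None,
    lifetime: float = KEY_LIFETIME_SECONDS,
) -> List[str]:
    """Return only the keys published within the last `lifetime` seconds."""
    current = time.time() if now is None else now
    return [
        k.public_key
        for k in keys
        # Age must be non-negative (reject future-dated keys) and
        # strictly less than the lifetime.
        if 0 <= current - k.published_at < lifetime
    ]
```

For example, with keys published at t=100 and t=30, a check at t=120 keeps only the first, and a check at t=160 keeps neither.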

cpaelzer commented 4 years ago

Thanks for the answer @LordAlfredo - I can see the threat model POV here. Maybe it can be a long-term goal implemented inside the ssh daemon itself (or a plugin, a sibling daemon, or maybe even a PAM module or something like it), which could grant the benefits of reduced overhead/latency without adding the additional attack surface that an on-disk cache of any kind would.

raharper commented 4 years ago

What about making use of the kernel keyring to store the session data and timestamps needed to implement a cache?