Put FPGA behind 1:1 NAT on AWS

bboston7 commented 4 years ago

The FPGA needs to be accessible via a 1:1 NAT rather than the current L2TP bridge solution, because the L2TP tunnel requires modification of the VPN host. Dylan suggested the following on Mattermost:

BTW, if someone wants to try 1:1 NAT.. you would assign a 2nd IP to the existing eni (let's say 192.168.0.200). Then on the F1 host:

iptables -t nat -A POSTROUTING -o ens3 -s 172.16.0.2 -j SNAT --to-source 192.168.0.200
iptables -t nat -A PREROUTING -i ens3 -d 192.168.0.200 -j DNAT --to-destination 172.16.0.2
iptables -A FORWARD -s 192.168.0.200 -j ACCEPT
iptables -A FORWARD -d 172.16.0.2  -j ACCEPT

untested.. but should be the general idea. You may need to add the IP address as a 2nd IP to the ens3 interface manually if AWS doesn't do it for you: ip addr add 192.168.0.200 dev ens3

Then from somewhere else connected to the same VPC, you should be able to get to the target by ssh'ing into 192.168.0.200
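As a rough illustration of what the rules above are meant to do, here is a minimal Python model of the two NAT rewrites. It is a sketch only, not how netfilter is implemented: the `Packet` class and the researcher address `192.168.0.50` are hypothetical, while `192.168.0.200` and `172.16.0.2` are the addresses from the suggestion above.

```python
from dataclasses import dataclass, replace

VPC_IP = "192.168.0.200"   # secondary IP on the F1 host's ENI (from the suggestion above)
TARGET_IP = "172.16.0.2"   # FPGA target address (from the suggestion above)


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet: addresses, ports, payload."""
    src: str
    sport: int
    dst: str
    dport: int
    payload: bytes = b""


def dnat_inbound(pkt: Packet) -> Packet:
    """Model of the PREROUTING DNAT rule: rewrite only the destination address."""
    if pkt.dst == VPC_IP:
        return replace(pkt, dst=TARGET_IP)
    return pkt


def snat_outbound(pkt: Packet) -> Packet:
    """Model of the POSTROUTING SNAT rule: rewrite only the source address."""
    if pkt.src == TARGET_IP:
        return replace(pkt, src=VPC_IP)
    return pkt


# A researcher (hypothetical address) opens an SSH connection to the VPC-facing IP.
request = Packet("192.168.0.50", 40000, VPC_IP, 22, b"SSH-2.0")
at_target = dnat_inbound(request)

# The target replies; the host rewrites the source on the way out.
reply = Packet(TARGET_IP, 22, "192.168.0.50", 40000, b"SSH-2.0")
on_wire = snat_outbound(reply)

print(at_target)
print(on_wire)
```

In this model only the address fields change; ports and payload pass through untouched. One thing that may be worth checking when testing the real rules: netfilter traverses the FORWARD chain after the PREROUTING DNAT and before the POSTROUTING SNAT, so replies from the target still carry source 172.16.0.2 at that point. The `-s 192.168.0.200` FORWARD rule may therefore not match them, which should only matter if the FORWARD policy is something other than ACCEPT.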

rwatson commented 4 years ago

@brooksdavis @jrtc27

Hi Galois folk: We've been quite specific that we need FETT researchers to have direct network access to the TA-1 host on FPGA, without packet transformations, so that kernel attack surfaces can be reached using packets as provided. To this end, we have recommended bridging between the TA-1 host and the local network, rather than NAT. Could you confirm that the proposed translation will not make any modifications to packets between the researcher and the TA-1 node?

bboston7 commented 4 years ago

@rwatson

This ticket is the result of a discussion we had internally on Mattermost, which I'll paraphrase and then paste below so you can have the full context. Last week we implemented the bridge approach, but in doing so we discovered that AWS does not provide a full L2 interface as part of its virtual private cloud, which means it's not possible to set up a bridge in the conventional sense. To get around this, we implemented an L2-in-L3 (L2TP) tunnel. This accomplishes the goal of getting packets to the FPGA without any packet rewriting. However, it requires configuration of the other side of the L2TP tunnel on the machine hosting the Synack VPN. We decided this was undesirable, and that the 1:1 NAT was a fairer compromise. The 1:1 NAT will modify the source / destination fields of the packets, but not the ports. I get that neither solution is terribly satisfying, but the reality is that we need some compromise somewhere and there is no solution that will make everyone happy.

I'm CCing everyone from the Mattermost conversation here: @rtadros-Galois @dhand-galois @kurthopfer @kiniry. Let's please find a solution to this soon so we don't have to keep re-doing this.

Here is the full conversation:

Ramy Tadros We need to discuss this now. I am getting contradictory instructions. "Don't use NAT, let the researcher be able to manipulate the packets". "Don't bridge to a jumpbox, Synack doesn't want a jumpbox". Great. Then what to do?

Kurt Hopfer Yes, agree. Earlier in the discussion we had a jumpbox to isolate the network from other researchers. Synack did not want a jumpbox between the researcher and the target due to their software (Wireshark- or Fiddler-like, I presume) that monitors packets back and forth between the target and the researcher.

Ramy Tadros Can the jumpbox be the researcher's instance itself? If the word jumpbox is bad. We can use researcherInstance instead. And configure the bridge between the F1 host and the researcher's instance directly.

Kurt Hopfer Jumpbox implies a server that needs to be logged into first before logging into the target. Is this the case here, or would this jumpbox be a passthrough only?

Brett Boston There needs to be a machine somewhere on the same AWS subnet as the F1 host to connect to the tunnel to the FPGA. It could be researcher controlled though, no? And couldn't the packets be captured on that machine?

Dylan Hand I'll re-ask my question again from the other day - is there any documentation or diagram on what the topology is supposed to look like? It seems like that is the first step. What does Synack envision the setup to look like and then we can find a solution that reaches that. Or is that our decision and we need to tell Synack to work with it?

Joe Kiniry There is a network diagram that Kurt Hopfer produced that has been iterated over with Synack several times. Indeed, they didn't want us to introduce a classical jumpbox (an EC2 instance that researchers would first log in to in order to then ssh to Target). We are not doing that. Whether we NAT or bridge should not matter to Synack's VPN and secret-packet-capturing sauce. Synack researchers are given a Synack beachhead host and they are obligated to use it to connect to Target. I don't recall where that diagram lives right now; perhaps Kurt Hopfer or a PL (Andrew Bivin?) can point to it.

Brett Boston I think seeing a diagram will clear this all up, but from your description it seems like the Synack beachhead can just connect to the tunnel to solve this problem

Joe Kiniry Yes, from the Synack beachhead they just ssh to (or probe, in the case of P1's running FreeRTOS) Target's class C IP address.

Dylan Hand Isn't the "beachhead" equivalent to a jumpbox? we'd still need to initiate the tunnel on that beachhead OS to access the Target

Kurt Hopfer Here is the current diagram of how all of the production account will look and what we landed on with regards to network architecture: https://app.lucidchart.com/invitations/accept/c109589d-d07c-42f0-9d9c-f25891eacf36 Synack has a "beachhead" essentially, in that they have an S2S VPN sitting in the us-west-2 researcher VPC in the prod account

Dylan Hand I don't fully understand everything in this plot, but it seems like this assumes it's possible to join the Target OS directly to the VPC, which we now think is problematic (without NAT). Or not? They use the same icon for F1 env as F1 instance.. this part isn't making sense to me (what is the target vs host):

Image Pasted at 2020-6-19 12-15.png

Kurt Hopfer F1 Env is the F1 Instance. Orange F1 is the host. Blue is the FPGA. Multiple F1s are being spun up on the subnets.

Dylan Hand OK, so then the "Researcher" computer in the Synack data center is what the researcher is actually logged into?

Kurt Hopfer yes which is actually an AWS Workspace in their account

Dylan Hand How does the VPN present IPs from the private subnet to the researcher on that box? It looks like there's a translation between public subnet and private subnet

Kurt Hopfer Image Pasted at 2020-6-19 12-20.png

Dylan Hand Or is that just indicating the VPN server itself is running on a public facing network?

Kurt Hopfer We have an S2S VPN set up to allow Synack to bridge our CIDR with their Workspace. I have tested end-to-end connectivity from a Synack researcher (I have a login) directly to the FETT Target (behind NAT). The VPN server is the machine that has the packet sniffing software.

Dylan Hand OK, so the researcher box has direct access to the private subnet via the VPN

Kurt Hopfer So if we introduce a machine between the target and the VPN, we would be placing the traffic in an SSL tunnel, yes

Dylan Hand So if we want to use the bridge approach, it'd seem like we need to stand up the other end of the bridge interface on the researcher OS directly. And then their packet inspection software would need to be able to handle/understand L2TP traffic. Or we use some form of NAT. I'm not really seeing other options given this setup. A 1:1 NAT (instead of our masquerading/port forwarding one) would at least pass all packets on all ports. But it still translates the src/dst. IMO, that's good enough, but I don't know all the possible exploits that might prevent. A third option is to retry the MAC cloning trick to put the VPC IP directly onto the F1 Target's interface. But I have no idea if that works. Given how VPCs work, I'm thinking we could also be overestimating the amount of maliciousness you can put into a packet and still get AWS to transfer it.

Kurt Hopfer I have been attempting to route traffic to a second ENI, no success so far

Dylan Hand Yeah, that's part of what I termed option 3. You'd need to clone the MAC from the eni onto the target interface, but if you're using a firesim AFI, that'll fail at the moment due to the MAC issue

Kurt Hopfer Yeah, trying to route traffic is the PITA. AWS doesn't like not knowing the route tables. And all routing needs to reside on the F1 for it to be tenable

Dylan Hand IMO we should do a 1:1 NAT and be done with it. I think Jessica Clarke originally raised the objection to using NAT, so perhaps figuring out if that type of NAT would alleviate the concern is a next step (given the context that bridging is not straightforward)

Kurt Hopfer NAT seems to be the best solution imo too.

Dylan Hand Otherwise I plan to take a closer look at the FireSim switch code to make the 3rd option I listed (VPC IP directly on the target) viable, but not sure when I can get to that

Brett Boston To make sure I understand this correctly, the 1:1 NAT solution is close to what we had before, with the difference that all ports are forwarded as-is?

Kurt Hopfer Not sure about all ports but for example if you ssh'd to 10.100.54.4 -p 10000 that would redirect to 172.16.0.2

Dylan Hand Commented on Brett Boston's message: "To make sure I understand this correctly, the 1:1 NAT solution is close to what we had before, with the difference that all ports are forwarded as-is?" Yes. It will translate all IP packets as-is (just changing the src/dst as necessary). It doesn't rewrite ports. We would assign two IP addresses to the instance and one of them would be 1:1 NAT'd to the target. So it'd look like the bridge solution we want to exist (but doesn't), except it is using NAT underneath
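To make the difference between the earlier port-forwarding setup and the 1:1 NAT concrete, here is a small Python sketch. It only models the lookup. The forwarding table is hypothetical: Kurt's example above gives host port 10000, and the target port 22 is an assumption based on it being an ssh example.

```python
TARGET_IP = "172.16.0.2"  # FPGA target address used throughout this thread

# Port forwarding: only explicitly listed host ports reach the target.
# Hypothetical table; 10000 comes from the ssh example above, 22 is assumed.
PORT_FORWARDS = {10000: 22}


def port_forward(dst_port: int):
    """Return (target_ip, target_port), or None if the port is not forwarded."""
    target_port = PORT_FORWARDS.get(dst_port)
    if target_port is None:
        return None
    return (TARGET_IP, target_port)


def one_to_one_nat(dst_port: int):
    """1:1 NAT: every port maps to the same port on the target."""
    return (TARGET_IP, dst_port)


for port in (22, 80, 10000):
    print(port, port_forward(port), one_to_one_nat(port))
```

Under the port-forwarding model, a probe to an unlisted port never reaches the target, whereas the 1:1 mapping delivers it to the same port number on the target.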

Brett Boston OK. And where do these 10.x IP addresses come from? None of the F1s I've spun up on AWS have been assigned IPs in the 10.x IP space

Kurt Hopfer The 10.x subnet is the subnet no one is using. The AWS Production Account is built in 10.x subnets (as well as all of the other developer AWS accounts). Just no one is using it except me

Brett Boston Ah, I see

Kurt Hopfer We are hosting FETT in multiple regions, so specifically 10.100 and 10.200 for Oregon and N. Virginia respectively

Brett Boston OK. I think I understand this better now. Thanks. I'm fine with either the l2-l3 bridge approach or the 1:1 NAT approach. Thoughts Ramy Tadros ?

Kurt Hopfer So ultimately, for your sake, whatever the AWS internal IP is that you are using in the VPC you are testing will be a 10.x IP in prod

Joe Kiniry I hypothesize that the l2-l3 bridge will piss off Synack. I suggest the 1:1 NAT and, concurrently, ask them about the l2-l3 bridge approach for FETT II.

Dylan Hand It seems like L2TP is off the table.. I am assuming we don't get a lot of control over the researcher instance or that Synack would be happy about inspecting L2TP traffic

Ramy Tadros Commented on Brett Boston's message: "OK. I think I understand this better now. Thanks. I'm fine with either the l2-l3 bridge approach or the 1:1 NAT approach. Thoughts rtadros ?" I am fine with any plan that makes everyone happy. Which I don't see happening tbh.

Dylan Hand BTW, if someone wants to try 1:1 NAT.. you would assign a 2nd IP to the existing eni (let's say 192.168.0.200). Then on the F1 host:

iptables -t nat -A POSTROUTING -o ens3 -s 172.16.0.2 -j SNAT --to-source 192.168.0.200
iptables -t nat -A PREROUTING -i ens3 -d 192.168.0.200 -j DNAT --to-destination 172.16.0.2
iptables -A FORWARD -s 192.168.0.200 -j ACCEPT
iptables -A FORWARD -d 172.16.0.2 -j ACCEPT

untested.. but should be the general idea. You may need to add the IP address as a 2nd IP to the ens3 interface manually if AWS doesn't do it for you: ip addr add 192.168.0.200 dev ens3

Then from somewhere else connected to the same VPC, you should be able to get to the target by ssh'ing into 192.168.0.200

Kurt Hopfer Commented on Brett Boston's message: "OK. I think I understand this better now. Thanks. I'm fine with either the l2-l3 bridge approach or the 1:1 NAT approach. Thoughts rtadros ?" I think NAT is the closest thing to that

Ramy Tadros From this convo, I think so. If Synack is happy, and you guys are fine with the setup, and we are able to do it. I think this should do it.

kiniry commented 4 years ago

Indeed @rwatson, as we agreed, unadulterated access to the Ethernet interface is really what we wanted, and what we tried to do. But that was prior to understanding (a) what AWS networking in their virtualized environment would permit, and (b) what the eventual bug bounty entity's VPN and network logging infrastructure would permit. In the end, it simply isn't possible to satisfy the constraints imposed upon us by both of these parties and have unfettered direct access to the device, as we desired. That being said, since Researchers have an ssh login to the Target host, I believe that they can craft a local attack on that interface through other means. As I stated to Keith et al. this morning, I really do not think that this will seriously limit Researchers' ability to attack target devices. Some small fraction might be good at crafting network attacks that manipulate facets of layer 0/1 network interfaces, but that's probably only a tiny handful of researchers and there are dozens of other things that they can try. AustinR stated that if, down the road in FETT, some Researchers really complain about this limitation, we can look into setting up an EC2 beachhead that will facilitate unadulterated attacks.

rwatson commented 4 years ago

Forgive a lack of familiarity with iptables: Am I right in thinking that the rules above unconditionally transform only the IP addresses (and implicitly the checksums), but don’t retain or enforce state / protocol transitions / etc?

dhand-galois commented 4 years ago

My understanding is that in the 1:1 NAT scenario all packets arriving at the Linux host with a destination IP matching the specified IP address will be forwarded to the FPGA with only the source/destination fields translated (as well as checksums that include those fields). You could write additional rules that would perform additional translations but we're only planning to add the ones listed earlier.
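To illustrate the point about checksums, the sketch below builds a minimal IPv4 header in Python, swaps the destination address the way the DNAT rule would, and recomputes the header checksum. It is an illustration rather than what the kernel does internally: the researcher address `192.168.0.50` and the other header field values are hypothetical, and the TCP/UDP checksum (which also covers the addresses through the pseudo-header) is not modelled.

```python
import socket
import struct

IPV4_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(src: str, dst: str) -> bytes:
    """Build a minimal IPv4 header for a TCP packet with a valid checksum."""
    fields = [0x45, 0, 40, 0, 0, 64, socket.IPPROTO_TCP, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    raw = struct.pack(IPV4_FORMAT, *fields)      # checksum field zeroed
    fields[7] = ipv4_checksum(raw)
    return struct.pack(IPV4_FORMAT, *fields)


# Hypothetical researcher address; the other two come from the rules in this issue.
before = build_header("192.168.0.50", "192.168.0.200")
after = build_header("192.168.0.50", "172.16.0.2")

print("checksum before DNAT:", before[10:12].hex())
print("checksum after DNAT: ", after[10:12].hex())
```

Only the destination address and the checksum bytes differ between the two headers. Summing a header that already contains a valid checksum yields zero, which gives a quick validity check.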

The approach of NAT w/ masquerading we were using previously did rely on tracking state, only applied translation to specific protocols, etc. 1:1 NAT should bring us closer to the goal of passing as much to the target as possible while staying within the constraints of our setup.

kiniry commented 4 years ago

In a call with Synack this morning we checked in with them (and DARPA, again), about whether or not they were comfortable with a simple NAT solution, whether it ran counter to their network infrastructure and observation, etc. They are completely comfortable with this solution. They do suspect that a couple of researchers may have techniques in their pocket to attack an Ethernet device at a lower level from a local subnet. We'll offer up to Researchers this option and, if someone argues that they want this capability, we'll explore setting them up an EC2 beachhead on the same subnet as the F1's interface.

GaloisInc / BESSPIN-Tool-Suite

Put FPGA behind 1:1 NAT on AWS #456