kurthopfer opened 4 years ago
AWS Architecture attached for discussion @rfoot @Abivin12
Adding @lolsborn and @matthewj1 for visibility
The only question mark for me would be the SSO solution, and we'll bring that up in our meeting with Synack tomorrow.
These are the VPN IP ranges that the Synack VPN network would use to access the contest VPCs: 52.205.190.0/24 and 35.245.67.224/27
Agree - I'll adjust that once we know. As for the IPs, my thought was that the portal VPC and the contest VPC would be separate. Synack would not have access to the portal VPC directly except through the UI. Synack would have access to the contest VPC via the bastion host, which would then be locked down (via either a NACL or a security group) to those CIDRs.
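For sanity-checking those lockdown rules, here's a minimal sketch (the helper name is mine, and this is an offline check, not the actual security-group config) of validating a source IP against the Synack CIDRs quoted above:

```python
from ipaddress import ip_address, ip_network

# CIDRs the Synack VPN would use to reach the contest VPC (from this thread).
# The bastion's NACL / security group would allow inbound SSH only from these.
SYNACK_CIDRS = [ip_network("52.205.190.0/24"), ip_network("35.245.67.224/27")]

def allowed_source(ip):
    """True if `ip` falls inside one of the permitted Synack CIDRs."""
    addr = ip_address(ip)
    return any(addr in net for net in SYNACK_CIDRS)
```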
Alternatively, we could take CloudFront out of the picture and instead use an S3 bucket that only allows traffic from specific CIDRs. This would lock down the portal significantly if needed... Ultimately, this would also make things tricky for non-Synack users to access the portal directly, unless we also allowed a Galois VPN (if one exists) to access it?
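The CIDR-restricted bucket would use the standard S3 deny-unless-source-IP pattern. A rough sketch of generating that policy (bucket name and function are illustrative, not our actual config):

```python
import json

def cidr_locked_policy(bucket, cidrs):
    """Build an S3 bucket policy that denies GetObject except from the given CIDRs."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowFromKnownCidrsOnly",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Deny any request whose source IP is NOT in the allow-list.
            "Condition": {"NotIpAddress": {"aws:SourceIp": list(cidrs)}},
        }],
    }
    return json.dumps(policy, indent=2)
```

Note this is exactly the trade-off raised above: anyone outside the listed CIDRs (including non-Synack users without a VPN) gets denied.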
Let's discuss this further. Ultimately the question is do you care if the portal login page is accessible from the public internet or not?
It sounds like, given that we'll have to support multi-region deployments, ripping out CloudFront is not a wise idea, correct?
The FETT Portal architecture should remain the same and be completely hosted out of us-west-2 (Oregon). What will change is the RAF (Researcher Automation Framework), which will need the ability to provision environments in different regions and message back to the FETT Portal. From a FETT Portal perspective, the biggest lift will be figuring out which region each environment will be provisioned in. That logic should all live in the app layer so as to allow for a region-agnostic research-environment paradigm. I will update our diagrams to take this into account.
CI should not be impacted.
Oh, and CloudFront, the main question... yes, that will stay :)
Per our call, I hypothesize that we'll definitely have to be ready to use us-east-1 >> us-west-2 >> us-east-2 >> us-west-1. My main concern is ensuring we have quotas in place for the production organization/account. I asked @rfoot to ensure that the Organization diagram includes current and desired F1 vCPU quotas on it at all times.
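The region-selection logic in the app layer could be as simple as walking that preference order and taking the first region with quota headroom. A sketch under my assumptions (function name and the quota dict are illustrative; in practice the numbers would come from the Service Quotas API, not be hard-coded):

```python
# Fallback order proposed above: us-east-1 >> us-west-2 >> us-east-2 >> us-west-1.
REGION_PREFERENCE = ["us-east-1", "us-west-2", "us-east-2", "us-west-1"]

def pick_region(remaining_f1_vcpus, needed_vcpus):
    """Return the first preferred region whose remaining F1 vCPU quota fits the request.

    `remaining_f1_vcpus` maps region -> remaining F1 vCPUs available to us.
    """
    for region in REGION_PREFERENCE:
        if remaining_f1_vcpus.get(region, 0) >= needed_vcpus:
            return region
    return None  # no region has headroom; time for a quota-increase request
```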
@kiniry agreed - I put a deadline on having a concise plan of attack for this no later than Thursday next week. I asked @majamison to follow up with you and discuss our initial thoughts for next steps. If you need to discuss live let me know. I am working on drafting out an email to Taylor to confirm our plan.
The BIGGEST unknown I have is from a networking perspective. If I am on a host in us-west-2 and I want to SSH into an FPGA (in us-east-2) that is using a virtual network layer via the FireSim or virtio toolset, is this going to cause problems? If so, how do we circumvent that? Ultimately, we need to set up a quick POC in dev to make sure this will all work using VPC peering and routing into the FPGA.
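The first thing the POC needs to confirm is plain TCP reachability across the peering connection before layering FireSim/virtio on top. A minimal sketch (helper name is mine) that exercises the route tables and security groups between the two hosts:

```python
import socket

def ssh_reachable(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout`.

    Run from the us-west-2 host against the FPGA host's private IP to verify
    that the peering routes and security groups pass traffic at all.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this passes but SSH into the FPGA's virtual network still fails, the problem is in the FireSim/virtio layer rather than the VPC plumbing.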
Exactly, setting up a POC is what I just emphasized to @majamison in a message minutes ago.
Great - we are in sync then. Who should we work with on the CloudGFE side to get that F1 host set up?
@dhand-galois can spin-up one or two F1 hosts for POC experimentation. I suggest that we ensure we have both a Linux and a FreeBSD instance stood up for experimentation.
Thanks Kurt – can you let me know when that arch doc is done so I can schedule a meeting to go over it with AWS – thanks!
I can do this for FireSim - created https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/86 to track it, but there are some lingering issues. More specifically, FreeBSD is still very slow to boot (see https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/81) and only the FireSim-provided Linux image / GFE busy box images are booting. The GFE version of Debian gets stuck. FireSim has a fedora image, but I have not tested it yet.
@kiniry @majamison I have created a simple environment on the AWS F1 dev account: one VPC (vpc-0ef9df0cd8e270fee, 10.0.24.0/22) in Oregon and one VPC (vpc-0a3b2bb1df33b87d2, 10.1.0.0/16) in N. Virginia. I have added VPC peering and corresponding routes between the two regions and have successful communication over SSH.
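For anyone extending this setup: VPC peering requires the two VPCs' CIDR blocks not to overlap, which the ranges above satisfy. A quick check (variable names are mine):

```python
from ipaddress import ip_network

# VPC CIDRs from the dev-account POC described above.
OREGON = ip_network("10.0.24.0/22")      # vpc-0ef9df0cd8e270fee, us-west-2
N_VIRGINIA = ip_network("10.1.0.0/16")   # vpc-0a3b2bb1df33b87d2, us-east-1

def peerable(a, b):
    """VPC peering requires non-overlapping CIDR blocks."""
    return not a.overlaps(b)
```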
@dhand-galois can you add an F1 host in N. Virginia in vpc-0a3b2bb1df33b87d2. I added routes to all 4 subnets so feel free to choose.
Reopening since we are still iterating.
@rfoot the architecture is done and is available here to view: https://app.lucidchart.com/invitations/accept/c109589d-d07c-42f0-9d9c-f25891eacf36
I've posted the setup steps here, at least for Linux so far. Let me know if you need any assistance getting it working: https://gist.github.com/dhand-galois/9c41af3c10cb9cea2daf2ae1c9e2deed
@dhand-galois awesome! I'll need to swizzle some IPs to support routing, but I'll give this a shot and let you know how it goes. Thanks!
@kurthopfer - Discussion at standup 05.20.20 was that this was complete. (I made the edit at that time in GitKraken, so it didn't reflect in GH) Is this the final form? https://github.com/DARPA-SSITH-Demonstrators/SSITH-FETT-Portal/issues/24#issuecomment-629071355 Can you confirm that we're done here, please?
Agreed at end of sprint demo to move this out of sprint to distinct milestone.
I am presuming that these diagrams need to be updated post-deployment of FETT to reflect the final architecture. Thus, I'm now moving this issue into this coming sprint, as FETT kickoff is mid-week.