GaloisInc / BESSPIN-CloudGFE

The AWS cloud deployment of the BESSPIN GFE platform.
Apache License 2.0

ensure LMCO can do on-premises SoC builds #116

Open kiniry opened 4 years ago

kiniry commented 4 years ago

Our goal is to ensure that LMCO can do reliable on-premises builds of both FireSim platform variants of CloudGFE, as well as bitstreams for VCU118 boards. Our build process relies on our Docker images to guarantee a deterministic build environment.

Our SSITH-private images are hosted in Galois's Artifactory server. https://artifactory.galois.com/ In particular, they are in the BESSPIN docker repository https://artifactory.galois.com/ui/repos/tree/General/besspin_docker-local

While we have given all TA-1 teams authentication credentials for getting to Artifactory for obtaining Nix binary cache data, we should probably give LMCO new/different credentials for Docker image access. I'll file an issue now with IT about current and new external user access, as I only have permissions to add internal users.

CC @dhand-galois to provide support on the build process post-LMCO obtaining the image.

CC @rfoot @Abivin12 for PL cover. I may need a performer to chase this today.

glanvild commented 4 years ago

We do have credentials because we need them for BESSPIN, but when we run `./start_docker.sh` it says it doesn't have the Docker image locally. Based on the instructions, it then tries to go out and fetch it, but we get an authentication error. I am assuming this is because it doesn't know where to get those credentials from. Does that make sense?

dhand-galois commented 4 years ago

@glanvild We authenticate docker to artifactory using this command:

```shell
docker login artifactory.galois.com:5008
```

It'll ask for the credentials and store them, so subsequent docker commands that rely on images stored there should work. However, I am not sure whether the credentials you already have will grant access to these particular Docker images; @kiniry's ticket with IT should resolve that if not.
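One quick way to confirm that `docker login` actually stored credentials is to look for the registry under `auths` in `~/.docker/config.json`. A minimal sketch, illustrated against a sample config file rather than the real one (the auth value here is fake):

```shell
# After `docker login artifactory.galois.com:5008`, Docker records a token in
# ~/.docker/config.json under "auths". Build a sample config to illustrate:
cat > sample-docker-config.json <<'EOF'
{
  "auths": {
    "artifactory.galois.com:5008": { "auth": "ZmFrZTpjcmVkcw==" }
  }
}
EOF

# On a real machine, point this at ~/.docker/config.json instead.
if grep -q '"artifactory.galois.com:5008"' sample-docker-config.json; then
  echo "registry credentials stored"
fi
```

If the registry key is missing after a seemingly successful login, the stored credentials may have gone to a different registry host or port than the one `start_docker.sh` pulls from.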

glanvild commented 4 years ago

We were able to use the command @dhand-galois posted above, and it reported that our credentials connected successfully, but when we reran the start_docker script it still said authentication required.

kiniry commented 4 years ago

@podhrmic are you able to work through this with LMCO? Or should we find someone else to do so?

podhrmic commented 4 years ago

Based on the recent mattermost conversation, it sounds like it is working for you and we can close this as resolved @glanvild ?

glanvild commented 4 years ago

Not quite; we have run into another issue. It seems to be looking for `firesim.pem`. I know this is a key that has to be created on AWS, but is this a key we can create ourselves, or is it looking for a key your team created?

dhand-galois commented 4 years ago

Which step are you running that requires that key? firesim managerinit?

I thought I had removed all the dependencies on that, but perhaps not. You can point it at a key you've already created, or likely even just create an empty file. The key file is not used for on-premises builds.
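If the flow only checks that the key file exists (as suggested above), an empty placeholder should be enough for on-premises builds. A sketch, assuming the file is expected in the current directory; adjust the path to wherever your setup looks for `firesim.pem`:

```shell
# Create an empty placeholder key file; it is never read during
# on-premises builds, only checked for existence (per the comment above).
touch firesim.pem
# Restrictive permissions, matching what AWS tooling expects of .pem files.
chmod 600 firesim.pem
```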

kiniry commented 4 years ago

ping to LMCO and Galois performers. We still do want to ensure that you can eventually do full on-premises builds. I see the discussion from MM last week with input from @dhand-galois, but I do not see an acknowledgement for next steps or that this issue is resolved. I see that @dhand-galois suggested that we can make some small changes to disable unnecessary AWS interactions during the build flow. Is that what we are intending to do?

dhand-galois commented 4 years ago

I've checked the changes into both the `cloudgfe` and `cloudgfe-lmco` branches of firesim.

The README has been updated as well, but to summarize:

```
s3bucketname=firesim-localuser
buildinstancemarket=ondemand
spotinterruptionbehavior=terminate
spotmaxprice=ondemand
postbuildhook=
enableaws=false
```


* When running `firesim buildlocalafi`, the build process will stop after generating the final checkpoint file that would normally be sent to AWS. It will print out the next steps you need to take to finalize the AFI:
  * Copy the specified tar file to an S3 bucket
  * Run the `aws ec2 create-fpga-image` command with the supplied options
* You can then build the `sw.tgz` style packages I've been supplying by:
  * Updating `deploy/config_hwdb.ini` with the new AGFI
  * Running `firesim buildlocalsw`
* It will again stop before submitting the final tgz to AWS and instead print out the location for you to copy manually.
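The two manual AFI steps above would look roughly like the following. This sketch only prints the commands so you can review them before running; the bucket matches the `s3bucketname` setting, while the tar file and AFI names are placeholders (use the values the build prints out):

```shell
# Placeholders; substitute the values `firesim buildlocalafi` prints.
BUCKET=firesim-localuser
TARBALL=cl_firesim.tar
AFI_NAME=cloudgfe-afi

# Step 1: copy the checkpoint tar to the S3 bucket.
echo "aws s3 cp ${TARBALL} s3://${BUCKET}/${TARBALL}"

# Step 2: create the FPGA image (AFI) from the uploaded tar.
echo "aws ec2 create-fpga-image --name ${AFI_NAME} --input-storage-location Bucket=${BUCKET},Key=${TARBALL}"
```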

kiniry commented 4 years ago

Thank you @dhand-galois!