ad-build-test / BuildSystem

Testing buildsystem repo

Build and deploy build results for an IOC app - Remote/Production #11

Open pnispero opened 1 week ago

pnispero commented 1 week ago

Build and deploy build results for an IOC app - Remote/Production

Building build results

  1. Refer to issue #7 for the building part.
    • [x] In addition, we need to add this build logic to the build container as well, so that when the backend is triggered by a pull request to start a build, it runs the existing build logic plus the new packaging of the build results. Working as of commit 3f0cc045

Deploying build results

  2. I think we can utilize a deployment configuration file, specifying each IOC and what version of the app it should run. (Actually that's only for prototyping; once that's done, we'll add the deployment structure to the deployment database instead.)
  3. Branch off of main, since merged issue #10 has working functionality for deploying an IOC app (multi/single IOC) that we can reuse for remote deployment.
  4. Want gatekeeping logic for deploying to production: specifically, deployment is only allowed when a PAMM is scheduled for the CATER attached to the deployment.
  5. Want a deployment image that has Ansible installed and a deploy script. See more details in the comments.
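For prototyping, the deployment configuration file from step 2 could be as simple as an `{ioc: tag}` mapping per component. A minimal sketch, assuming a JSON file with hypothetical field names (the real schema will eventually live in the deployment database):

```python
import json

# Hypothetical deployment configuration for prototyping. All field names
# here are assumptions, not a settled schema.
DEPLOY_CONFIG = """
{
  "component_name": "test-ioc",
  "facility": "lcls",
  "iocs": {
    "test-ioc-1": "test-ioc-1.0.0",
    "test-ioc-2": "test-ioc-1.0.0"
  }
}
"""

def parse_deploy_config(text: str) -> dict:
    """Parse the config and return the {ioc: tag} mapping for deployment."""
    config = json.loads(text)
    return config["iocs"]

if __name__ == "__main__":
    for ioc, tag in parse_deploy_config(DEPLOY_CONFIG).items():
        print(f"{ioc} -> {tag}")
```

The deployment playbook would then loop over this mapping the same way the `with_items` loops in the log below iterate over `test-ioc-1`/`test-ioc-2`.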

TODO Tasks:

```text
PLAY [all] *****

TASK [Gathering Facts] *****
ok: [localhost]

PLAY [Initial IOC Deployment] **

TASK [Create component directory at /sdf/group/ad/eed/lcls/epics/iocTop/test-ioc] ***
changed: [localhost]

TASK [Create ioc directory at $IOC /sdf/group/ad/eed/lcls/epics/iocCommon/] ***
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocCommon - test-ioc-1)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocCommon - test-ioc-2)

TASK [Create sym link /sdf/group/ad/eed/lcls/epics/iocCommon//iocSpecificRelease to point to /sdf/group/ad/eed/lcls/epics/iocTop/test-ioc/] ***
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocCommon/test-ioc-1/iocSpecificRelease -> /sdf/group/ad/eed/lcls/epics/iocTop/test-ioc/test-ioc-1.0.0)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocCommon/test-ioc-2/iocSpecificRelease -> /sdf/group/ad/eed/lcls/epics/iocTop/test-ioc/test-ioc-1.0.0)

TASK [Create ioc directory in /sdf/group/ad/eed/lcls/epics/iocData/] **
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData - test-ioc-1)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData - test-ioc-2)

TASK [Create multiple data directories in /sdf/group/ad/eed/lcls/epics/iocData/] ***
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-1 - archive)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-1 - autosave)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-1 - autosave-req)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-1 - iocInfo)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-1 - restore)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-1 - yaml)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-2 - archive)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-2 - autosave)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-2 - autosave-req)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-2 - iocInfo)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-2 - restore)
changed: [localhost] => (item=/sdf/group/ad/eed/lcls/epics/iocData/test-ioc-2 - yaml)

TASK [Add startup.cmd/st.cmd for the IOC at /sdf/group/ad/eed/lcls/epics/iocCommon/] ***
failed: [localhost] (item=== ADBS == test-ioc-1) => {"ansible_loop_var": "item", "changed": false, "item": {"key": "test-ioc-1", "value": "test-ioc-1.0.0"}, "msg": "== ADBS == UNFINSHED. Please do this step manually."}
failed: [localhost] (item=== ADBS == test-ioc-2) => {"ansible_loop_var": "item", "changed": false, "item": {"key": "test-ioc-2", "value": "test-ioc-1.0.0"}, "msg": "== ADBS == UNFINSHED. Please do this step manually."}

TASK [== ADBS == Continue despite previous error] **
ok: [localhost] => { "msg": "Startup.cmd automation is unfinished, please create the startup.cmd manually." }

PLAY [Deploy app, and update envPaths] *****

TASK [Extract build results to '/sdf/group/ad/eed/lcls/epics/iocTop/test-ioc'] ***
changed: [localhost]

TASK [Update envPaths (call cram script for this) '{{ facility }}'] ****
changed: [localhost]

PLAY [Normal IOC Deployment] ***

TASK [Update sym link /sdf/group/ad/eed/lcls/epics/iocCommon//iocSpecificRelease to point to /sdf/group/ad/eed/lcls/epics/iocTop/test-ioc/] ***
fatal: [localhost]: FAILED! => {"msg": "The conditional check 'user_src_repo == None' failed. The error was: error while evaluating conditional (user_src_repo == None): 'user_src_repo' is undefined\n\nThe error appears to be in '/build/ioc_module/normal_ioc_deploy.yml': line 18, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: 'Update sym link {{ ioc_link_folder }}//iocSpecificRelease to point to {{ ioc_release_folder }}/{{ component_name }}/'\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n"}

PLAY RECAP *****
localhost : ok=9 changed=7 unreachable=0 failed=1 skipped=0 rescued=1 ignored=0

[WARNING]: Cannot set fs attributes on a non-existent symlink target. follow should be set to False to avoid this.
Playbook execution finished with return code: 2
$ sleep to keep container alive for debug
pnispero@PC100942:~$
```


Note - you can ignore the sym link update failure; this is because I am running the playbook inside the local container itself, not on S3DF, so some filepaths are missing. But the point is, the build results were retrieved through the artifact API, and an Ansible playbook can be called with the right arguments in a container.
- [ ] Work with Claudio/Jerry to update the deployment database to include an IOC dictionary `{ioc: tag}` as one of its fields; this way, on deployment, we just query the database for the component's IOC deployment configuration. Make it a new issue on core-build-system.
- [ ] Work with Claudio to get a new API POST endpoint for the backend to take in remote deploy requests and start the deploy image. Make it a new issue on core-build-system.
- [ ] Contact someone to get `adbuild` user space and a login for the build system to use for deployment. This `adbuild` user will have to exist on all facilities, and it is needed for SSH access from the ephemeral deployment containers.
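The retrieve-and-deploy flow demonstrated above (artifact API → tarball → playbook) could be sketched roughly like this inside the deployment container. The API base URL, endpoint layout, playbook filename, and extra-var names are all hypothetical placeholders, not the real interfaces:

```python
import subprocess
import urllib.request

# Hypothetical artifact API base URL; the real service layout may differ.
ARTIFACT_API = "http://artifact-api.example/build-results"

def artifact_url(component: str, tag: str) -> str:
    """Build the (assumed) artifact API URL for a component's build tarball."""
    return f"{ARTIFACT_API}/{component}/{tag}.tar.gz"

def deploy(component: str, tag: str, facility: str) -> None:
    """Fetch the build results, then hand off to the IOC deployment playbook."""
    tarball = f"/tmp/{component}-{tag}.tar.gz"
    urllib.request.urlretrieve(artifact_url(component, tag), tarball)
    subprocess.run(
        ["ansible-playbook", "ioc_deploy.yml",   # playbook name is assumed
         "-e", f"component_name={component}",
         "-e", f"tag={tag}",
         "-e", f"facility={facility}",
         "-e", f"tarball={tarball}"],
        check=True,  # surface a nonzero playbook return code to the backend
    )
```

The `check=True` is deliberate: the backend needs a clear success/failure signal from the container, matching the return-code-2 exit seen in the log above.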

## Other notes
- Do we need to get an approval check on the tag? **See relevant questions and answers in the comment below**
  - The process should be very similar to deployment on dev; one difference is updating the `current` symlink for the app, I believe.
`cram ls` 
![image](https://github.com/user-attachments/assets/a3886fc3-07e7-41e7-bbb2-17dbdf04acf3)
**What the ioc's actually point to**
![image](https://github.com/user-attachments/assets/1779869f-c72a-47f8-ad49-ea8312e231a7)
**What `current` points to**
![image](https://github.com/user-attachments/assets/3f121e44-9212-4c33-a1c3-fedc60bd69eb)
- So it seems that `cram upgrade` will update the IOC to point to whichever tag you tell it to, and `cram push` will push the contents and update the `current` link to the release you just pushed. If you `cram upgrade` an IOC to that tag, it'll point the IOC to `current`. We can keep this logic, or we may omit it and have the IOCs point to the tag regardless of `current`.
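If we bypass cram's `current` handling, the symlink flip itself is easy to reproduce in the playbook or a helper script. A sketch of an atomic link update (the helper name is ours, not cram's):

```python
import os

def update_symlink(link_path: str, target: str) -> None:
    """Atomically point link_path at target, replacing any existing link."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)            # clear a stale temp link if present
    os.symlink(target, tmp)       # create the new link under a temp name
    os.replace(tmp, link_path)    # atomic rename over the old link
```

Using a rename rather than delete-then-create means a running IOC never observes a missing `iocSpecificRelease` link mid-update.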
pnispero commented 1 week ago

**Question 1**: How is remote deployment going to work? In this case, remote means pushing anywhere but S3DF dev, e.g. LCLS or FACET.

The user goes through the CLI, and the CLI parses the deployment manifest. Then, unlike local deployments, the CLI won't call the playbook directly; instead:

  1. Should the CLI check whether the tag the user wants to deploy has been approved? Assuming tags are made automatically after a pull request has been approved, should the CLI still check? How would it check?

    • A: The tag should be created automatically after passing code review and testing, so no approval-check logic is needed in this case.
  2. The CLI makes an API call to the backend to start a container as `adbuild`; that container calls the artifact API to get the build results (tarball) for the specified component/tag, then calls the IOC deployment playbook.

    • A: The container is a good route because we don't want to load the backend with more work. The backend should check when the installation container is done and whether it succeeded; if it takes too long or crashes, notify the user of the failure.
    • Write a deployment container image for just this purpose (use a rocky9 base).
  3. OR, same as option 2, but the backend does the steps itself instead of a container.

    • A: A possible alternative.
  4. OR the CLI looks into the artifact storage itself and grabs the build results. (This would duplicate logic between the CLI and the artifact API; no good.)

    • A: Not a good option.
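From the CLI side, option 2 boils down to a single POST to the backend. A sketch assuming a hypothetical endpoint URL and payload shape (the real endpoint is still to be defined with Claudio):

```python
import json
import urllib.request

# Hypothetical backend endpoint; the real POST endpoint is not yet defined.
BACKEND_URL = "http://backend.example/api/deploy"

def build_deploy_request(component: str, tag: str,
                         facility: str, user: str) -> urllib.request.Request:
    """Build the POST request asking the backend to start a deployment container.

    All payload field names are assumptions for illustration.
    """
    payload = {
        "component_name": component,
        "tag": tag,
        "facility": facility,
        "requested_by": user,
    }
    return urllib.request.Request(
        BACKEND_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The backend would respond with a job ID the CLI can poll, which also gives the backend its hook for the "taking too long / crashed" notification mentioned in the answer to option 2.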

**Question 2**: For the deploy configuration, is that something that will be pushed to the repo, or just user space? If someone wants to change the versions of IOCs in an app, would they need to change the deploy config, push that change, and go through the build system to tag a new version, when nothing actually changed in the code? Should we have a `current` version that users can specify?

pnispero commented 1 week ago

Update - We're going to deploy remotely using containers.

  1. Since we're going the container route, I was checking whether SSH will be a problem when connecting the container (control node) to the remote machines (managed nodes - LCLS, FACET, etc.).
    • But assuming we have a user made for us on S3DF and the managed nodes (maybe `adbuild`), we can create the SSH key one time for `adbuild`.
    • Then mount the public key on the managed nodes' `adbuild` account, and anytime we run the deployment container, mount the private key (which we can keep in the k8s vault). That solves SSH into the remote machines.
  2. Found out that containers as Ansible control nodes are a common thing; Ansible has made an "execution environment", which is basically Ansible in a container: https://docs.ansible.com/ansible/latest/getting_started_ee/index.html. So I'm looking to use that as the base image instead of making our own from scratch.
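The key-mounting idea in item 1 amounts to passing the vault-held private key into the ephemeral container at run time. A sketch of assembling that invocation (the image name, key path, and the deploy entrypoint's flags are all assumptions):

```python
def deploy_container_cmd(image: str, key_path: str,
                         component: str, tag: str) -> list[str]:
    """Assemble a podman run command that mounts the adbuild SSH private key
    (pulled from the k8s vault) read-only into the deployment container.

    The entrypoint arguments after the image name are hypothetical.
    """
    return [
        "podman", "run", "--rm",
        # read-only bind mount so the key never persists in the image
        "-v", f"{key_path}:/home/adbuild/.ssh/id_rsa:ro",
        image,
        "deploy", "--component", component, "--tag", tag,
    ]
```

Because the container is removed on exit (`--rm`) and the key is only bind-mounted, nothing secret is baked into the deployment image itself.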
pnispero commented 3 days ago

Question: We need an `adbuild` user account on all the facilities (where do we request this?) so that I can test the full deployment with SSH.

pnispero commented 2 days ago

Tasks to do while blocked on remote deployment - TODO: write these notes somewhere else.

Another thing to tackle

look into how we implement