shankari opened this issue 5 months ago
Shankari mentioned this: “At the end, we want to have one unified process for building all the images needed for OpenPATH. We are not doing incremental work here, we are doing a major redesign. I am open to merging PRs in each repo separately as an intermediate step but eventually, one merge and everything downstream is built”
Does this mean:
Even if other repos are ready to be merged, can we not actually merge those changes until the parent repo for the base images (currently e-mission-server) is ready to be merged?
Would completely automating merges skip the PR review process? Or would those PR merges still go through, but nothing is actually triggered until the merge in e-mission-server triggers it?
The admin and public repos are built on top of the e-mission-server image: the Dockerfiles for these build off of the base image of e-mission-server. What we want is that when an e-mission-server PR is merged, we bump the dependency in the admin and public dashboard Dockerfiles to the latest tag, and then rebuild those images. As long as there are no changes to the Dockerfile, there should be no merge conflict; if one does exist, we can take a look at it manually.
The automation would cover just the changes to the Dockerfiles that update the image tags to the latest base server image. This would then trigger an image build for the repo, and we can potentially trigger image builds on every merge to a specific repo.
This does not include other code changes in PRs, as those would still go through the code review process we are currently following. The automated merges with Docker tag updates must occur only when the underlying e-mission-server has been updated. The automated builds for the latest merged code versions of these repos (and not any open/un-merged PRs) can occur, if needed, on every merge. A rough sketch of such a tag-bump step appears below.
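As a rough sketch only (not the final design): the bump could be a workflow step in each downstream repo that rewrites the FROM tag and commits the result. The image name emission/e-mission-server, the docker/Dockerfile path, and the NEW_TAG variable are placeholders for illustration, not the actual implementation.

# Hypothetical tag-bump step in a downstream repo's GitHub Actions workflow
- name: Bump e-mission-server base image tag
  run: |
    # NEW_TAG would be provided by the server build that triggered this workflow
    sed -i "s|\(FROM emission/e-mission-server:\).*|\1${NEW_TAG}|" docker/Dockerfile
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git commit -am "Bump e-mission-server base image to ${NEW_TAG}"
    git push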
Suggestions from Thursday:
Reusable workflows syntax example:
jobs:
  call-reusable-workflow:
    uses: org/reponame/.github/workflows/otheryaml.yml@main
(The referenced workflow must live under .github/workflows/ in the target repo; org/reponame and otheryaml.yml are placeholders.)
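The workflow being reused must also opt in with a workflow_call trigger; a minimal sketch, where the input name is illustrative:

on:
  workflow_call:
    inputs:
      docker_image_tag:
        required: false
        type: string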
Composite actions with repository dispatching (this uses webhooks)
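For the repository-dispatch part of that option, the receiving repo listens for a custom event; a minimal sketch, where the event name server-image-built and the payload field are hypothetical placeholders:

on:
  repository_dispatch:
    types: [server-image-built]

jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - name: Show the payload sent by the dispatching repo
        run: echo "New server tag ${{ github.event.client_payload.docker_image_tag }}"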
Notes:
It is "sub-modules" not "some modules" 😄 And if you need internal repos, which of the three models should you follow?
So, I took a look at git submodules, and they might not be a good option for our use case.
Found some information here:
What it does:
Why not good?
A possible redesign for the internal repositories is to have a single internal repository with a sub-directory for each repo, similar to how the server repo is currently used internally.
For the server repo, the internal repo is named nrelopenpath. It contains two directories, webapp and analysis, referring to two AWS ECR images which are built from the same latest base server image after customizations.
Similarly, we can refactor the join, admin-dash, and public-dash repos to be used the way the server repo is used in the internal GitHub repos. This would avoid duplicating repos and would include these steps:
Need to see how to fit in admin-dash (external remote upstream -> main -> dev) and public-dash (notebook image)
Pros and Cons: Pros:
Cons:
We have organized our findings into a series of topics/sections for ease of reading.
We spent a lot of time scouring the existing documentation in GitHub issues and PRs (both open and closed) spread throughout the repositories for our OpenPATH project. As we kept finding more and more issues, we thought it would be a good idea to keep them organized, since we had to keep referring back and forth, and this table was quite helpful. Hence, we are putting it here so it serves as a good reference for others.
Notes:
While it has been a lot of information to digest, we have gained a much better understanding of the history behind OpenPATH deployment and have compiled an expansive list of issues that we believe relate to it. Perhaps not all of the learnings below are relevant to the redesign task, but they are at least somewhat related and may explain the reasoning behind the way our architecture is currently built.
Some knowledge points from our exploration are mentioned below:
1. Admin-dashboard
A. Switch to Dev from Main [1]
B. Wrappers added in Dockerfile (for other repos too) [1]
2. Join
A. Sebastian found the need to create a Docker container; no longer a redirect to the CMS [1]
3. Public-dash
A. Basic understanding of public-dash images [1]
“It essentially provisions two docker containers: one for generating the static images (that we basically run notebooks using a cronjob to generate) and a dead simple frontend using jquery that basically displays the image.”
B. AWS Codebuild [1]
C. Switched to pinned notebook Image [1, 2]
D. Point where we switched to using notebook image instead of server image [1]
These public-dash changes that commented out code (used in external, running notebooks directly to generate plots) were all done around the same time: September 2022. This also coincides with when the public-dash base image was changed to use the Docker Hub notebook image instead of the server image. So all of this seems to be part of the hack used by Shankari and Jianli to make sure the deployment back then went through successfully.
4. e-mission-server
A. CI publish multiple images from multiple branches + Image_build_push.yml origin [1, 2]
This is the most important issue, as it is very close to the current task we're working on and highlights the redesign needed.
Looks like we also went from a monorepo / monolithic design to a split microservices design, especially for the server code:
B. Learnt about the origin of Dockerhub usage + Automatic image build
C. Travis CI was tested by Shankari [1]
D. AWS Costs Detailed Discussion [1]
E. Why or why not to use Dockerhub (costs, free tier, images removal) [1]
Dockerhub resources: [pricing, 6 months policy, policy delayed, retention policy + docker alternatives]
F. MongoDB to AWS DocumentDB switch [1]
5. Nrelopenpath [INTERNAL]
6. e-mission-docker
A. Origin of multi-tier Docker compose [1]
We have been collaborating with the cloud services team back and forth.
Q1:
We understand the current process to build and deploy images looks like this:
a. Build from the Dockerfile in the external repos and push to Dockerhub
b. Rebuild in Jenkins based on the modified Dockerfiles in the internal repos
c. Push to AWS ECR

Could the process be changed to build only once, externally, like this:
a. Build from the Dockerfile in the external repos and push to Dockerhub
b. Run a Jenkins pipeline which pulls the images directly from Dockerhub (and does not rebuild)
c. Push to AWS ECR
A1:
Security is the most important reason that NREL does not pull EXTERNAL images and deploy them directly to NREL infrastructure. That's why Cloud built central ECRs for all cloud project images stored on AWS; all images need to be (regularly) scanned by Cyber.
Q2: Is one of the reasons we need to modify the images due to using confidential credentials to access AWS Cognito (ie nrel-cloud-computing/ nrelopenpath-admin-dashboard/docker-compose-prod.yml)?
A2:
The Cloud team is not using docker-compose for deployment. The base image is pulled from EXTERNAL, i.e. your DockerHub, and we wrap the image to fit into the AWS ECS deployment. For example, in https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/blob/master/webapp/Dockerfile, we need to install certificates for your app to access the DocumentDB cluster.
Q3: How and when is nrelopenpath/buildspec.yml run? Is it run multiple times in the pipeline for the staging and production deployments?
A3:
The CodeBuild (buildspec.yml) run is triggered through Jenkins. On the Jenkins job, there's a checkbox to let you choose whether or not to build the image. If not, it will use the most recently built image. If yes, it will build the image before the deployment; it builds once and deploys to multiple productions.
Q4: How and when are the images for the other repos (i.e. public dash, admin dash, join) built? How are they run if they're not in buildspec.yml, since we only see "webapp" and "analysis" in there?
A4:
Same as in Q3: if you choose to build app images, then all of the public, join, admin, and web/analysis images are re-built. Shankari would like the Jenkins job to be simple; that's why we use one button/checkbox to control everything under the hood.
Q5: Regarding this commit, what was the reasoning behind running the .ipynb notebooks directly?
A5:
We run the .ipynb notebooks directly because the previous code ran them with crontab. But in AWS, we are not using crontab to schedule tasks; we are using ECS Scheduled Tasks. Basically, we created AWS EventBridge rules to schedule the viz script runs with a cron expression. Since we are using a Docker container, the container could fail, and if it failed, the crontab inside it would not run. That's why we chose to let AWS schedule and run the cron job as a container.
Q6: Assuming there are not a lot of such image-wrapping tasks, it seems like tasks such as adding certificates could be moved to the EXTERNAL Dockerfile. If so, that would mean we no longer require the INTERNAL Dockerfile and would not need to build the Docker image INTERNALLY. We would still be pushing to AWS ECR?
A6:
Yes. Cloud has built the pipeline required by Cyber, which scans all images in NREL ECR regularly and reports vulnerabilities/KEVs. This means the pipeline cannot pull and scan external images; it requires all images to be built and pushed to NREL ECR as the central repository.
Q7: For more clarity: in the buildspec.yml file (https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/blob/master/buildspec.yml), we currently have four phases: install, pre-build, build, post-build. Would it be possible to have just three phases (install, pre-build, post-build), i.e. skipping the build stage?
Instead, in the post-build stage, just before the “docker push” to AWS ECR, we can pull the required image from Dockerhub, tag it, and then push it to AWS ECR. Thus we would not need to rebuild it, again assuming that we do not have any wrappers to be applied (these can be applied EXTERNALLY).
A7:
It sounds doable; you could update it and give it a try, as long as there's no NREL-specific info exposed externally.
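A minimal sketch of what that pull-tag-push buildspec could look like, assuming hypothetical names ($ECR_REGISTRY, openpath-webapp, emission/e-mission-server) and a $DOCKER_IMAGE_TAG variable supplied by the pipeline; not the actual buildspec:

version: 0.2

phases:
  pre_build:
    commands:
      # Log in to the internal ECR registry
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
  post_build:
    commands:
      # Pull the prebuilt image from Dockerhub instead of rebuilding it
      - docker pull emission/e-mission-server:$DOCKER_IMAGE_TAG
      # Re-tag it for ECR and push, so Cyber can scan it from the central repo
      - docker tag emission/e-mission-server:$DOCKER_IMAGE_TAG $ECR_REGISTRY/openpath-webapp:$DOCKER_IMAGE_TAG
      - docker push $ECR_REGISTRY/openpath-webapp:$DOCKER_IMAGE_TAG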
Some key takeaways:
Two questions to answer and we’ve put forth our points having discussed with Jianli.
1. Do we need internal repos at all? Why? Yes. We do need them.
2. Do we need to re-build images both externally and internally (two sets of Dockerfiles for each repo)?
A. Possibly not needed to rebuild:
B. Possibly need to rebuild:
To summarize:
Redesign steps suggested in previous meeting:
1. Add all four repos to the Multi-tier docker structure
A. Current setup
B. New setup
C. Feedback from previous meeting:
2. Proposed deployment process
We are still considering the single internal repository structure (mentioned in 1. above), with the related files for each repo inside one subdirectory per repo. All four repos would then have a ready-to-use image pushed to Dockerhub.
A. Skipping the “build” job:
B. Streamlining repo-specific build processes
i. E-mission-server:
ii. Join
iii. Admin-dash:
iv. Public-dash:
Two possibilities:
One high-level comment before our next meeting:
Conf directories are different in internal and external repos as well as in webapp and analysis internally.
Do we need all these conf directories? A lot of them have a single configuration. See also https://github.com/e-mission/e-mission-server/pull/959#issuecomment-1975778952
Table for Differences in External and Internal Repositories
| S.No. | Repository/Container | File | Difference | Findings | Solution | Needs Rebuilding Internally (Y/N) |
|---|---|---|---|---|---|---|
| 1 | Join | N/A | None | N/A | Internal matches external. | No |
| 2 | Admin-dash | docker/start.sh | sed changed to jq | Tested changing sed to jq for both admin-dash and public-dash: changed in the script, rebuilt the containers, working with jq. | Change sed to jq in the external repos. | No |
| | | docker/Dockerfile | AWS certificates | Cannot be moved outside, since this customization is just for our use case (we use AWS DocumentDB); someone else may be using other DBs like MongoDB or other cloud services. Natalie has an idea to use environment variables and a script that runs as a Dockerfile layer. | Keep as is / Natalie's script. | Yes |
| | | docker-compose-prod.yml | Cognito credentials added. | For the internal GitHub repo, these will be stored as secrets. They will be set up as ENV variables in the docker-compose file, same as the current setup, but using secrets instead of directly set values. | Use GitHub secrets + environment variables. | Yes |
| | | config.py, config-fake.py | INDEX_STRING_NO_META added. | We searched for the history behind this addition and found it was a workaround for a script-injection security bug. Shankari filed an issue with the dash library repository and elicited a fix from the official maintainers. The dash version containing the fix (2.10.0) requires flask <= 2.2.3, hence choosing later versions: 2.14.1 (which raised the flask version limit) and 2.14.2 (reported working by a developer); works with the latest version (2.16.1) as well. The issue no longer appears when tested with 2.14.1, 2.14.2, or 2.16.1. Shankari's updates: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/commit/23930b6687c6e2e8cd4aeb79d3181fc7af065de6, https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/commit/c40f0866c76e2bfa03bef05d0daefda30625a943; dash issue/fix: https://github.com/plotly/dash/issues/2536, https://github.com/plotly/dash/pull/2540; related: https://github.com/plotly/dash/issues/2699, https://github.com/plotly/dash/issues/2707 [2.14.2 works]; release tags: https://github.com/plotly/dash/releases/tag/v2.10.0 [contains fix for Shankari's issue], https://github.com/plotly/dash/releases | Upgrade the dash library to >= 2.14.1 or to the latest (2.16.1) in requirements.txt. | No |
| | | app_sidebar_collapsible.py | OpenPATH logo removed; config file import added; INDEX_STRING_NO_META added. | Except for the OpenPATH logo, the others can be skipped. Need more info on whether the OpenPATH logo can be added or kept as is from the external version: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/commit/47f023dc1d6b5a6531693a90e0575830691356ec, https://github.com/e-mission/op-admin-dashboard/issues/43. The config import and INDEX_STRING_NO_META can be removed, as I have tested that updating dash to 2.16.1 solves the issue and we no longer need the workaround. | Decide whether the OpenPATH logo can be added internally. Based on Shankari's suggestions in the open issue, we believe it would be a good idea to store the icon as a static image (png or svg) in the local file system. The config import and INDEX_STRING_NO_META can be removed after the dash version upgrade. | No |
| 3 | Public-dash | start_noteboook.sh | sed changed to jq | Tested changing sed to jq for both admin-dash and public-dash: changed in the script, rebuilt the containers, working with jq. | Change sed to jq in the external repos. | No |
| | | docker/Dockerfile | AWS certificates | Same as admin-dash: the customization is just for our use case (AWS DocumentDB), and others may use other DBs or cloud services. Natalie has an idea to use environment variables and a script that runs as a Dockerfile layer. | Keep as is / Natalie's script. | Yes |
| | | start_noteboook.sh | Python notebooks split into multiple execution calls. | Related to AWS ECS scheduled tasks being used instead of cron jobs. | Natalie's suggestion: event-driven updates. | Yes |
| 4 | Nrelopenpath ENV variables | analysis/conf/net/ext_service/push.json | Four key-value pairs containing credentials, auth tokens. | Need to test whether we can pass the entire dictionary as an environment variable. If yes, this avoids having to create 4 different ENV variables; we can use just one for the entire file. | Use environment variables. | No |
| | | webapp/conf/net/auth/secret_list.json | One key-value pair containing credentials, auth tokens. | Need to test whether we can pass the entire list as an environment variable. File history: https://github.com/e-mission/e-mission-server/pull/802; hardcoded now, switch to some channel later: https://github.com/e-mission/e-mission-docs/issues/628#issuecomment-799828333 | Use environment variables. | No |
| 5 | Nrelopenpath/analysis CONF files | conf/log/intake.conf | Logging level changed. | Debug level set internally, while Warning level is set externally. Debug is lower priority than Warning, meaning that internally we want to log as much as possible. Mentioned as a hack, not a good long-term solution: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/6dfc8da32cf2d3b98b200f98c1777462a9194042. Same as log/webserver.conf in webapp; differs only in two filenames, which are log file locations. | Keep as is. Decide on the level of logging and details. | Yes |
| | | conf/analysis/debug.conf.json | Three key-value pairs. | Analysis code config values, keys. | Keep as is. | Yes |
| 6 | Nrelopenpath/webapp CONF files | conf/log/webserver.conf | Logging level changed. | Debug level set internally, while Warning level is set externally (see row 5); mentioned as a hack, not a good long-term solution: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/6dfc8da32cf2d3b98b200f98c1777462a9194042. Same as log/intake.conf in analysis; differs only in two filenames, which are log file locations. | Keep as is. Decide on the level of logging and details. | Yes |
| | | conf/analysis/debug.conf.json | Nine key-value pairs. | Analysis code config values, keys. | Keep as is. | Yes |
| | | conf/net/api/webserver.conf.sample | 2 JSON key-value pairs; 1 pair removed. | Related to auth (2 pairs). The 404 redirect appears to have been added only externally, not internally; commit adding it externally: https://github.com/e-mission/e-mission-server/commit/964ed288032262e1bedc945b24c04b192114aec5. Why the sample file is used + skip to secret: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/a66a52a8fbd28bb13c4ebad445ba21e8b478c105; secret to skip: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/cb4f54e4c201bd132c311d04a5e34a57bae2efb7; skip to dynamic: https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/52392a303ea60b73e8ea23a199b09778b306d8da | Keep as is. Should the redirect be added internally as well? | Yes |
| 7 | Nrelopenpath/analysis startup scripts | cmd_intake.sh, cmd_push_remind.sh, cmd_build_trip_model.sh, cmd_push.sh, cmd_reset_pipeline.sh | One external startup script duplicated into 5 scripts. | Crontab and start_cron.sh are no longer used, as they are not used in NREL-hosted production environments (https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-public-dashboard/commit/2a8d8b53d7ecf24e85217fb9b812ef3936054e35). Instead, scripts are run directly in the Dockerfile, as ECS can create scheduled tasks for cron. | TBD. Have a single script that runs these scripts based on parameters; but how will ECS know when to run each script? Can we execute these in the Dockerfile the way public-dash currently runs the Python notebooks (bad hack!?) | Yes |
| 8 | Nrelopenpath/webapp startup scripts | start_script.sh | Additional command to copy conf files added. | Need to understand what is being copied where. Tested and saw that the custom conf files are copied over to the conf directory of the base server image. | Keep as is. | Yes |
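On the sed → jq change in rows 2 and 3 above: the idea is to read values out of the JSON configs directly instead of rewriting template files. A minimal sketch, where the file path and key are illustrative, not the actual config layout:

# Instead of using sed to rewrite a template into a concrete conf file,
# read the needed value directly from the JSON with jq
DB_HOST=$(jq -r '.timeseries.url' conf/storage/db.conf.sample)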
High-level thoughts:
Microscopic details:
- Whether config.py is even needed. IIRC, it just reads environment variables and makes them available to python. Do we need a class that does that, or can we read environment variables directly? Or write a simpler class that just makes all environment variables available as python variables.
- The analysis configuration
- With jq, we don't need to convert webserver.conf.sample to webserver.conf, and more importantly, db.conf.sample to db.conf

Current dealings:
@MukuFlash03 and I have been collaborating on minimizing the differences between the internal and external repositories.
I have been working on figuring out how to pass the image tag (created in the server action image-build-push.yml) between repositories. I have been attempting the upload-artifact/download-artifact method. Uploading the file worked, and we were able to retrieve the file in another repository, but we had to specify the run id of the workflow where the artifact was created. This defeats the purpose of automating the image tag in the first place.
We also looked into GitHub release and return dispatch as options, but decided they were not viable.
There are ways to push files from one run to another repository, though we haven’t tried them yet. Write permissions might be a barrier to this, so creating tokens will be necessary. If we can get the file pushing to work, this is our intended workflow:
Created PR for implementation code changes here:
A. External PRs:
B. Internal PR
Had a meeting with @MukuFlash03 to discuss some issues with testing. Made a plan for documenting the testing and outlined everything that needs to be done.
Posting a list of the docker commands I used to verify whether the docker images were building successfully. Next, I also tested whether containers could be run from the images.
I had to ensure that the configurations set up in the docker-compose files were set manually by me, since docker-compose is no longer used for the internal images. These settings included things like ports, networks, volumes, and environment variables.
Creating a network so containers can be connected to each other:
$ docker network create emission
DB container is needed for storage; data must be loaded into it (I did not load data when I did this testing initially)
$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
A. Internal Repo Images
Checkout to multi-tier branch in internal repo -> nrelopenpath
$ docker build -t int-webapp ./webapp/
$ docker run --name int-webapp-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage_program --network emission int-webapp
$ docker build -t int-analysis ./analysis/
$ docker run --name int-analysis-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage_program -e PUSH_PROVIDER="firebase" -e PUSH_SERVER_AUTH_TOKEN="Get from firebase console" -e PUSH_APP_PACKAGE_NAME="full package name from config.xml. e.g. edu.berkeley.eecs.emission or edu.berkeley.eecs.embase. Defaults to edu.berkeley.eecs.embase" -e PUSH_IOS_TOKEN_FORMAT="fcm" --network emission int-analysis
$ docker build -t int-join ./join_page
$ docker run --name int-join-1 -d -p 3274:5050 --network emission int-join
Sometimes during local testing, the join and public-dash frontend pages might load the same HTML file, since the port is still mapped to either join or public-dash depending on which is run first. If so, change the join port to a different one (just for testing purposes):
$ docker run --name int-join-1 -d -p 2254:5050 --network emission int-join
$ docker build -t int-admin-dash ./admin_dashboard
$ docker run --name int-admin-dash-1 -d -e DASH_SERVER_PORT=8050 -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e CONFIG_PATH="https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/main/configs/" -e STUDY_CONFIG="stage-program" -e DASH_SILENCE_ROUTES_LOGGING=False -e SERVER_BRANCH=master -e REACT_VERSION="18.2.0" -e AUTH_TYPE="basic" -p 8050:8050 --network emission int-admin-dash
$ docker build -t int-public-dash-frontend ./public-dashboard/frontend/
$ docker run --name int-public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission int-public-dash-frontend
$ docker build -t int-public-dash-notebook ./public-dashboard/viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG="stage-program" -p 47962:8888 -v ./plots:/plots --network emission int-public-dash-notebook
B. External Repo Images
Directly pull the latest pushed image from Dockerhub for each repo in its Dockerfile:
$ docker run --name container_name image_name
Alternatively, you can build from the external repo after switching to the consolidate-differences branch for e-mission-server and the image-push branch for join, admin-dash, and public-dash. You will have to use docker build commands similar to the internal images above.
$ docker build -t image_name Dockerfile_path
$ docker run --name container_name image_name
Or, we can run docker-compose commands, since the external repos still have docker-compose files.
Join and Public-dash: $ docker-compose -f docker-compose.dev.yml up -d
Admin-dash: $ docker compose -f docker-compose-dev.yml up -d
Initially, I used option 1.
E-mission-server
$ docker run --name em-server-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program --network emission mukuflash03/e-mission-server:image-push-merge_2024-04-16--49-36
Join
$ docker run --name join-2 -d -p 3274:5050 --network emission mukuflash03/nrel-openpath-join-page:image-push-merge_2024-03-26--22-47
Op-admin
$ docker run --name op-admin-2 -d -e DASH_SERVER_PORT=8050 -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e CONFIG_PATH="https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/main/configs/" -e STUDY_CONFIG="stage-program" -e DASH_SILENCE_ROUTES_LOGGING=False -e SERVER_BRANCH=master -e REACT_VERSION="18.2.0" -e AUTH_TYPE="basic" -p 8050:8050 --network emission mukuflash03/op-admin-dashboard:image-push-merge_2024-04-16--00-11
Public-dash Frontend / dashboard
$ docker run --name public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission mukuflash03/em-public-dashboard:image-push-merge_2024-04-16--59-18
Public-dash Viz_scripts / notebook-server
$ docker run --name public-dash-viz-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG="stage-program" -p 47962:8888 -v ./plots:/plots --network emission mukuflash03/em-public-dashboard_notebook:image-push-merge_2024-04-16--59-18
For automatic updates of the tags, we have three options:
- uploading and downloading artifacts (but how do we pass the run_id between repositories?)
- pushing a file from a github action in one repo to a github action in another repo (untried)
- triggering workflows in the target repositories via the GitHub API
@MukuFlash03 See my comment above, quoted below, for an outline of the steps to try to get the file pushing method to work:
There are ways to push files from one run to another repository, though we haven’t tried them yet. Write permissions might be a barrier to this, so creating tokens will be necessary. If we can get the file pushing to work, this is our intended workflow:
- E-mission-server image-build-push.yml: writes image tag to file
- Tag file is pushed to admin dash, and public dash
- Push of file triggers image-build-push workflows in the other repos
- File is read into image-build-push workflow
- Tag from file set as an environment variable for workflow run
- Dockerfiles updated with tags
- Docker image build and push
There is also https://stackoverflow.com/questions/70018912/how-to-send-data-payload-using-http-request-to-github-actions-workflow which is a GitHub API-fueled approach to passing data between repositories
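For reference, the repository_dispatch variant of that API-based approach POSTs to the target repo's /dispatches endpoint; a sketch with OWNER/REPO, the token, the event name, and the tag value as placeholders:

curl -L \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <token>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/OWNER/REPO/dispatches \
  -d '{"event_type":"server-image-built","client_payload":{"docker_image_tag":"<tag>"}}'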
Requirements:
1. The docker image workflow run in e-mission-server should trigger workflows to build and push docker images in the admin-dash and public-dash repositories.

Notes:
Current status:
Implemented
For reference, matrix strategy for workflow dispatch events
Pending
Approaches planned to try out:
Approaches actually tested, implemented and verified to run successfully
Reason for not trying out Approach 2:
I tried out and implemented Approaches 1 and 3 first. Approach 3 was necessary for triggering a workflow from another workflow in another repository. Approach 1 sort of included Approach 2, in that it pushes files in the form of artifacts. Both Approaches 1 and 2 would need Approach 3 to trigger the workflows in the dashboard repos anyway. With these two approaches implemented, I was done with Requirements 1) and 2).
The next major task was to work on Req 3), which involves updating the Dockerfiles in the dashboard repos. Approach 2 is somewhat related to this, as physical files present in the actual repos would need to be modified and committed. This is in contrast to the files and text data passed around in Approaches 1 and 3, which was all done inside the GitHub Actions workflow runner. The artifact files were available outside the runner after execution, but they were still tied to the workflow run. With Approach 2, and in completing Req 3), I would need to handle the Dockerfiles outside the workflow runs; hence I skipped Approach 2, since I would in a way be working on it anyway.
Details of approaches
In my forked repositories for e-mission-server, join, admin-dash, public-dash there are three branches available for the tags automation: tags-artifact, tags-dispatch, tags-matrix.
tags-artifact branch in: e-mission-server, admin-dash, public-dash, join
tags-dispatch branch in: e-mission-server, admin-dash, public-dash, join
tags-matrix branch in: e-mission-server, admin-dash, public-dash, join
Approach 1: tags-artifact; Approach 3: tags-dispatch, tags-matrix

1. tags-artifact [Approach 1: upload/download artifacts]
- Needs repo access permissions, the workflow run id, and the source repository name with the user or organization name (see the sketch below).
- Cons: as noted above, the downloading workflow must already know the run id of the run that uploaded the artifact, which defeats the purpose of automating the tag update.

2. tags-dispatch [Approach 3: workflow dispatch events]
- Uses workflow dispatch events, which send POST requests to a target repository to trigger workflows in those repositories.
- This required usage of a fine-grained token with the required access scope actions: write.
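A minimal sketch of the artifact handoff for tags-artifact (item 1 above); the artifact/file names, the server_run_id input, and the GH_PAT secret are placeholders, while the cross-repo inputs shown are supported by actions/download-artifact v4:

# In e-mission-server's workflow: save the tag as an artifact
- name: Write image tag to file
  run: echo "${{ env.DOCKER_IMAGE_TAG }}" > tag-file.txt
- name: Upload tag artifact
  uses: actions/upload-artifact@v4
  with:
    name: docker-image-tag
    path: tag-file.txt

# In a dashboard repo's workflow: retrieve it (note the run-id requirement)
- name: Download tag artifact from the server repo
  uses: actions/download-artifact@v4
  with:
    name: docker-image-tag
    repository: e-mission/e-mission-server
    run-id: ${{ inputs.server_run_id }}   # must be known in advance
    github-token: ${{ secrets.GH_PAT }}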
---------
3. tags-matrix [Approach 3: GitHub REST APIs]
- Official documentation: [matrix strategy](https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs)
- Similar to tags-dispatch, wherein workflow dispatch events are used.
- The only difference is that the matrix strategy dispatches parallel events to the target repositories once the source repository (e-mission-server) successfully completes execution.
- This also shows the dispatch events as a 2nd set of combined jobs in the workflow run graph.
- This required usage of a fine-grained token with the required access scope actions: write.
strategy:
matrix:
repo: ['MukuFlash03/nrel-openpath-join-page', 'MukuFlash03/op-admin-dashboard', 'MukuFlash03/em-public-dashboard']
- name: Trigger workflow in join-page, admin-dash, public-dash
run: |
curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.GH_FG_PAT_TAGS }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/${{ matrix.repo }}/actions/workflows/image_build_push.yml/dispatches \
-d '{"ref":"tags-matrix", "inputs": {"docker_image_tag" : "${{ env.DOCKER_IMAGE_TAG }}"}}'
-------
Pros of workflow dispatch events and the matrix strategy:
- An advantage of using the workflow dispatch events is that we do not need metadata like the run ID.
- There is an option to use the workflow ID, but it can be replaced by the workflow file name; hence even the workflow ID isn't needed.
- I did calculate the workflow ID for each workflow file "image-build-push.yml" in the target repositories by using these API endpoints: [e-mission-server](https://api.github.com/repos/MukuFlash03/e-mission-server/actions/workflows), [join workflows](https://api.github.com/repos/MukuFlash03/nrel-openpath-join-page/actions/workflows), [admin-dash workflows](https://api.github.com/repos/MukuFlash03/op-admin-dashboard/actions/workflows), [public-dash](https://api.github.com/repos/MukuFlash03/em-public-dashboard/actions/workflows)
- I have kept the same workflow file name in all the target repositories, and hence in the tags-matrix branch I can simply use the same workflow name with differing repository names, as defined in the matrix, to run the curl command for all the repositories.
The next major task was to work on Req 3) which involves updating the Dockerfiles in the dashboard repos.
I don't think that the solution should be to update the Dockerfiles. That is overkill and is going to lead to merge conflicts. Instead, you should have the Dockerfile use an environment variable, and then set the environment variable directly in the workflow or using a .env file.

Instead, you should have the Dockerfile use an environment variable, and then set the environment variable directly in the workflow or using a .env file
Our primary concern with this method was for users building locally. Is it acceptable to tell users to copy the latest image from the docker hub image library in the README?
docker-compose can use local images.
The comment around .env was for @MukuFlash03's task to update the tag in the Dockerfile, which is not related to testing, only for the GitHub-triggered actions.
Finally got the tags automation to work completely in one click, starting from the e-mission-server workflow, passing the latest timestamp used as the docker image tag suffix, and then triggering the workflows in admin-dashboard and public-dashboard.
Final approach taken for this involves a combination of the artifact and the matrix-dispatch methods discussed here.
Additionally, as suggested by Shankari here, I changed the Dockerfiles to use environment variables set in the workflow runs itself. Hence, not using / updating hardcoded timestamp values in the Dockerfiles anymore.
I don't think that the solution should be to update the Dockerfiles. That is overkill and is going to lead to merge conflicts. Instead, you should have the Dockerfile use an environment variable, and then set the environment variable directly in the workflow or using a .env file.
There is still a manual element remaining; however, this has to do with users or developers looking to work on the code base locally with the dashboard repositories.
The docker image tag (only the timestamp part) will need to be manually copied from the latest Dockerhub image of the server repo and added to the args in the docker-compose files.
This is also what @nataliejschultz had mentioned here:
Our primary concern with this method was for users building locally. Is it acceptable to tell users to copy the latest image from the docker hub image library in the README?
# Before adding tag
build:
args:
DOCKER_IMAGE_TAG: ''
# After adding tag
build:
args:
DOCKER_IMAGE_TAG: '2024-05-02--16-40'
Implementation Approaches discussed here.
Combined approach (artifact + matrix) tags-combo-approach branch: e-mission-server, admin-dash, public-dash
Successful workflow runs:
I decided to go ahead with the matrix-build strategy, which dispatches workflows to multiple repositories when triggered from one source repository. I had implemented this in the tags-matrix branches of the dashboard repos (and the join repo as well, but that was just for initial testing; the final changes are only in the dashboard repos).
Initially, I only had a push event trigger, similar to the docker image build and push workflow in the server repo.
However, I realized that there would now be two types of GitHub Actions events that should trigger the workflows in the admin-dashboard and public-dashboard repos.
The second type of trigger would be a workflow_dispatch event.
This was implemented and working via the matrix-build workflow dispatch branch.
Now, for the workflow dispatch event, I was able to pass the latest generated docker image timestamp directly from the e-mission-server workflow in the form of an input parameter docker_image_tag.
- name: Trigger workflow in admin-dash, public-dash
run: |
curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.GH_FG_PAT_TAGS }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/${{ matrix.repo }}/actions/workflows/image_build_push.yml/dispatches \
-d '{"ref":"tags-combo-approach", "inputs": {"docker_image_tag" : "${{ env.DOCKER_IMAGE_TAG }}"}}'
This parameter was then accessible in the workflows of the dashboard repos:
on:
push:
branches: [ tags-combo-approach ]
workflow_dispatch:
inputs:
docker_image_tag:
description: "Latest Docker image tags passed from e-mission-server repository on image build and push"
required: true
With these changes done, I believed I was finished, but then I came across some more issues. I have resolved them all now, but am mentioning them here.
Why I chose to add the artifact method as well:
The issue I was facing was fetching the latest timestamp for the image tag in the case of a push event trigger. With workflow dispatch, the server workflow itself triggers the dashboard workflows and hence is directly connected to them. However, a push event only triggers the specific workflow in that specific dashboard repository to build and push the image, and hence cannot retrieve the image tag directly.
So, I utilized the artifact upload and download method to:
- Fetch the latest image tag from the most recent successful server workflow run (currently on tags-combo-approach, but to be changed to master once changes are final).
- The needs keyword was used to create chained jobs that are dependent on each other and will always wait for the previous job to complete before executing.
- Output variables and environment variables were used in the workflow to pass values from one job to the other.

The Dockerfiles' FROM layer looks like:
# ARG must be declared before FROM for it to be usable in the FROM line
ARG DOCKER_IMAGE_TAG
FROM mukuflash03/e-mission-server:tags-combo-approach_${DOCKER_IMAGE_TAG}
The solution I implemented involves defining two DOCKER_IMAGE_TAGs in the workflow file, one for push and the other for workflow_dispatch:
env:
DOCKER_IMAGE_TAG_1: ${{ needs.fetch_tag.outputs.docker_image_tag }}
DOCKER_IMAGE_TAG_2: ${{ github.event.inputs.docker_image_tag }}
I then passed either of these as the --build-arg for the docker build command, depending on the event trigger:
- name: build docker image
run: |
if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
docker build --build-arg DOCKER_IMAGE_TAG=$DOCKER_IMAGE_TAG_2 -t $DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }} .
else
docker build --build-arg DOCKER_IMAGE_TAG=$DOCKER_IMAGE_TAG_1 -t $DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }} .
fi
The README.md can contain information on how to fetch this tag, similar to how we ask users to manually set their study config, DB host, and server host info, for instance.
wrt merging, I am fine with either approach
No manual intervention required; not even by developers using code
Referring to this comment:
There is still a manual element remaining; however, this has to do with any users or developers looking to work on the code base locally with the dashboard repositories. The docker image tag (only the timestamp part) will need to be manually copied from the latest Dockerhub image of the server repo and added to the args in the docker-compose files.
I've gone ahead and implemented the automated build workflow with the addition of a .env file in the dashboard repos, which stores the latest timestamp from the last successfully completed server image.
Thus, the build is completely automated now, and users/developers who want to run the code locally will not have to manually feed in the timestamp from the Dockerhub images.
The .env file will be updated and committed automatically in the GitHub Actions workflow, and the changes will be pushed to the dashboard repo by the github-actions bot.
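A sketch of what that automated .env update could look like as workflow steps; the step layout and commit message are illustrative, not the exact implementation:

- name: Update .env with the latest server image tag
  run: echo "DOCKER_IMAGE_TAG=${{ env.DOCKER_IMAGE_TAG }}" > .env
- name: Commit and push the updated .env
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add .env
    git commit -m "Update DOCKER_IMAGE_TAG in .env" || echo "No change to commit"
    git push

docker-compose picks up DOCKER_IMAGE_TAG from a .env file in the project directory for variable substitution, which is what lets local users build without editing the compose file.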
Links to successful runs
A. Triggered by Workflow_dispatch from e-mission-server Server run, Admin-dash run, Public-dash run
Automated commits to update .env file: Admin-dash .env, Public-dash .env
B. Triggered by push to remote dashboard repositories Admin-dash run, Public-dash run
Automated commits to update .env file: Admin-dash .env, Public-dash .env
I also tested another scenario: say a developer changed the timestamp in the .env file to test an older server image and then accidentally pushed this older timestamp to their own repo. What happens when they create a PR whose changes include this older server image?
Thus, the expected workflow steps in this case would be:
Some outputs from my testing of this scenario, where I manually entered an older timestamp (2024-05-02--16-40) but the workflow automatically updated to latest timestamp (2024-05-03--14-37).
A. Public-dash
mmahadik-35383s:em-public-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-02--16-40
mmahadik-35383s:em-public-dashboard mmahadik$ git pull origin tags-combo-approach
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 2 (delta 1), reused 2 (delta 1), pack-reused 0
Unpacking objects: 100% (2/2), 262 bytes | 65.00 KiB/s, done.
From https://github.com/MukuFlash03/em-public-dashboard
* branch tags-combo-approach -> FETCH_HEAD
9444e60..40beb80 tags-combo-approach -> origin/tags-combo-approach
Updating 9444e60..40beb80
Fast-forward
.env | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
mmahadik-35383s:em-public-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-03--14-37
B. Admin-dash
mmahadik-35383s:op-admin-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-02--16-40
mmahadik-35383s:op-admin-dashboard mmahadik$ git pull origin tags-combo-approach
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), 308 bytes | 77.00 KiB/s, done.
From https://github.com/MukuFlash03/op-admin-dashboard
* branch tags-combo-approach -> FETCH_HEAD
d98f75c..f1ea34c tags-combo-approach -> origin/tags-combo-approach
Updating d98f75c..f1ea34c
Fast-forward
.env | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
mmahadik-35383s:op-admin-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-03--14-37
Also added TODOs to change from my repository and branches to the master branch and the e-mission-server repo.
@shankari on the internal repo:
Right now, for the server images that this PR is built on:
- wait for the server build to complete
- copy the server image tag using command+C
- edit webapp/Dockerfile and analysis/Dockerfile and change the tag using command-V
- commit the changes
- push to the main branch of the internal repo
- launch build using Jenkins
Ideally the process would be:
something something updates, commits and pushes the updated tags to the main branch of internal repo
- it is fine for this to be a manual action, at least initially, but I want one manual action (~one button or~ one script)
- creating a PR that I merge would be OK but sub-optimal. Short-term ideally, this would just push directly to the repo so no merge is required.
- could this run in Jenkins? No visibility into Jenkins. We should write a script as a template for cloud services if this is even possible.
- I manually launch build using Jenkins
Initial thoughts about a script:
- Pull the image tags from the external repos (GitHub API?)
- Write those image tags into the Dockerfiles for each repository
- Create a PR that's auto-merged, so the tags are ready to go for the Jenkins pipeline
A rough sketch of such a script follows.
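A rough sketch of the fetch-and-write part, assuming the tag is read from Dockerhub's public tag-listing endpoint rather than the GitHub API, and that the internal Dockerfile paths are webapp/Dockerfile and analysis/Dockerfile; all names here are illustrative:

# Fetch the most recently pushed e-mission-server tag from Dockerhub
TAG=$(curl -s "https://hub.docker.com/v2/repositories/emission/e-mission-server/tags?page_size=1&ordering=last_updated" | jq -r '.results[0].name')

# Write the tag into the internal Dockerfiles
for f in webapp/Dockerfile analysis/Dockerfile; do
  sed -i "s|\(e-mission-server:\).*|\1${TAG}|" "$f"
done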
Where to run?
I have created all the tokens needed; we just need to clean this up to a basic level and merge. Once we are done with the basic level, I think we will still need one more round of polishing for this task, but we can track that in a separate issue
I can see that we have used docker build and docker run directly, both in the PRs and while testing them (e.g.)
https://github.com/e-mission/e-mission-docs/issues/1048#issuecomment-2075204752
or
https://github.com/e-mission/em-public-dashboard/pull/125/files#diff-bde90ebb933051b12f18fdcfcefe9ed31e2e3950d416ac84aec628f1f9cc2780R136
This is bad. We use docker-compose extensively in the READMEs, and we should be standardizing on it.
Using docker build or docker run makes it more likely that we will make mistakes in the way that the containers are configured and interact with each other.
I have already commented on this before: https://github.com/e-mission/em-public-dashboard/pull/125#issuecomment-2081561409
I will not review or approve any further changes that use docker build or docker run unless there is a reason that docker-compose will not work.
I got docker compose to work in actions for our process, but had to do it in a roundabout way. The issue is that we want to push an image with the correct tag, and docker build allows you to specify the name of the tag using the -t flag. Docker compose does not work this way; you have to name the image in the compose file directly like this:
services:
dashboard:
build:
image: name_of_image
Originally, I had planned to use an environment variable in my compose call
SERVER_IMAGE_TAG=$DOCKER_IMAGE_TAG_2 ADMIN_DASH_IMAGE_TAG=$DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }} docker compose -f docker-compose-dev.yml build
and then set the name of the image to ${ADMIN_DASH_IMAGE_TAG}. However, this does not seem ideal for people running locally. I found a way around this by adding a renaming step in the build process:
- name: rename docker image
run: |
docker image tag e-mission/opdash:0.0.1 $DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }}
This way we can keep the names of the images the same and push them correctly. I tested the environment variable version here, and the renaming version here. Both worked!
OpenPATH currently has four main server-side components: the webapp and analysis containers are launched from e-mission-server; the others are in separate repos that build on e-mission-server.
There are also additional analysis-only repos (e-mission-eval-private-data and mobility-scripts) that build on e-mission-server but are never deployed directly to production.
In addition, there are internal versions of all the deployable containers that essentially configure them to meet the NREL hosting needs.
We want to unify our build and deploy processes such that: