e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Unify build and deploy processes across the various components of OpenPATH #1048

Open shankari opened 5 months ago

shankari commented 5 months ago

OpenPATH currently has four main server-side components:

the webapp and analysis containers are launched from e-mission-server; the others are in separate repos that build on e-mission-server.

There are also additional analysis-only repos (e-mission-eval-private-data and mobility-scripts) that build on e-mission-server but are never deployed directly to production.

In addition, there are internal versions of all the deployable containers that essentially configure them to meet the NREL hosting needs.

We want to unify our build and deploy processes such that:

MukuFlash03 commented 5 months ago

Shankari mentioned this: “At the end, we want to have one unified process for building all the images needed for OpenPATH. We are not doing incremental work here, we are doing a major redesign. I am open to merging PRs in each repo separately as an intermediate step but eventually, one merge and everything downstream is built”

Does this mean:

  1. Even if other repos are ready to be merged, can we not actually merge changes until, say, the parent repo for base images (currently e-mission-server) is ready to be merged?

  2. Will completely automating merges skip the PR review process? Or would those PR merges still go through, with nothing actually triggered until the merge in e-mission-server triggers it?

MukuFlash03 commented 5 months ago

The admin and public dashboard repos are built on top of the e-mission-server image: their Dockerfiles build off the e-mission-server base image. What we want is that when an e-mission-server PR is merged, we bump the dependency in the admin and public dashboard Dockerfiles to the latest tag and then rebuild those images. As long as there are no other changes to the Dockerfile, there should be no merge conflict; if one does exist, we can take a look at it manually.

The automation would include just the Dockerfile changes that update the base server image to its latest tag. This would then trigger an image build for the repo, and we can potentially trigger image builds on every merge to a specific repo.

This does not include other code changes in PRs, as those would still need to go through the code review process we currently follow. The automated merges with Docker tag updates must occur only when the underlying e-mission-server has been updated. The automated builds for the latest merged code versions of these repos (and not any open / un-merged PRs) can occur, if needed, on every merge.
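The tag-bump step described above could be sketched as a small script. Note this is a sketch only: the tag value, image name, and Dockerfile path are illustrative assumptions, not the actual automation.

```shell
# Sketch of the automated base-image tag bump (illustrative only: the tag
# value, image name, and Dockerfile path are assumptions, not the real setup).
set -e

NEW_TAG="2024-05-01--11-22"      # would come from the e-mission-server build job
DOCKERFILE="$(mktemp)"           # stands in for the downstream repo's Dockerfile
printf 'FROM emission/e-mission-server:old-tag\nCOPY . /app\n' > "$DOCKERFILE"

# Rewrite the FROM line to point at the freshly pushed server image.
sed -E "s|^FROM emission/e-mission-server:.*|FROM emission/e-mission-server:${NEW_TAG}|" \
  "$DOCKERFILE" > "$DOCKERFILE.new" && mv "$DOCKERFILE.new" "$DOCKERFILE"

grep '^FROM' "$DOCKERFILE"   # FROM emission/e-mission-server:2024-05-01--11-22
```

After this rewrite, the downstream repo would commit the updated Dockerfile, which in turn triggers its own image build.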

nataliejschultz commented 5 months ago

Suggestions from Thursday:

  1. Use GitHub Actions with job dispatch and/or reusable workflows to work within multiple directories.

Reusable workflows syntax example (the caller references the reusable workflow by path and ref):

jobs:
  call-workflow:
    uses: owner/reponame/.github/workflows/otheryaml.yml@main

Composite actions with repository dispatching (this uses webhooks)

  2. GitHub Actions in external repos, Jenkins pipeline extension in internal repos
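For the repository-dispatch option, a minimal sketch of the downstream side (the event type, payload field, and job names are hypothetical placeholders, not existing OpenPATH workflows):

```yaml
# Downstream repo (e.g. admin-dash) listens for the dispatch event.
on:
  repository_dispatch:
    types: [server-image-updated]

jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Use the tag sent by the upstream workflow
        run: echo "New server tag ${{ github.event.client_payload.tag }}"
```

The upstream side would POST to `https://api.github.com/repos/<owner>/<repo>/dispatches` with a token that has write access, passing the new tag in `client_payload`.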

Notes:

shankari commented 5 months ago

It is "sub-modules" not "some modules" 😄 And if you need internal repos, which of the three models should you follow?

MukuFlash03 commented 4 months ago

So, I did take a look at git submodules, and it might not be a good option for our use case.

Found some information here:

  1. [Git official submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
  2. GitHub Gist Post
  3. Blog Post

What it does:

Why not good?
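For reference, the core submodule mechanics look like this (a demo with throwaway local repos, not the real OpenPATH repositories): the parent repo pins one exact commit of the other repo, and every downstream update requires an explicit bump commit in the parent.

```shell
# Demo of git submodule mechanics using throwaway local repos
# (not the real OpenPATH repositories).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK="$(mktemp -d)"

git init -q "$WORK/server"
git -C "$WORK/server" commit -q --allow-empty -m "server init"

git init -q "$WORK/parent"
git -C "$WORK/parent" commit -q --allow-empty -m "parent init"

# The parent pins one exact server commit; later server changes are NOT
# picked up until someone explicitly updates the submodule and commits.
cd "$WORK/parent"
git -c protocol.file.allow=always submodule --quiet add "$WORK/server" server
git commit -q -m "pin server submodule"
git submodule status
```

This pin-and-bump cycle per dependent repo is exactly the kind of manual bookkeeping the unified build process is trying to avoid.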

MukuFlash03 commented 4 months ago

Single Internal Repository

A possible redesign for the internal repositories is to have a single internal repository with a sub-directory for each repo, similar to how the server repo is used internally today.

For the server repo, the internal repo is named nrelopenpath. It contains two directories, webapp and analysis, corresponding to two AWS ECR images that are built from the same latest base server image after customizations.

Similarly, we can refactor the join, admin-dash, and public-dash repos to be used the way the server repo is used in the internal GitHub repos. This would avoid duplication of repos and would include these steps:


Need to see how to fit in admin-dash (external remote upstream -> main -> dev) and public-dash (notebook image)


Pros and Cons:   Pros:

  1. No repo / codebase duplication.
  2. No need to worry about public -> private merging, since images are built from the public repos themselves.



Cons:


  1. Images for all repos would need to be built and pushed to Dockerhub (currently only server and public-dash are).

  2. Docker image tags would need to be updated twice for the other repos:
once in external (base image is the server image),
then in internal (base image is the pushed repo image; e.g. join will have e-mission-join_timestamp).

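To illustrate the double tag update described in the last point (image names and tags here are hypothetical placeholders, not the actual repository contents):

```dockerfile
# External join Dockerfile (hypothetical tag): builds on the server image,
# so its tag is bumped once per server release.
FROM emission/e-mission-server:<latest_server_tag>

# The internal join Dockerfile (a separate file) would instead build on the
# pushed external join image, so its tag must be bumped a second time:
# FROM emission/e-mission-join:<timestamp>
```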
MukuFlash03 commented 4 months ago

Index for Topic wise updates

  1. Compiled list of Related issues -> Serves as a good reference for others

  2. Learnings from Documentation across OpenPATH project.
  3. Questions and Answers exchanged with the Cloud Team + Key takeaways
  4. Understanding of the internal repos (Need + Rebuild)
  5. Redesign plan updates

We have organized our findings into a series of topics / sections for ease of reading.

MukuFlash03 commented 4 months ago

Topic 1: Compiled list of Related issues

We spent a lot of time scouring the existing documentation in GitHub issues and PRs (both open and closed) spread throughout the repositories of the OpenPATH project. As we kept finding more and more issues, we thought it would be a good idea to keep them organized, since we had to keep referring back and forth, and this table was quite helpful. Hence we are putting it here so it serves as a good reference for others.

Notes:

  1. Categorization into Super-related, Related, and Possibly Related (or at times unrelated) is done with respect to the current task of redesigning the build and deployment process.
  2. The major focus is on the four external and four internal repos related to the server and dashboards.
  3. Other repositories referenced include: e-mission-docs, e-mission-docker
  4. Some labels / descriptions might not be perfectly categorized, but they relate in some way to the repository and the task at hand and provide important information.

S. No. Repository Super-related Related Possibly Related (or not)
1. e-mission-docs e-mission-docker cleanup
https://github.com/e-mission/e-mission-docs/issues/791

! CI to push multiple images from multiple branches
https://github.com/e-mission/e-mission-docs/issues/752

Public-dash build
https://github.com/e-mission/e-mission-docs/issues/809

Public-dash cleanup
https://github.com/e-mission/e-mission-docs/issues/803

Jenkins, ARGS / ENVS
https://github.com/e-mission/e-mission-docs/issues/822

Docker containerization
https://github.com/e-mission/e-mission-docs/issues/504

Docker testing
https://github.com/e-mission/e-mission-server/pull/731

!
Docker images, server error
https://github.com/e-mission/e-mission-docs/issues/543

NREL-hosted OpenPATH instance
https://github.com/e-mission/e-mission-docs/issues/721
Docker images + Jupyter notebook 
https://github.com/e-mission/e-mission-docs/issues/56

Dockerized setup procedure 
https://github.com/e-mission/e-mission-docs/issues/26

Nominatim
https://github.com/e-mission/e-mission-docs/issues/937

Automated Docker image build - mamba
https://github.com/e-mission/e-mission-docs/issues/926

Shutdown AWS systems
https://github.com/e-mission/e-mission-docs/issues/474

OTP Docker container
https://github.com/e-mission/e-mission-docs/issues/534

GIS Docker container
https://github.com/e-mission/e-mission-docs/issues/544


Submodules
- Splitting monolithic server into micro services
https://github.com/e-mission/e-mission-docs/issues/506
- Split UI from server
https://github.com/e-mission/e-mission-docs/issues/59

public-dash
https://github.com/e-mission/e-mission-docs/issues/743  

admin-dash
https://github.com/e-mission/e-mission-docs/issues/802

AWS Cognito
https://github.com/e-mission/e-mission-docs/issues/1008

Data Encryption
https://github.com/e-mission/e-mission-docs/issues/384

Heroku Deployment
https://github.com/e-mission/e-mission-docs/issues/714

Docker activation script
https://github.com/e-mission/e-mission-docs/issues/619

Monorepo. Viz scripts 
https://github.com/e-mission/e-mission-docs/issues/605#issuecomment-768061308

Tokens, Enterprise GitHub, DocumentDB, Containers
https://github.com/e-mission/e-mission-docs/issues/790#issuecomment-1247227231

AWS server, notebook
https://github.com/e-mission/e-mission-docs/issues/264

Deploy OpenPATH app to staging
https://github.com/e-mission/e-mission-docs/issues/732

Token stored on server - containers, enterprise repo, document DB
https://github.com/e-mission/e-mission-docs/issues/790
Tripaware Docker
https://github.com/e-mission/e-mission-docs/issues/469
https://github.com/e-mission/e-mission-docs/issues/624

Migrating trip segment feature to native Android
https://github.com/e-mission/e-mission-docs/issues/410

Sandbox environment
https://github.com/e-mission/e-mission-docs/issues/326

Docker compose UI
https://github.com/e-mission/e-mission-docs/issues/657

Separate participants and testers
https://github.com/e-mission/e-mission-docs/issues/642

Server dashboard crash
https://github.com/e-mission/e-mission-docs/issues/645

Code coverage
https://github.com/e-mission/e-mission-docs/issues/729

DynamoDB + MongoDB incompatibility; Jianli 
https://github.com/e-mission/e-mission-docs/issues/597

Conda error
https://github.com/e-mission/e-mission-docs/issues/511
2. e-mission-server Image Build Push yml origin perhaps?
https://github.com/e-mission/e-mission-server/pull/875

Server split - AWS Cost analysis
https://github.com/e-mission/e-mission-docs/issues/292

Travis CI
https://github.com/e-mission/e-mission-server/pull/728
Image re-build
https://github.com/e-mission/e-mission-server/pull/617

Docker files, image upload
https://github.com/e-mission/e-mission-server/pull/594#issuecomment-415375127

Remove webapp from server
https://github.com/e-mission/e-mission-server/pull/854

Skip image build for a branch
https://github.com/e-mission/e-mission-server/pull/906
3. em-public-dash Hanging line, Docker cleanup
https://github.com/e-mission/em-public-dashboard/issues/37

NREL-hosted version of public-dash
https://github.com/e-mission/e-mission-docs/issues/743

Production container code copied
https://github.com/e-mission/em-public-dashboard/pull/54

Dockerfile ARG vs ENV
https://github.com/e-mission/em-public-dashboard/pull/63

Notebook stored in .docker
https://github.com/e-mission/em-public-dashboard/pull/75

Build dashboard image from server image
https://github.com/e-mission/em-public-dashboard/pull/84

Pinned notebook image
https://github.com/e-mission/em-public-dashboard/pull/38
Public dash origin perhaps?
https://github.com/e-mission/e-mission-docs/issues/602

Jupyter notebook docker
https://github.com/e-mission/em-public-dashboard/issues/81

AWS Codebuild
https://github.com/e-mission/em-public-dashboard/pull/56

Docker port conflict
https://github.com/e-mission/em-public-dashboard/pull/60

Dockerfile location
https://github.com/e-mission/em-public-dashboard/pull/62

Docker image bump up
https://github.com/e-mission/em-public-dashboard/pull/87

Dockerfile vs Dockerfile dev moved
https://github.com/e-mission/em-public-dashboard/pull/56
https://github.com/e-mission/em-public-dashboard/commit/513e3824d4e98144507a9126a04d78f167a2c86c
Move dashboard to NREL template
https://github.com/e-mission/em-public-dashboard/pull/50

Http-server vulnerabilities
https://github.com/e-mission/em-public-dashboard/pull/58

Http-server global install
https://github.com/e-mission/em-public-dashboard/pull/59

Vulnerabilities - tests removed
https://github.com/e-mission/em-public-dashboard/pull/64
4. op-admin-dash Dev branch usage origin perhaps?
https://github.com/e-mission/e-mission-docs/issues/859
Finalize production docker build
https://github.com/e-mission/op-admin-dashboard/issues/32

sed to jq issue
https://github.com/e-mission/e-mission-docs/issues/595
https://github.com/e-mission/e-mission-docs/issues/714#issuecomment-1082027932
5. nrelopenpath Upgrade server base image
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/pull/19

Push remind command; Jianli, AWS
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/pull/26

Test CI Server Images
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/pull/33

Jenkins deployment issues
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/issues/9
Analysis pipeline not running
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/issues/10

Intake pipeline not running
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/issues/38
Download data from NREL network
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/issues/11
6. nrelopenpath-study-join Vulnerabilities, Flask-caching
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-study-join-page/issues/13

Join page docker container - origin perhaps?
https://github.com/e-mission/e-mission-docs/issues/784
Http-server global install
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-study-join-page/pull/2

Public-dash link; staging, production environments
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-study-join-page/pull/12
7. nrelopenpath-public-dash Notebook image origin perhaps?
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-public-dashboard/pull/1
8. nrelopenpath-admin-dash Admin image not building - prod data issue
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/issues/13
Consistent containers
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/pull/3

sed to jq
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/pull/4

nginx, dash, working, error debug
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/issues/9

Pem file added in Dockerfile
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/pull/6

Main to Dev branch origin perhaps?
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/pull/7
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/pull/8
Hack to load data
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/pull/11
9. e-mission-docker CI image used instead of dev-server
https://github.com/e-mission/e-mission-docker/pull/24

ARGS used
https://github.com/e-mission/e-mission-docker/pull/25
Stripped down docker images, PWD
https://github.com/e-mission/e-mission-docker/pull/3

Pull repo instead of image
https://github.com/e-mission/e-mission-docker/pull/11

Install emission no image
https://github.com/e-mission/e-mission-docker/pull/13
https://github.com/e-mission/e-mission-docker/pull/14

Devapp phone UI notebook like separate docker image
https://github.com/e-mission/e-mission-docker/pull/22

Multi-tier docker compose, cronjob
https://github.com/e-mission/e-mission-docker/pull/4
https://github.com/e-mission/e-mission-docs/issues/410

UI Devapp standard scripts
https://github.com/e-mission/e-mission-docker/pull/18
MukuFlash03 commented 4 months ago

Topic 2: Learnings from Documentation across OpenPATH project

While it’s been a lot of information to digest, we’ve gained a much better understanding of the history behind OpenPATH deployment and have compiled an expansive list of issues that we believe relate to it. Perhaps not all of the learnings below are relevant to the redesign task, but they are somewhat related and may explain the reasoning behind the way our architecture is currently built.

Some knowledge points from our exploration are mentioned below:


1. Admin-dashboard

A. Switch to Dev from Main [1]

B. Wrappers added in Dockerfile (for other repos too) [1]


2. Join

A. Sebastian found the need to create a docker container; it is no longer a redirect to the CMS [1]


3. Public-dash

A. Basic understanding of public-dash images [1]

“It essentially provisions two docker containers: one for generating the static images (that we basically run notebooks using a cronjob to generate) and a dead simple frontend using jquery that basically displays the image.”

B. AWS Codebuild [1]

C. Switched to pinned notebook Image [1, 2]

D. Point where we switched to using notebook image instead of server image [1]

These public-dash changes that commented out code (used in external, and ran notebooks directly to generate plots) were all made around the same time, Sep 2022. This also coincides with when the public-dash base image was changed to use the Dockerhub notebook image instead of the server image. So all of this seems to have been part of the hack used by Shankari and Jianli to make sure the deployment at that time went through successfully.


4. e-mission-server

A. CI publish multiple images from multiple branches + Image_build_push.yml origin [1, 2]

This is the most important issue, as it is very close to the current task we’re working on and highlights the redesign needed.

Looks like we also went from a monorepo / monolithic design to a split microservices design, especially for the server code:

B. Learnt about the origin of Dockerhub usage + Automatic image build

C. Travis CI was tested by Shankari [1]


D. AWS Costs Detailed Discussion [1]

E. Why or why not to use Dockerhub (costs, free tier, images removal) [1]

Dockerhub resources: [pricing, 6 months policy, policy delayed, retention policy + docker alternatives]

F. MongoDB to AWS DocumentDB switch [1]


5. Nrelopenpath [INTERNAL]


6. e-mission-docker

A. Origin of multi-tier Docker compose [1]

nataliejschultz commented 4 months ago

Topic 3: Questions for the Cloud Team + Key takeaways

We have been going back and forth with the cloud services team; their answers are reproduced below.

Q1:

We understand the current process to build and deploy images looks like this:
a. Build from the Dockerfile in external repos and push to Dockerhub
b. Rebuild in Jenkins based on modified Dockerfiles in internal repos
c. Push to AWS ECR

Could the process be changed to build only once, externally, like this:
a. Build from the Dockerfile in external repos and push to Dockerhub
b. Run a Jenkins pipeline that pulls images directly from Dockerhub (and does not rebuild)
c. Push to AWS ECR

A1:

Security is the most important reason that NREL does not pull EXTERNAL images and deploy them directly to NREL infrastructure. That's why Cloud built central ECRs for all cloud project images stored on AWS; all images need to be (regularly) scanned by Cyber.


Q2: Is one of the reasons we need to modify the images due to using confidential credentials to access AWS Cognito (i.e. nrel-cloud-computing/nrelopenpath-admin-dashboard/docker-compose-prod.yml)?

A2:

The Cloud team is not using docker-compose for deployment. The base image is pulled from EXTERNAL, i.e. your DockerHub, and we wrap the image to fit into the AWS ECS deployment. For example, in https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/blob/master/webapp/Dockerfile, we need to install certificates for your app to access the DocumentDB cluster.


Q3: How and when is nrelopenpath/buildspec.yml run? Is it run multiple times in the pipeline for the staging and production deployments?

A3:

The CodeBuild (buildspec.yml) run is triggered through Jenkins. On the Jenkins job, there's a checkbox that lets you choose whether or not to build the image. If not, it will use the most recently built image. If yes, it will build the image before the deployment; it builds once and deploys to multiple production environments.


Q4: How and when are the images for the other repos (i.e. public dash, admin dash, join) built? How are they run if they're not in buildspec.yml, since we only see "webapp" and "analysis" in there?

A4:

Same as in Q3: if you choose to build app images, then all of the public, join, admin, and web/analysis images are re-built. Shankari would like the Jenkins job to be simple; that's why we use one button/checkbox to control everything under the hood.


Q5: Regarding this commit, what was the reasoning behind running the .ipynb notebooks directly?

A5:

We run the .ipynb notebooks directly because the previous code ran them with crontab. But in AWS we are not using crontab to schedule tasks; we are using ECS Scheduled Tasks. Basically, we created AWS EventBridge rules to schedule the viz script runs with a cron expression. Since we are running inside a Docker container, the container could fail, and if it failed, the crontab inside it would not run. That's why we chose to let AWS schedule and run the cron job as a container.
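The scheduling described in A5 could be expressed roughly as follows (a CloudFormation-style sketch; the rule name, schedule, cluster, role, and task-definition resources are all hypothetical placeholders, not the actual NREL setup):

```yaml
# Sketch: an EventBridge rule that runs the viz notebooks as an ECS
# scheduled task. All resource names and references are placeholders.
VizNotebookSchedule:
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: cron(0 8 * * ? *)   # e.g. daily at 08:00 UTC
    State: ENABLED
    Targets:
      - Id: viz-notebooks
        Arn: !GetAtt OpenPathCluster.Arn        # hypothetical ECS cluster
        RoleArn: !GetAtt EcsEventsRole.Arn      # hypothetical events role
        EcsParameters:
          TaskDefinitionArn: !Ref NotebookTaskDef
          TaskCount: 1
```

Because the schedule lives outside the container, a crashed container does not take the schedule down with it, which matches the reasoning in the answer above.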


Q6: Assuming there are not a lot of such image-wrapping tasks, it seems like tasks such as adding certificates can be moved to the EXTERNAL Dockerfile. If so, that would mean we no longer require the INTERNAL Dockerfile and would not need to build the docker image INTERNALLY. We would still be pushing to AWS ECR?

A6:

Yes. Cloud has built the pipeline required by Cyber, which scans all images from the NREL ECR regularly and reports vulnerabilities/KEVs. This means the pipeline cannot pull and scan external images; it requires all images to be built and pushed to the NREL ECR as the central repos.


Q7: For more clarity: in the buildspec.yml file (https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/blob/master/buildspec.yml), we currently have four phases: install, pre-build, build, post-build. Would it be possible to have just three phases (install, pre-build, post-build), i.e. skipping the build stage?

Instead, in the post-build stage, just before the "docker push" to AWS ECR, we can pull the required image from Dockerhub, tag it, and then push it to AWS ECR. Thus we would not need to rebuild it, again assuming that we have no wrappers to apply (these can be applied EXTERNALLY).
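A minimal sketch of what such a build-less buildspec.yml could look like (the registry URI, environment variable names, and image names are placeholders, not the actual NREL configuration):

```yaml
# Sketch: pull from Dockerhub, retag, push to ECR -- no build phase.
# $AWS_REGION, $ECR_REGISTRY, and $SERVER_TAG are placeholder variables.
version: 0.2
phases:
  install:
    commands:
      - echo "nothing to install for a pull-and-retag flow"
  pre_build:
    commands:
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
      - docker pull emission/e-mission-server:$SERVER_TAG
  post_build:
    commands:
      # Retag the Dockerhub image and push it to ECR instead of rebuilding.
      - docker tag emission/e-mission-server:$SERVER_TAG $ECR_REGISTRY/webapp:$SERVER_TAG
      - docker push $ECR_REGISTRY/webapp:$SERVER_TAG
```

The image still lands in the NREL ECR, so the Cyber scanning requirement from A6 is preserved.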

A7:

It sounds doable; you could update it and give it a try, as long as no NREL-specific info is exposed to external.


Some key takeaways:

  1. It is incredibly important that we store the images that are built during the Jenkins run in AWS ECR. This allows for regular, on-demand vulnerability scanning of images by Cyber.
  2. Some differences between external and internal images include: the addition of certificates required to access the DocumentDB cluster, and image customizations made by changing certain directories (e.g. conf settings).
  3. Buildspec.yml is triggered through Jenkins. Whoever runs Jenkins can choose, via a checkbox, whether images are re-built or not.
nataliejschultz commented 4 months ago

Topic 4: Understanding of the internal repos (Need + Rebuild)

There are two questions to answer, and we’ve put forth our points after discussing with Jianli.

1. Do we need internal repos at all? Why? Yes. We do need them.


2. Do we need to re-build images both externally and internally (two sets of Dockerfiles for each repo)?

A. Possibly not needed to rebuild:

B. Possibly need to rebuild:


To summarize:

nataliejschultz commented 4 months ago

Topic 5: Redesign Plan Updates

Redesign steps suggested in previous meeting:

1. Add all four repos to the Multi-tier docker structure

A. Current setup

B. New setup

C. Feedback from previous meeting:


2. Proposed deployment process

We are still considering the one-internal-repository structure (mentioned in 1. above), with the related files for each repo inside one subdirectory per repo. Then all four repos would have a ready-to-use image pushed to Dockerhub.

A. Skipping the “build” job:

B. Streamlining repo-specific build processes

i. E-mission-server:

ii. Join

iii. Admin-dash:

iv. Public-dash:

Two possibilities:

  1. Similar to how cronjobs are being handled in nrelopenpath/analysis, we can rebuild the image and push it to AWS ECR.
    • This means we build the image externally, then use the internal Dockerfile to make customizations, including scripts to run the Python notebooks without using crontab.
    • This would avoid manually re-tagging the public-dashboard notebook image and manually pushing it to Dockerhub.
    • We would still be building the image internally from the latest server image.
  2. But the question is: do we even need to run these notebooks on a schedule?
    • One reason is that when the analysis scripts run as cronjobs, we update the dashboard based on their results.
    • If not, can we just use the static images that were generated from the external build?
shankari commented 4 months ago

One high-level comment before our next meeting:

Conf directories are different in internal and external repos as well as in webapp and analysis internally.

Do we need all these conf directories? A lot of them have a single configuration. See also https://github.com/e-mission/e-mission-server/pull/959#issuecomment-1975778952

MukuFlash03 commented 4 months ago

Table for Differences in External and Internal Repositories

S.No. Repository/Container File Difference Findings Solution Needs Rebuilding Internally (Y/N)
1. Join N/A None N/A Internal matches External. No
2. Admin-dash docker/start.sh sed changed to jq Tested changing sed to jq for both admin-dash and public-dash.
Changed in script, rebuilt containers, working with jq.
Change sed to jq in external repos. No
docker/Dockerfile AWS Certificates Cannot move outside, since this customization is just for our use case, as we use AWS DocumentDB.
What if someone else is using other DBs like MongoDB or other cloud services?
Natalie has an idea to use environment variables and a script that runs as a Dockerfile layer.
Keep it as it is / Natalie script. Yes
docker-compose-prod.yml Cognito credentials added. For internal Github repo, these will be stored as secrets.
These will be set up as ENV variables in the docker-compose same as current setup but will use secrets instead of directly setting values.
Use GitHub secrets + Environment variables. Yes
config.py, config-fake.py INDEX_STRING_NO_META added. We searched for history behind this addition and found that it was done as a workaround to handle a security issue with a script injection bug.
We found that Shankari had filed an issue with the dash library repository and was able to elicit a response and a fix from the official maintainers of the library.

With the dash library version that fixed it 2.10.0, flask version <= 2.2.3 is needed.
Hence choosing next higher versions (2.14.1 which increased flask version limit), 2.14.2 (mentioned by a developer that it works).
Working with 2.16.1 latest version as well.
The issue no longer appears when tested with versions: 2.14.1, 2.14.2, 2.16.1.

Shankari updates
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/commit/23930b6687c6e2e8cd4aeb79d3181fc7af065de6
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/commit/c40f0866c76e2bfa03bef05d0daefda30625a943
https://github.com/plotly/dash/issues/2536
https://github.com/plotly/dash/pull/2540

Related:
https://github.com/plotly/dash/issues/2699
https://github.com/plotly/dash/issues/2707 [2.14.2 works]

Release Tags
https://github.com/plotly/dash/releases/tag/v2.10.0 [Contains fix for Shankari’s issue]
https://github.com/plotly/dash/releases
Upgrade Dash library version to >= 2.14.1 or to latest (2.16.1) in requirements.txt No
app_sidebar_collapsible.py OpenPATH logo removed.
Config file import added.
INDEX_STRING_NO_META added.
Except OpenPATH logo, others can be skipped.
Need more info on whether OpenPATH logo can be added or kept as it is from external version or not.
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-admin-dashboard/commit/47f023dc1d6b5a6531693a90e0575830691356ec
https://github.com/e-mission/op-admin-dashboard/issues/43

Config import and INDEX_STRING_NO_META can be removed as I’ve tested that updating the dash version to version 2.16.1 and above solves the issue and we no longer need a workaround.
Decide on whether OpenPATH logo can be added to internal.
Based on Shankari’s suggestions in the open issue, we believe it’d be a good idea to store the icon as a static image (png or svg) in the local file system.

Config import and INDEX_STRING_NO_META can be removed after Dash version upgrade.
No
3. Public-dash start_notebook.sh sed changed to jq Tested changing sed to jq for both admin-dash and public-dash.
Changed in script, rebuilt containers, working with jq.
Change sed to jq in external repos. No
docker/Dockerfile AWS Certificates Cannot move outside, since this customization is just for our use case, as we use AWS DocumentDB.
What if someone else is using other DBs like MongoDB or other cloud services?
Natalie has an idea to use environment variables and a script that runs as a Dockerfile layer.
Keep it as it is / Natalie script. Yes
start_notebook.sh Python notebooks split into multiple execution calls. Related to AWS ECS scheduled tasks being used instead of cron jobs. Natalie’s suggestion: event-driven updates Yes
4. Nrelopenpath
ENV variables
analysis/conf/net/ext_service/push.json Four key-value pairs containing credentials, auth tokens. Need to test whether we can pass the entire dictionary as an environment variable.
If yes, this avoids having to create 4 different ENV variables; we can just use one for the entire file.
Use environment variables. No
webapp/conf/net/auth/secret_list.json One key-value pair containing credentials, auth tokens. Need to test whether we can pass the entire list as an environment variable.
File history
https://github.com/e-mission/e-mission-server/pull/802

Hardcoded now, switch to some channel later
https://github.com/e-mission/e-mission-docs/issues/628#issuecomment-799828333
Use environment variables. No
5. Nrelopenpath/analysis
CONF Files
conf/log/intake.conf Logging level changed. Debug level set in internal while Warning level set in external.
Debug is lower priority than Warning.
This means internally, we want to log as much as possible.

Mentioned here that this was a hack, not a good long-term solution.
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/6dfc8da32cf2d3b98b200f98c1777462a9194042

Same as log/webserver.conf in webapp.
Just differs in two filenames which are log file locations.
Keep it as it is.
Decide on level of logging and details.
Yes
conf/analysis/debug.conf.json Three key-value pairs. Analysis code config values, keys. Keep it as it is. Yes
6. Nrelopenpath/webapp
CONF Files
conf/log/webserver.conf Logging level changed. Debug level set in internal while Warning level set in external.
Debug is lower priority than Warning.
This means internally, we want to log as much as possible.

Mentioned here that this was a hack, not a good long-term solution.
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/6dfc8da32cf2d3b98b200f98c1777462a9194042

Same as log/intake.conf in analysis.
Just differs in two filenames which are log file locations.
Keep it as it is.
Decide on level of logging and details.
Yes
conf/analysis/debug.conf.json Nine key-value pairs. Analysis code config values, keys. Keep it as it is. Yes
conf/net/api/webserver.conf.sample 2 JSON key-value pairs.
1 pair removed.
Related to auth (2 pairs).

Looks like 404 redirect was added only in external and not in internal.
Found this commit for addition of 404 redirect in external:
https://github.com/e-mission/e-mission-server/commit/964ed288032262e1bedc945b24c04b192114aec5

Why sample file used + skip to secret
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/a66a52a8fbd28bb13c4ebad445ba21e8b478c105

Secret to skip
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/cb4f54e4c201bd132c311d04a5e34a57bae2efb7

Skip to dynamic
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath/commit/52392a303ea60b73e8ea23a199b09778b306d8da
Keep it as it is. Should the redirect be added to internal as well? Yes
7. Nrelopenpath/analysis
Startup Scripts
cmd_intake.sh
cmd_push_remind.sh
cmd_build_trip_model.sh
cmd_push.sh
cmd_reset_pipeline.sh
One external startup script duplicated into 5 scripts. Crontab and start_cron.sh are no longer needed, since cron is not used in NREL-hosted production environments
https://github.nrel.gov/nrel-cloud-computing/nrelopenpath-public-dashboard/commit/2a8d8b53d7ecf24e85217fb9b812ef3936054e35.
Instead, scripts are run directly in Dockerfile as ECS can create scheduled task for cron.
TBD

Have a single script that runs these scripts based on parameters.
But how will ECS know when to run each script?
Can we execute these in Dockerfile like public-dash currently runs Python notebooks (bad hack!?)
Yes
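One possible shape for that single script — a hypothetical sketch (`run_task.sh` and the task names are assumptions derived from the scripts listed above); each ECS scheduled task would pass a different task name as the container command override:

```shell
#!/bin/bash
# run_task.sh (hypothetical): map a task parameter to the matching
# analysis script, so one entrypoint can serve all five ECS scheduled
# tasks instead of five near-identical startup scripts.
task_script() {
  case "$1" in
    intake|push|push_remind|build_trip_model|reset_pipeline)
      echo "cmd_$1.sh" ;;
    *)
      echo "unknown task: $1" >&2
      return 1 ;;
  esac
}

# In the real container this would be: exec "./$(task_script "$1")"
task_script "${1:-intake}"   # → cmd_intake.sh when no argument is given
```

ECS would then answer the "how will ECS know when to run each script?" question above: each scheduled task definition carries its own command argument.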
8. Nrelopenpath/webapp
Startup Scripts
start_script.sh Additional command added to copy conf files. Need to understand what is being copied where.
Tested and confirmed that the custom conf files are copied over to the conf directory of the base server image.
Keep it as it is. Yes
shankari commented 4 months ago

High-level thoughts:

Microscopic details:

nataliejschultz commented 3 months ago

Current dealings:

@MukuFlash03 and I have been collaborating on minimizing the differences between the internal and external repositories.

I have been working on figuring out how to pass the image tag – created in the server action image-build-push.yml – between repositories. I’ve been attempting to use the upload-artifact/download-artifact method. It worked to upload the file, and we were able to retrieve the file in another repository, but we had to specify the run ID for the workflow where the artifact was created. So, this defeats the purpose of automating the image tag in the first place.

We also looked into GitHub release and return dispatch as options, but decided they were not viable.

There are ways to push files from one run to another repository, though we haven’t tried them yet. Write permissions might be a barrier to this, so creating tokens will be necessary. If we can get the file pushing to work, this is our intended workflow:

  1. E-mission-server image-build-push.yml: writes image tag to file
  2. Tag file is pushed to admin dash, and public dash
  3. Push of file triggers image-build-push workflows in the other repos
  4. File is read into image-build-push workflow
  5. Tag from file set as an environment variable for workflow run
  6. Dockerfiles updated with tags
  7. Docker image build and push
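Steps 1–3 above might be sketched as a single step in the server workflow — a hedged sketch only: the dashboard repo URL, secret name, and file name are all assumptions, and the push at the end is what would trigger the dashboard's own image-build-push workflow:

```yaml
- name: Push image tag file to op-admin-dashboard
  run: |
    echo "${{ steps.date.outputs.date }}" > image_tag.txt
    git clone https://x-access-token:${{ secrets.GH_PAT }}@github.com/e-mission/op-admin-dashboard.git
    cp image_tag.txt op-admin-dashboard/
    cd op-admin-dashboard
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add image_tag.txt
    git commit -m "Bump server image tag"
    git push
```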
MukuFlash03 commented 3 months ago

Created PR for implementation code changes here:

A. External PRs:

B. Internal PR

nataliejschultz commented 2 months ago

Had a meeting with @MukuFlash03 to discuss some issues with testing. Made a plan for documentation of testing and outlined what all needs to be done.

MukuFlash03 commented 2 months ago

Docker Commands for Testing Code Changes

Posting a list of the docker commands I used to verify whether the docker images were building successfully. Next, I also tested whether containers could be run from the images.

I had to ensure that the configuration settings defined in the docker-compose files were set manually by me, since docker-compose is no longer used for the internal images. These settings included things like ports, networks, volumes, and environment variables.


Creating a network so containers can be connected to each other:

$ docker network create emission

DB container is needed for storage; data must be loaded into it (I did not load data when I did this testing initially)

$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0

A. Internal Repo Images

Check out the multi-tier branch in the internal repo -> nrelopenpath


  1. Webapp
$ docker build -t int-webapp ./webapp/
$ docker run --name int-webapp-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage_program --network emission int-webapp

  2. Analysis
$ docker build -t int-analysis ./analysis/
$ docker run --name int-analysis-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage_program -e PUSH_PROVIDER="firebase" -e PUSH_SERVER_AUTH_TOKEN="Get from firebase console" -e PUSH_APP_PACKAGE_NAME="full package name from config.xml. e.g. edu.berkeley.eecs.emission or edu.berkeley.eecs.embase. Defaults to edu.berkeley.eecs.embase" -e PUSH_IOS_TOKEN_FORMAT="fcm" --network emission int-analysis

  3. Join
$ docker build -t int-join ./join_page
$ docker run --name int-join-1 -d -p 3274:5050 --network emission int-join

Sometimes, during local testing, the join and public-dash frontend pages might load the same HTML file, as the port is still mapped to either join or public-dash depending on which is run first. If so, change the join port to a different one (just for testing purposes):

$ docker run --name int-join-1 -d -p 2254:5050 --network emission int-join

  4. Admin-dash
$ docker build -t int-admin-dash ./admin_dashboard
$ docker run --name int-admin-dash-1 -d -e DASH_SERVER_PORT=8050 -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e CONFIG_PATH="https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/main/configs/" -e STUDY_CONFIG="stage-program" -e DASH_SILENCE_ROUTES_LOGGING=False -e SERVER_BRANCH=master -e REACT_VERSION="18.2.0" -e AUTH_TYPE="basic" -p 8050:8050 --network emission int-admin-dash

  5. Public-dash frontend
$ docker build -t int-public-dash-frontend ./public-dashboard/frontend/
$ docker run --name int-public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission int-public-dash-frontend

  6. Public-dash viz_scripts
$ docker build -t int-public-dash-notebook ./public-dashboard/viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG="stage-program" -p 47962:8888 -v ./plots:/plots --network emission int-public-dash-notebook


B. External Repo Images

  1. Directly pull latest pushed image from Dockerhub for each repo in its Dockerfile

    docker run --name container_name image_name 
  2. Alternatively, can try building from external repo after switching to consolidate-differences branch for e-mission-server and image-push branch for join, admin-dash, public-dash. Will have to use docker build commands similar to internal images above.

    docker build -t image_name Dockerfile_path
    docker run --name container_name image_name
  3. Or, can run docker-compose commands since external images still have docker compose files.

Join and Public-dash:  $ docker-compose -f docker-compose.dev.yml up -d
Admin-dash: $ docker compose -f docker-compose-dev.yml up -d

Initially, I used option 1.


  1. E-mission-server

    $ docker run --name em-server-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program --network emission mukuflash03/e-mission-server:image-push-merge_2024-04-16--49-36
  2. Join

    $ docker run --name join-2 -d -p 3274:5050 --network emission mukuflash03/nrel-openpath-join-page:image-push-merge_2024-03-26--22-47
  3. Op-admin

    $ docker run --name op-admin-2 -d -e DASH_SERVER_PORT=8050 -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e CONFIG_PATH="https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/main/configs/" -e STUDY_CONFIG="stage-program" -e DASH_SILENCE_ROUTES_LOGGING=False -e SERVER_BRANCH=master -e REACT_VERSION="18.2.0" -e AUTH_TYPE="basic" -p 8050:8050 --network emission mukuflash03/op-admin-dashboard:image-push-merge_2024-04-16--00-11
  4. Public-dash Frontend / dashboard

    $ docker run --name public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission mukuflash03/em-public-dashboard:image-push-merge_2024-04-16--59-18
  5. Public-dash Viz_scripts / notebook-server

    $ docker run --name public-dash-viz-1 -d -e DB_HOST=db -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG="stage-program" -p 47962:8888 -v ./plots:/plots --network emission mukuflash03/em-public-dashboard_notebook:image-push-merge_2024-04-16--59-18
shankari commented 2 months ago

For automatic updates of the tags, we have three options:

  1. reading the tag automatically from a GitHub action in another repo (not clear that this works)
    • how can you pass a run_id between repositories?
  2. pushing a file from a github action in one repo to a github action in another repo (untried)
  3. use Github hooks and/or the GitHub API to be notified when there is a new release and to trigger a workflow based on that (untried)
nataliejschultz commented 2 months ago

For automatic updates of the tags, we have three options:

  1. pushing a file from a github action in one repo to a github action in another repo (untried)

@MukuFlash03 See my comment above, quoted below, for an outline of the steps to try to get the file pushing method to work:

There are ways to push files from one run to another repository, though we haven’t tried them yet. Write permissions might be a barrier to this, so creating tokens will be necessary. If we can get the file pushing to work, this is our intended workflow:

  1. E-mission-server image-build-push.yml: writes image tag to file
  2. Tag file is pushed to admin dash, and public dash
  3. Push of file triggers image-build-push workflows in the other repos
  4. File is read into image-build-push workflow
  5. Tag from file set as an environment variable for workflow run
  6. Dockerfiles updated with tags
  7. Docker image build and push
shankari commented 2 months ago

There is also https://stackoverflow.com/questions/70018912/how-to-send-data-payload-using-http-request-to-github-actions-workflow which is a GitHub API-fueled approach to passing data between repositories

MukuFlash03 commented 2 months ago

Summary of approaches tried for automating docker image tags

Requirements:

  1. Docker image tags generated in e-mission-server should be available in admin-dash and public-dash repository workflows.
  2. Successful completion of docker image workflow run in e-mission-server should trigger workflows to build and push docker images in admin-dash and public-dash repositories.
  3. Dockerfiles in admin-dash and public-dash should be updated with the latest docker image tags received from the latest successfully completed "docker image" workflow run of e-mission-server.

Notes:

Current status:

Implemented

For reference, matrix strategy for workflow dispatch events

Pending


Approaches planned to try out:

  1. Artifacts upload / download with the use of workflow run ID.
  2. Pushing files from one repo to the other repos.
  3. Using GitHub REST APIs and webhooks to trigger workflows

Approaches actually tested, implemented and verified to run successfully

  1. Artifacts + Run ID
  2. GitHub REST APIs + webhooks

Reason for not trying out Approach 2:

I tried out and implemented Approaches 1 and 3 first. Approach 3 was necessary for triggering a workflow from another workflow in another repository. Approach 1 effectively included Approach 2, since artifacts are pushed files in another form. Both Approaches 1 and 2 would need Approach 3 to trigger the workflows in the dashboard repos at the same time. With these two approaches implemented, Requirements 1) and 2) were done.

The next major task was to work on Req 3), which involves updating the Dockerfiles in the dashboard repos. Approach 2 is somewhat related to this, as physical files present in the actual repos would need to be modified and committed. This is in contrast to the files and text data passed around in Approaches 1 and 3, which was all done inside the GitHub actions workflow runner. The artifact files were available outside the runner after its execution, but they were still tied to the workflow run. With Approach 2, and in completing Req. 3, I would need to handle the Dockerfiles outside the workflow runs; hence I skipped Approach 2, since in a way I'd be working on it anyway.

MukuFlash03 commented 2 months ago

Details of approaches

In my forked repositories for e-mission-server, join, admin-dash, public-dash there are three branches available for the tags automation: tags-artifact, tags-dispatch, tags-matrix.

tags-artifact branch in: e-mission-server, admin-dash, public-dash, join

tags-dispatch branch in: e-mission-server, admin-dash, public-dash, join

tags-matrix branch in: e-mission-server, admin-dash, public-dash, join

Approach 1: tags-artifact; Approach 3: tags-dispatch, tags-matrix


  1. tags-artifact [Approach 1: Artifact + Run ID]
    • Official documentation: upload-artifact, download-artifact
    • This involves using the artifact upload and download GitHub actions to make any file generated inside the workflow run available outside the runner execution but still inside the workflow as a downloadable .zip file.
    • This file can then be downloaded in another repository using a personal access token with repo access permissions, the workflow run ID, and the source repository name (with the user or organization name).
    • The workflow run ID was an obstacle, as it was unclear how to fetch it automatically.
      • I was finally able to fetch it using a Python script which uses a GitHub REST API endpoint to fetch all runs of a workflow.
      • Then I filtered these by status (completed + success) and source repo's branch name.
      • Finally these filtered runs are sorted in descending order of last updated time which gives the latest workflow run ID in e-mission-server.
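The filtering and sorting described above can be sketched with jq over an inlined sample of the GitHub "list workflow runs" response (the run IDs, timestamps, and branch name below are made up); the real script would fetch this JSON from `GET /repos/{owner}/{repo}/actions/workflows/{workflow}/runs`:

```shell
# Hypothetical sketch: select the newest completed + successful run on
# a branch from a "list workflow runs" response (inlined sample data).
RUNS='{"workflow_runs":[
  {"id":111,"status":"completed","conclusion":"success","head_branch":"tags-artifact","updated_at":"2024-04-16T10:00:00Z"},
  {"id":222,"status":"completed","conclusion":"failure","head_branch":"tags-artifact","updated_at":"2024-04-17T10:00:00Z"},
  {"id":333,"status":"completed","conclusion":"success","head_branch":"tags-artifact","updated_at":"2024-04-18T10:00:00Z"}]}'
LATEST_RUN_ID=$(echo "$RUNS" | jq '
  .workflow_runs
  | map(select(.status == "completed" and .conclusion == "success"
               and .head_branch == "tags-artifact"))
  | sort_by(.updated_at) | reverse | .[0].id')
echo "$LATEST_RUN_ID"   # → 333, the most recently updated successful run
```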

Cons:


  2. tags-dispatch [Approach 3: GitHub REST APIs]
    • Official documentation: workflow dispatch events
    • This uses a GitHub REST API endpoint for workflow dispatch events, which sends a POST request to a target repository to trigger a workflow in that repository.
    • This required a fine-grained token with the access scope `actions: write`.
    • Additionally, this can also pass data such as the docker image tags as JSON via the request parameters, which the target repositories can then read from the dispatched event's inputs.

    - name: Trigger workflow in join-page, admin-dash, public-dash
      run: |
        curl -L \
          -X POST \
          -H "Accept: application/vnd.github+json" \
          -H "Authorization: Bearer ${{ secrets.GH_FG_PAT_TAGS }}" \
          -H "X-GitHub-Api-Version: 2022-11-28" \
          https://api.github.com/repos/MukuFlash03/op-admin-dashboard/actions/workflows/90180283/dispatches \
          -d '{"ref":"tags-dispatch", "inputs": {"docker_image_tag" : "${{ steps.date.outputs.date }}"}}'

Similarly for public-dash, join


---------

3. tags-matrix [Approach 3: GitHub REST APIs]
- Official documentation: [matrix strategy](https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs)
- Similar to tags-dispatch, in that workflow dispatch events are used.
- The only difference is that the matrix strategy dispatches parallel events to the target repositories once the source repository (e-mission-server) successfully completes execution.
- This also shows the dispatch events as a 2nd set of combined jobs in the workflow run graph.
- This required a fine-grained token with the access scope `actions: write`.
strategy:
  matrix:
    repo: ['MukuFlash03/nrel-openpath-join-page', 'MukuFlash03/op-admin-dashboard', 'MukuFlash03/em-public-dashboard']

- name: Trigger workflow in join-page, admin-dash, public-dash
  run: |
    curl -L \
      -X POST \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer ${{ secrets.GH_FG_PAT_TAGS }}" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      https://api.github.com/repos/${{ matrix.repo }}/actions/workflows/image_build_push.yml/dispatches \
      -d '{"ref":"tags-matrix", "inputs": {"docker_image_tag" : "${{ env.DOCKER_IMAGE_TAG }}"}}'


-------

Pros of workflow dispatch events and matrix strategy:
- An advantage of using the workflow dispatch events is that we do not need metadata like the run ID.
- There is an option to use the workflow ID, but it can also be replaced by the workflow file name; hence even the workflow ID isn't needed.
- I did calculate the workflow ID for each workflow file "image-build-push.yml" in the target repositories by using these API endpoints: [e-mission-server](https://api.github.com/repos/MukuFlash03/e-mission-server/actions/workflows), [join workflows](https://api.github.com/repos/MukuFlash03/nrel-openpath-join-page/actions/workflows), [admin-dash workflows](https://api.github.com/repos/MukuFlash03/op-admin-dashboard/actions/workflows), [public-dash](https://api.github.com/repos/MukuFlash03/em-public-dashboard/actions/workflows)
- I have kept the same workflow file name in the target repositories; hence, in the tags-matrix branch, I can simply use the same workflow name with differing repository names, as defined in the matrix, to run the curl command for all the repositories.
shankari commented 2 months ago

The next major task was to work on Req 3) which involves updating the Dockerfiles in the dashboard repos.

I don't think that the solution should be to update the Dockerfiles. That is overkill and is going to lead to merge conflicts. Instead, you should have the Dockerfile use an environment variable, and then set the environment variable directly in the workflow or using a .env file

nataliejschultz commented 2 months ago

Instead, you should have the Dockerfile use an environment variable, and then set the environment variable directly in the workflow or using a .env file

Our primary concern with this method was for users building locally. Is it acceptable to tell users to copy the latest image from the docker hub image library in the README?

shankari commented 2 months ago

docker-compose can use local images

The comment around .env was for @MukuFlash03's task to update the tag on the Dockerfile, which is not related to testing, only for the GitHub triggered actions.

MukuFlash03 commented 2 months ago

Docker tags automation working end-to-end!

Finally got the tags automation to work completely, in one click: starting from the e-mission-server workflow, passing the latest timestamp used as the docker image tag suffix, and then triggering the workflows in admin-dashboard and public-dashboard.

Final approach taken for this involves a combination of the artifact and the matrix-dispatch methods discussed here.

Additionally, as suggested by Shankari here, I changed the Dockerfiles to use environment variables set in the workflow runs itself. Hence, not using / updating hardcoded timestamp values in the Dockerfiles anymore.

I don't think that the solution should be to update the Dockerfiles. That is overkill and is going to lead to merge conflicts. Instead, you should have the Dockerfile use an environment variable, and then set the environment variable directly in the workflow or using a .env file.


There is still a manual element remaining; however, this has to do with any users or developers looking to work on the code base locally with the dashboard repositories. The docker image tag (only the timestamp part) will need to be manually copied from the latest Dockerhub image of the server repo and added to the args in the docker-compose files.

This is also what @nataliejschultz had mentioned here:

Our primary concern with this method was for users building locally. Is it acceptable to tell users to copy the latest image from the docker hub image library in the README?

# Before adding tag
    build:
      args:
        DOCKER_IMAGE_TAG: ''

# After adding tag
    build:
      args:
        DOCKER_IMAGE_TAG: '2024-05-02--16-40'
MukuFlash03 commented 2 months ago

Implementation Approaches discussed here.

Combined approach (artifact + matrix) tags-combo-approach branch: e-mission-server, admin-dash, public-dash


Successful workflow runs:

  1. Workflow dispatch on modifying code in server repo:
server_push
admin_workflow_dispatch
public_workflow_dispatch
  2. Push event on modifying code in admin-dash or public-dash repo
admin_push
public_push
MukuFlash03 commented 2 months ago

I decided to go ahead with the matrix-build strategy, which dispatches workflows to multiple repositories when triggered from one source repository. I had implemented this in the tags-matrix branches of the dashboard repos (the join repo as well, but that was just for initial testing purposes; final changes are only in the dashboard repos).

Initially, I only had a push event trigger, similar to the docker image build and push workflow in the server repo. However, I realized that there would now be two types of Github actions events that should trigger the workflows in the admin-dashboard and public-dashboard repos. The second type of trigger would be a workflow_dispatch event. This was implemented and working via the matrix-build workflow dispatch branch.

Now, for the workflow dispatch event, I was able to pass the latest generated docker image timestamp directly via the e-mission-server workflow in the form of an input parameter docker-image-tag.

    - name: Trigger workflow in admin-dash, public-dash
      run: |
        curl -L \
          -X POST \
          -H "Accept: application/vnd.github+json" \
          -H "Authorization: Bearer ${{ secrets.GH_FG_PAT_TAGS }}" \
          -H "X-GitHub-Api-Version: 2022-11-28" \
          https://api.github.com/repos/${{ matrix.repo }}/actions/workflows/image_build_push.yml/dispatches \
          -d '{"ref":"tags-combo-approach", "inputs": {"docker_image_tag" : "${{ env.DOCKER_IMAGE_TAG }}"}}'

This parameter was then accessible in the workflows of the dashboard repos:

on:
  push:
    branches: [ tags-combo-approach ]

  workflow_dispatch:
    inputs:
      docker_image_tag:
        description: "Latest Docker image tags passed from e-mission-server repository on image build and push"
        required: true
MukuFlash03 commented 2 months ago

With these changes done, I believed I was done but then I came across some more issues. I have resolved them all now but just mentioning them.


  1. If a push event triggered the workflow, an empty string was being passed into the ENV variable.
    • This was solved by introducing the artifact method discussed in the comment above

Why did I choose to add the artifact method as well?

The issue I was facing was with fetching the latest timestamp for the image tag in case of a push event trigger. This is because in the workflow dispatch, the server workflow itself would trigger the workflows and hence was in a way connected to these workflows. However, push events would only trigger the specific workflow in that specific dashboard repository to build and push the image and hence would not be able to retrieve the image tag directly.

So, I utilized the artifact upload and download method to:


  2. There are three jobs in the dashboard repo workflows: fetch_run_id, fetch_tag, build. fetch_run_id must always run and complete before the build job begins; but build was finishing first and building images with an incorrect image tag, since the correct tag wasn't yet available before the fetch jobs completed.
    • The solution involved using the needs keyword to create chained jobs that depend on each other and always wait for the previous job to complete before executing.
    • Additionally, output variables and environment variables were used in the workflow to pass values from one job to the other.
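The job chaining might look roughly like this in the dashboard workflow — a sketch only, with the step bodies stubbed out (the job names fetch_run_id, fetch_tag, and build are from the comment above; the step IDs and placeholder commands are assumptions):

```yaml
jobs:
  fetch_run_id:
    runs-on: ubuntu-latest
    outputs:
      run_id: ${{ steps.get_id.outputs.run_id }}
    steps:
      - id: get_id
        run: echo "run_id=<latest server run id>" >> "$GITHUB_OUTPUT"

  fetch_tag:
    needs: fetch_run_id      # waits for fetch_run_id to complete
    runs-on: ubuntu-latest
    outputs:
      docker_image_tag: ${{ steps.get_tag.outputs.docker_image_tag }}
    steps:
      - id: get_tag
        run: echo "docker_image_tag=<tag from artifact>" >> "$GITHUB_OUTPUT"

  build:
    needs: fetch_tag         # build only starts once the tag is known
    runs-on: ubuntu-latest
    steps:
      - run: echo "building with ${{ needs.fetch_tag.outputs.docker_image_tag }}"
```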

  3. Switching to using ARG environment variables in the Dockerfiles was tricky, as I had to figure out how to pass the appropriate timestamp tags considering the two event triggers - push and workflow_dispatch.

Dockerfiles' FROM layer looks like:

ARG DOCKER_IMAGE_TAG
FROM mukuflash03/e-mission-server:tags-combo-approach_${DOCKER_IMAGE_TAG}

Solution I implemented involves defining two DOCKER_IMAGE_TAGS in the workflow file, one for push, the other for workflow_dispatch:

    env:
      DOCKER_IMAGE_TAG_1: ${{ needs.fetch_tag.outputs.docker_image_tag }}
      DOCKER_IMAGE_TAG_2: ${{ github.event.inputs.docker_image_tag }}

I then passed either of these as the --build-arg for the docker build command depending on the event trigger:

    - name: build docker image
      run: |
        if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
          docker build --build-arg DOCKER_IMAGE_TAG=$DOCKER_IMAGE_TAG_2 -t $DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }} .
        else
          docker build --build-arg DOCKER_IMAGE_TAG=$DOCKER_IMAGE_TAG_1 -t $DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }} .
        fi

  4. How to update docker image tags in case developers want to build locally? The solution was to provide an option to add the docker image tag manually, from the latest server image pushed to Dockerhub. This is discussed in this comment above.

The ReadMe.md can contain information on how to fetch this tag, similar to how we ask users to manually set their study-config, DB host, server host info for instance.

shankari commented 2 months ago

wrt merging, I am fine with either approach

  1. put both the automated builds + code cleanup into one PR and the cross-repo automated launches into another PR. You can have the second PR be targeted to merge into the first PR so that it only has the changes that are new for the second level of functionality
  2. One giant PR. I looked at join and I don't think that the changes will be that significant, and I am OK with doing a more complex review if that is easier for you.
MukuFlash03 commented 2 months ago

Build completely automated!

No manual intervention required, not even from developers using the code

Referring to this comment:

There is still a manual element remaining, however this is to do with any users or developers looking to work on the code base locally with the dashboard repositories. The docker image tag (only the timestamp part) will need to be manually copied from the latest Dockerhub image of the server repo and added to the args in the docker-compose files.

I've gone ahead and implemented the automated build workflow with the addition of the .env file in the dashboard repos which just stores the latest timestamp from the last successfully completed server image.

Thus, the build is completely automated now and users / developers who want to run the code locally will not have to manually feed in the timestamp from the docker hub images.

The .env file will be updated and committed in the github actions workflow automatically and changes will be pushed to the dashboard repo by the github actions bot.
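A possible shape for that auto-commit step in the dashboard workflow — a sketch; the step name, commit message, and variable names are assumptions:

```yaml
- name: Update .env with latest server image tag
  run: |
    echo "DOCKER_IMAGE_TAG=${{ env.DOCKER_IMAGE_TAG }}" > .env
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add .env
    # commit only if the tag actually changed, to avoid empty commits
    git diff --cached --quiet || git commit -m "Update docker image tag"
    git push origin HEAD
```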


Links to successful runs

A. Triggered by Workflow_dispatch from e-mission-server Server run, Admin-dash run, Public-dash run

Automated commits to update .env file: Admin-dash .env, Public-dash .env


B. Triggered by push to remote dashboard repositories Admin-dash run, Public-dash run

Automated commits to update .env file: Admin-dash .env, Public-dash .env


MukuFlash03 commented 2 months ago

I also tested another scenario: say a developer changed the timestamp in the .env file to test an older server image, and accidentally pushed this older timestamp to their own repo. What happens when they create a PR with their changes, which includes this older server image?

Thus, the expected workflow steps in this case would be:


Some outputs from my testing of this scenario, where I manually entered an older timestamp (2024-05-02--16-40) but the workflow automatically updated it to the latest timestamp (2024-05-03--14-37).

A. Public-dash

mmahadik-35383s:em-public-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-02--16-40
mmahadik-35383s:em-public-dashboard mmahadik$ git pull origin tags-combo-approach
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 2 (delta 1), reused 2 (delta 1), pack-reused 0
Unpacking objects: 100% (2/2), 262 bytes | 65.00 KiB/s, done.
From https://github.com/MukuFlash03/em-public-dashboard
 * branch            tags-combo-approach -> FETCH_HEAD
   9444e60..40beb80  tags-combo-approach -> origin/tags-combo-approach
Updating 9444e60..40beb80
Fast-forward
 .env | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
mmahadik-35383s:em-public-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-03--14-37

B. Admin-dash

mmahadik-35383s:op-admin-dashboard mmahadik$ cat .env 
DOCKER_IMAGE_TAG=2024-05-02--16-40
mmahadik-35383s:op-admin-dashboard mmahadik$ git pull origin tags-combo-approach
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), 308 bytes | 77.00 KiB/s, done.
From https://github.com/MukuFlash03/op-admin-dashboard
 * branch            tags-combo-approach -> FETCH_HEAD
   d98f75c..f1ea34c  tags-combo-approach -> origin/tags-combo-approach
Updating d98f75c..f1ea34c
Fast-forward
 .env | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
mmahadik-35383s:op-admin-dashboard mmahadik$ cat .env
DOCKER_IMAGE_TAG=2024-05-03--14-37
MukuFlash03 commented 2 months ago

Also, added TODOs to change from my repository and branches to the master branch and the e-mission-server repo.

nataliejschultz commented 2 months ago

@shankari on the internal repo:

Right now, for the server images that this PR is built on:

  • wait for the server build to complete
  • copy the server image tag using command+C
  • edit webapp/Dockerfile and analysis/Dockerfile and change the tag using command-V
  • commit the changes
  • push to the main branch of the internal repo
  • launch build using Jenkins

Ideally the process would be:

  • something something updates, commits and pushes the updated tags to the main branch of internal repo

    • it is fine for this to be a manual action, at least initially, but I want one manual action (~one button or~ one script)
    • creating a PR that I merge would be OK but sub-optimal. Short-term ideally, this would just push directly to the repo so no merge is required.
    • could this run in Jenkins? No visibility into Jenkins. We should write a script as a template for cloud services if this is even possible.
  • I manually launch build using Jenkins

Initial thoughts about a script:

  • Pull the image tags from the external repos (GitHub API?)
  • Write those image tags into the Dockerfiles for each repository
  • Create a PR that's auto-merged, so the tags are ready to go for the Jenkins pipeline
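As a starting point, the tag-rewrite part of such a script could look like this sketch — the image name and FROM-line format are assumptions, and the tag fetch is stubbed with a fixed value so the rewrite step can run end to end (really it would come from Docker Hub or the GitHub API):

```shell
# Hypothetical sketch: rewrite the timestamp suffix on the FROM line of
# an internal Dockerfile to the latest server image tag.
WORKDIR=$(mktemp -d)
DOCKERFILE="$WORKDIR/Dockerfile"

# Stand-in for webapp/Dockerfile in the internal repo:
printf 'FROM emission/e-mission-server:master_2024-05-02--16-40\n' > "$DOCKERFILE"

# Stub; really fetched from the registry / GitHub API:
LATEST_TAG="2024-05-03--14-37"

# Rewrite only the timestamp suffix after the branch prefix.
sed -i.bak -E "s|(e-mission-server:master_).*|\1${LATEST_TAG}|" "$DOCKERFILE"
cat "$DOCKERFILE"   # → FROM emission/e-mission-server:master_2024-05-03--14-37
```

The same sed line would be repeated for analysis/Dockerfile, followed by a commit and push (or auto-merged PR) before the Jenkins build is launched.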

Where to run?

shankari commented 1 month ago

I have created all the tokens needed; we just need to clean this up to a basic level and merge. Once we are done with the basic level, I think we will still need one more round of polishing for this task, but we can track that in a separate issue

[Screenshot: tokens created, 2024-05-19 11:21 AM]
shankari commented 1 month ago

I can see that we have used docker build and docker run directly both in the PRs and while testing them (e.g.) https://github.com/e-mission/e-mission-docs/issues/1048#issuecomment-2075204752 or https://github.com/e-mission/em-public-dashboard/pull/125/files#diff-bde90ebb933051b12f18fdcfcefe9ed31e2e3950d416ac84aec628f1f9cc2780R136

This is bad. We use docker-compose extensively in the READMEs, and we should be standardizing on it. Using docker build or docker run makes it more likely that we will make mistakes in the way that the containers are configured and interact with each other.

I have already commented on this before: https://github.com/e-mission/em-public-dashboard/pull/125#issuecomment-2081561409

I will not review or approve any further changes that use docker build or docker run unless there is a reason that docker-compose will not work

nataliejschultz commented 1 month ago

I got docker compose to work in actions for our process, but had to do it in a roundabout way. The issue is that we want to push an image with the correct tag, and docker build allows you to specify the name of the tag using the -t flag. Docker compose does not work this way; you have to name the image in the compose file directly like this:

services:
  dashboard:
    build:
    image: name_of_image

Originally, I had planned to use an environment variable in my compose call

SERVER_IMAGE_TAG=$DOCKER_IMAGE_TAG_2 ADMIN_DASH_IMAGE_TAG=$DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }} docker compose -f docker-compose-dev.yml build

and then set the name of the image to ${ADMIN_DASH_IMAGE_TAG}. However, this does not seem ideal for people running locally. I found a way around this by adding a renaming step in the build process:

- name: rename docker image
  run: |
    docker image tag e-mission/opdash:0.0.1 $DOCKER_USER/${GITHUB_REPOSITORY#*/}:${GITHUB_REF##*/}_${{ steps.date.outputs.date }}

This way we can keep the names of the images the same and push them correctly. I tested the environment variable version here, and the renaming version here. Both worked!