e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

♻️ 💅 Cleanup/polish the code to automatically cascade images from the server -> server dependencies #1082

Open shankari opened 1 month ago

shankari commented 1 month ago

Copied over pending tasks from: https://github.com/e-mission/e-mission-server/pull/961#issuecomment-2272509467

MukuFlash03 commented 3 days ago

I have segregated the tasks into two individual groups (CI/CD, Core code) plus one combined group (CI/CD + Core)

A. CI/CD


B. Core code

C. Core Code + CI/CD

MukuFlash03 commented 3 days ago

The order I plan to work on these:

- A. CI/CD: 7, 6, 2, 8, 4, 1, 3, 5
- B. Core: 2, 1
- C. Core + CI/CD: 1, 2

MukuFlash03 commented 3 days ago

Task A-7: Certificates location - External or Internal?

Fixed as a part of the redesign changes itself.

Added certificates externally, only in the server repo (commit). They are only needed in images that need to connect to the AWS DocumentDB (comment).

Since the same base server image is used by the server-image-dependent containers (webapp, analysis, admin-dash, public-dash-notebook), we added the certificates right at the source, which is the external server image. This ensures that the certificates are present in all the cascading images.


Based on the comment below, added them to the internal repo and removed them from the external one.
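As a sketch of the cascade described above (the image name and tag here are hypothetical, not the actual published ones): a dependent image only needs to build `FROM` the server image, and the certificates come along for free.

```shell
# Hypothetical server image reference; real names/tags come from the workflows.
SERVER_IMAGE="emission/e-mission-server:master_2024-09-20--09-10"

# Generate a minimal Dockerfile for a dependent container (e.g. webapp).
cat > Dockerfile.webapp <<EOF
FROM ${SERVER_IMAGE}
# Certificates are inherited from the base server image; no COPY needed here.
EOF

head -n1 Dockerfile.webapp
```

If the certificates move to the internal repos instead, each internal Dockerfile would add its own `COPY` of the cert bundle on top of this `FROM` line.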

shankari commented 3 days ago

@MukuFlash03 right, we have implemented this. But I am suggesting that we revisit that decision.

Discuss where to put the cert configuration. We originally had it in the internal repos, then we moved it to the external repos. But now that we have one internal dockerfile per external dockerfile, maybe we can have it be internal after all. It doesn't actually hurt anything to be external, but it is unnecessary in the external repo.

MukuFlash03 commented 3 days ago

Task A-6: Switching from $GITHUB_ENV to step outputs using GITHUB_OUTPUT

Shankari's comments

Consider switching from $GITHUB_ENV to step outputs. I think that the step outputs are the recommended approach to pass values from one step to another, but we should verify

I couldn't find any official statement on which approach is recommended. But I did see this warning saying that `set-output` is deprecated and to switch to using environment files:

(Screenshot, 2024-09-20: GitHub deprecation warning for the `set-output` command.)

Note that this doesn't say that `steps.output` is deprecated; it says `set-output` is deprecated. So step outputs are still valid, and we essentially have to choose between GITHUB_ENV and GITHUB_OUTPUT.
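For reference, a minimal sketch of the GITHUB_OUTPUT mechanism (the key name `timestamp` and step id are illustrative, not from the actual workflows). Inside a real workflow this runs in a step with an `id:`; here we simulate GITHUB_OUTPUT with a temp file, since the runner just parses `key=value` lines from that file.

```shell
# Simulate the file GitHub Actions points GITHUB_OUTPUT at.
GITHUB_OUTPUT=$(mktemp)

# In a step with `id: get_date`, this publishes a step output:
echo "timestamp=$(date +%Y-%m-%d--%H-%M)" >> "$GITHUB_OUTPUT"

# A later step would consume it as ${{ steps.get_date.outputs.timestamp }}.
grep '^timestamp=' "$GITHUB_OUTPUT"
```

The same `echo ... >>` pattern with `$GITHUB_ENV` instead publishes an environment variable visible to all later steps, which is the trade-off being weighed here.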


One argument in favor of GITHUB_OUTPUT is this:

MukuFlash03 commented 3 days ago

Task A-2: Avoid uploading date as Artifacts -> Use .env file instead?

Shankari's comments

Explore whether we can stop uploading the date of the run as an artifact, since we are using an .env file in the dashboards. Can't this just be an output of the run?


REST API endpoints

I looked at some REST API endpoints to see if we can access outputs of jobs/steps in a run from outside the workflow run. We could then use them directly in the internal script to pull all the tags.

Not Suitable

  1. Get a workflow run: I did not find an API endpoint to directly reference outputs inside a job. This endpoint gets details about a workflow run but does not include info on outputs.

  2. List jobs for a workflow run: This lists all jobs and steps, but only their completion status and execution time, not outputs or any other details.

  3. Download workflow run logs: I did, however, find an API endpoint that allows downloading workflow run logs. This does give the outputs, but it also gives everything else from the logs as seen in the UI, which is a lot of redundant information we would need to parse just to fetch the outputs. It also downloads the logs as a zip file, which would be another hassle: download -> extract -> read / parse -> clean up files.


Optimal Approach

  1. Get repository content: This endpoint allows directly reading the contents of a file, in our case the .env files containing the tags. For instance, this API response is from my forked repo, where I've stored the tags in a .env file. The API response has the file content in base64 encoded format:
...
 "download_url": "https://raw.githubusercontent.com/MukuFlash03/em-public-dashboard/cleanup-cicd/.env",
  "type": "file",
  "content": "Tk9URUJPT0tfSU1BR0VfVEFHPTIwMjQtMDktMjAtLTEzLTM2CkZST05URU5E\nX0lNQUdFX1RBRz0yMDI0LTA5LTIwLS0wNC0zOApTRVJWRVJfSU1BR0VfVEFH\nPTIwMjQtMDktMjAtLTA5LTEwCg==\n",
...

Decoding this base64 data (for decoding, the newline characters '\n' need to be removed and the entire encoded string combined) gives:

NOTEBOOK_IMAGE_TAG=2024-09-20--13-36
FRONTEND_IMAGE_TAG=2024-09-20--04-38
SERVER_IMAGE_TAG=2024-09-20--09-10
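The decoding step above can be done in a couple of lines of shell, using the `content` value from the API response (with `tr` stripping the embedded newlines before decoding):

```shell
# `content` field as returned by the contents API, newlines included.
ENCODED='Tk9URUJPT0tfSU1BR0VfVEFHPTIwMjQtMDktMjAtLTEzLTM2CkZST05URU5E
X0lNQUdFX1RBRz0yMDI0LTA5LTIwLS0wNC0zOApTRVJWRVJfSU1BR0VfVEFH
PTIwMjQtMDktMjAtLTA5LTEw'

# Strip the newlines, then decode; prints the three *_IMAGE_TAG lines above.
printf '%s' "$ENCODED" | tr -d '\n' | base64 -d
```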
MukuFlash03 commented 3 days ago

Task A-2: Avoid uploading date as Artifacts -> Use .env file instead? [Contd.]

With the API endpoint mentioned above, we can directly read contents of files.

Proposed Approach

To have a separate .env file in each of the repositories: server, join-page, admin-dash, public-dash. This file would contain the image tags of the latest uploaded images. The server image tag is needed in the dashboard repos' .env files since their Dockerfiles use the server tag as an ARG.

- In server repo: SERVER_IMAGE_TAG
- In join-page repo: JOIN_IMAGE_TAG
- In admin-dash repo: ADMIN_DASH_IMAGE_TAG, SERVER_IMAGE_TAG
- In public-dash repo: PUBLIC_DASH_NOTEBOOK_IMAGE_TAG, PUBLIC_DASH_FRONTEND_IMAGE_TAG, SERVER_IMAGE_TAG
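A sketch of how an internal script might consume the proposed public-dash .env (the tag values here are reused from the earlier example; the `docker build` invocation is illustrative, not the actual internal script):

```shell
# Write the proposed .env for public-dash with its three tags.
cat > .env <<'EOF'
PUBLIC_DASH_NOTEBOOK_IMAGE_TAG=2024-09-20--13-36
PUBLIC_DASH_FRONTEND_IMAGE_TAG=2024-09-20--04-38
SERVER_IMAGE_TAG=2024-09-20--09-10
EOF

# Source it, then pass the server tag to the Dockerfile as an ARG.
. ./.env
echo "docker build --build-arg SERVER_IMAGE_TAG=${SERVER_IMAGE_TAG} ."
```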


Pros of this approach:

shankari commented 2 days ago

why use something complicated which retrieves the file as a base64 encoded string instead of just reading the files directly using the raw option?

MukuFlash03 commented 2 days ago

> why use something complicated which retrieves the file as a base64 encoded string instead of just reading the files directly using the raw option?

I just thought of sticking to the GitHub REST API and was looking for options within that.

But reading the raw contents as text is indeed a much simpler option. It also does not require any headers or an authorization token for the request, which is fine since the data is publicly available anyway. Will switch out the URLs and remove the base64 code. Thank you for pointing that out!
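The raw-option approach boils down to a one-line fetch plus a trivial parse. A sketch (the `read_tag` helper is hypothetical; the URL is the fork mentioned above; the parsing is separated out so it works on any .env text):

```shell
# Hypothetical helper: read_tag KEY  — prints KEY's value from .env text on stdin.
read_tag () {
  grep "^$1=" | cut -d= -f2
}

# Real usage would pipe the raw file in, no auth headers needed:
#   curl -s https://raw.githubusercontent.com/MukuFlash03/em-public-dashboard/cleanup-cicd/.env \
#     | read_tag SERVER_IMAGE_TAG

# Offline demonstration with sample .env content:
printf 'SERVER_IMAGE_TAG=2024-09-20--09-10\n' | read_tag SERVER_IMAGE_TAG
```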

MukuFlash03 commented 2 hours ago

Task A-8: Stored tag format changed to branch_name-timestamp

Currently the docker images are tagged as `<branch_name>_<timestamp>`, but the artifact tags only had the timestamp. The branch names are hardcoded in the internal scripts and are prefixed to the retrieved timestamps.

However, this causes a problem if the docker image upload is done from a different branch in the external GitHub workflow: the complete tag returned in the internal script would still have the default hardcoded branch_name, but the latest timestamp.

Hence, we now store the same tag that the docker image is tagged with, including both the branch_name and the timestamp.
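A sketch of constructing the stored tag (the branch name, separator, and timestamp format here are assumptions based on the tags shown earlier in this thread, e.g. `2024-09-20--09-10`):

```shell
# In the external workflow this would come from the git context,
# e.g. ${{ github.ref_name }}; hardcoded here for illustration.
BRANCH_NAME=master

# Timestamp format matching the tags seen above (YYYY-MM-DD--HH-MM).
TIMESTAMP=$(date +'%Y-%m-%d--%H-%M')

# Store branch name and timestamp together, so the internal script
# no longer has to prefix a hardcoded branch name.
TAG="${BRANCH_NAME}_${TIMESTAMP}"
echo "$TAG"
```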

MukuFlash03 commented 1 hour ago

First set of cleanup PRs added for Tasks A-2, 6, 7, 8.

PRs: e-mission-server, join-page, admin-dash, public-dash