Closed: MukuFlash03 closed this 1 month ago
Checking in the initial set of changes for consolidating the differences between the external and internal versions. These are based on the differences identified here
Current changes involve:
These changes are still pending:
- viz_scripts/docker/start_notebook.sh: Python notebooks are run directly internally instead of via a cron tab
- viz_scripts/Dockerfile: AWS certificates
I wanted to comment on the addition of cert.sh to the repo. This file is better kept internal-only, and it will be added to the internal container build by using the prod/jenkins variable. However, we can keep it external for testing.
For example, we can add our prod/jenkins variable to our GitHub action, and see what it looks like when the different portions of code are run according to our script. This way we can perfect it before modifying the internal repos.
Docker Volume usage
When testing initially, I was unable to see the generated plots in the frontend dashboard page. This is when I learned how volumes defined in docker-compose actually work.
So, I was testing with the internal repos:
- Created folders in the internal nrelopenpath multi-tier setup.
- Added -v flags to the docker run commands for both the frontend and viz_scripts containers.

This creates a two-way connection: the host directory (before the colon in the -v flag) and the container directory (after the colon) stay in sync, and changes in either are reflected in the other.
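A minimal sketch of this bind-mount behavior (image, container, and file names here are hypothetical):

```bash
mkdir -p plots

# Mount the host's ./plots directory at /plots inside a throwaway container.
docker run -d --name voltest -v "$(pwd)/plots:/plots" alpine sleep 300

# A file created on the host is visible inside the container ...
touch plots/host_file.png
docker exec voltest ls /plots          # -> host_file.png

# ... and a file created inside the container appears on the host.
docker exec voltest touch /plots/container_file.png
ls plots                               # -> container_file.png  host_file.png

docker rm -f voltest
```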
Great! It’s working. Plots visible.
The plots do carry the label "Unable to generate plots", though, and come up as empty tables with 0 (zero) values.
Note: run the docker run command inside the nrelopenpath/public_dash/ directory; otherwise, since the host path in -v is relative, plots/ is created outside public_dash, in the root nrelopenpath folder.
@iantei's documentation was really helpful:
I have been testing the external repo with the image-push branch, which contains the redesign code changes. The Dockerfile uses the base server image from my personal Dockerhub repository.
Earlier, while I was spinning up the DB container as well, I wasn't actually loading any data into it. Without data, I was seeing empty data tables with 0 (zero) or None values. Additionally, red text was present in the plot: "Unable to generate plot ; reason:"
Steps followed to test code changes:
A. Built images and ran containers using docker-compose:
Dev
$ docker compose -f docker-compose.dev.yml build
$ docker compose -f docker-compose.dev.yml up -d
Prod
$ docker compose -f docker-compose.yml build
$ docker compose -f docker-compose.yml up -d
B. This time, once the containers were up, I loaded stage data using the available stage snapshot dataset:
$ bash viz_scripts/docker/load_mongodump.sh <tar_dataset_file>
Then, the DB_HOST environment variable in the docker-compose files also needs to be changed to the MongoDB URL (a quick check that it matches the restored database name is sketched below):
DB_HOST=mongodb://db/openpath_stage
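To confirm that this value matches what load_mongodump.sh actually restored, the databases in the db container can be listed (a quick sketch; the container name is the compose default used later in this thread, and the legacy mongo 4.4 shell is assumed):

```bash
# DB_HOST should end with one of the database names reported here.
docker exec -it em-public-dashboard-db-1 mongo --quiet --eval 'printjson(db.adminCommand("listDatabases"))'
```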
C. Executing Jupyter notebooks
Before executing the notebooks, I had to change the year and month in the notebooks to None instead of the current hardcoded values (year = 2020 and month = 11).
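For reference, this is the kind of one-off edit involved. The grep/sed lines below are illustrative and assume the hardcoded values appear verbatim in the notebook JSON; editing the cells in Jupyter works just as well:

```bash
# Find the notebooks that still carry the hardcoded values ...
grep -l 'year = 2020' viz_scripts/*.ipynb

# ... and flip them to None in place (crude, but the notebook JSON stays
# valid because only the cell source text changes).
sed -i 's/year = 2020/year = None/; s/month = 11/month = None/' viz_scripts/*.ipynb
```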
Then executed Jupyter notebooks following steps in this public-dashboard execution documentation.
I was successfully able to load data and see plots with actual data like staging public-dash URL
Screenshots showing generated plots
Public-dash External
Public-dash Staging [for reference]
Part 1: Error stack trace
However, when I tried this with the dev compose file, I got this error in edb (get_database.py), where config["url"] raises a KeyError.
This is triggered when scaffolding is imported in the Python notebooks, which in turn imports edb, which ultimately triggers the calls to get_config() and get_config_data() in emission.core.config.py.
Part 2: Tried to debug
However, when I connect to the notebook-server docker container, run python, and then manually import edb and run any MongoDB commands, it works.
I added print statements to log the URL and result_limit in edb->get_database.py
```
(emission) root@056e2b1de88:/usr/src/app# cat emission/core/get_database.py | head -20
import emission.core.config as ecc

url = ecc.get_config()["url"]
result_limit = ecc.get_config()["result_limit"]
print("URL: %s" % url)
print("Result limit: %s" % result_limit)
```
I ran these python commands inside the docker container:

```
$ python
>>> import emission.core.get_database as edb
>>> edb.get_timeseries_db().count_documents({})
```

Output:

```
# Printed due to module import
URL: mongodb://db/openpath_stage
Result limit: 250000
Connecting to database URL mongodb://db/openpath_stage

# Output result of count_documents()
12355103
```
So I am unable to understand why running the notebooks manually leads to this error, and why it happens specifically with the docker-compose.dev file; the production docker-compose file works correctly.
I have spent many hours trying to test em-public-dashboard (from the current main branch, as a positive control) with openpath_prod_uue_snapshot_oct3.tar.gz.
I followed the procedure outlined in the documentation, as well as by @MukuFlash03 above, and still only got the empty result:
Mukul tested the procedure using this same dataset, and got the exact same result. His successes (above) were from using openpath-stage-snapshot-dec-20.tar.gz. I am going to seek out another dataset to test with.
Testing of the public dash has been really difficult. I have tried so, so many times, and have finally gotten the current published version of public dash to generate images.
On my local copy of the public dash published code, I did the following:
- Set all of the year and month values in each relevant ipynb to None (despite the documentation saying not to do this?)
- Changed DB_HOST=mongodb://db/openpath_stage in the docker compose + dev files
- Used STUDY_CONFIG=stage-program (I am using data that corresponds to this)
Running:
$ docker compose -f docker-compose.dev.yml build
$ docker compose -f docker-compose.dev.yml up -d
$ bash viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-***-**.tar.gz
$ docker exec -it em-public-dashboard-notebook-server-1 /bin/bash
Inside viz-scripts container:
source setup/activate.sh
cd saved-notebooks/
PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
Opening the localhost at port 3274:
On my local copy of Mukul’s image-push branch, I wanted to now try some program-specific data. Here is what I did:
Set all of the year and month values in each relevant ipynb to None
Create db container: $ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
Frontend:
$ docker build -t int-public-dash-frontend ./frontend/
$ docker run --name int-public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission int-public-dash-frontend
Viz-scripts: Changed to build from my local image of our changed server code:
Ran:
$ docker build -t viz-scripts ./viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath_stage -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=cortezebikes -p 47962:8888 -v ./plots:/plots --network emission viz-scripts
$ docker exec -it int-public-dash-notebook-1 /bin/bash
Inside viz-scripts container:
source setup/activate.sh
cd saved-notebooks
PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
This loaded some data!
Of course, I also wanted to try to recreate the stage data result from before.
I followed the exact same steps as above, using a different STUDY_CONFIG and DB_HOST:
$ docker build -t viz-scripts ./viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath_stage -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program -p 47962:8888 -v ./plots:/plots --network emission viz-scripts
It did not work. I am unsure as to why, and am at a bit of a loss for what to do from here.
> On my local copy of Mukul’s image-push branch, I wanted to now try some program-specific data. Here is what I did: ...
> $ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath_stage -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=cortezebikes -p 47962:8888 -v ./plots:/plots --network emission viz-scripts
Why are you not using docker-compose as before? If you change both the image and the process to run, any error that you encounter might be the result of either change.
> This loaded some data!

Or it may just be showing results from the previous successful run. Did you clear out the plots directory between runs?
> It did not work. I am unsure as to why, and am at a bit of a loss for what to do from here.

What did you try to debug? Did you try to look at logs? Did you try to add print statements? Did you try to look at the documentation for the public dashboard?
> However, when I tried this with the dev compose file, I got this error in edb (get_database.py), where config["url"] raises a KeyError.

This is almost certainly because the directory that you are running from does not have a conf/storage/db.conf file. But I thought we were not going to use conf/storage/db.conf any more and were going to use environment variables instead. Has that change been implemented?
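A quick way to check this hypothesis (a sketch; the container name is the one used earlier in this thread, and the working directory is assumed to be /usr/src/app, as the shell prompts above suggest):

```bash
# If conf/storage/db.conf exists inside the notebook container, it can shadow
# the DB_HOST environment variable; check for both.
docker exec em-public-dashboard-notebook-server-1 ls -l conf/storage/
docker exec em-public-dashboard-notebook-server-1 printenv DB_HOST
```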
> Why are you not using docker-compose as before? If you change both the image and the process to run, any error that you encounter might be the result of either change.
In reference to this comment, I wanted to make sure that I was building everything off of locally generated images. I will try using this locally built image with the docker compose instead of docker run. However, in the internal repos, we have to test with docker run since the Jenkins pipeline cannot use compose.
> This loaded some data!
> Or it may just be showing results from the previous successful run. Did you clear out the plots directory between runs?
No, I thought these plots would be overwritten. Comparing the "Number of trips All data Default" graph for the stage data (left), the cortezebikes snapshot that I loaded into my locally built container (middle), and the current public cortezebikes public dashboard (right), you can see that my locally generated image in the middle is consistent with the cortezebikes trip data up to the recent snapshot date.
> What did you try to debug? Did you try to look at logs? Did you try to add print statements? Did you try to look at the documentation for the public dashboard?
In debugging, I have tried:
- Using docker exec -it to change various variables so the data is read in properly, i.e. export STUDY_CONFIG=stage-program and export DB_HOST=mongodb://db/openpath_stage, and using echo $STUDY_CONFIG to make sure exporting the variables took before re-running the notebooks
- Installing vim (apt-get install vim) and directly modifying the ipynb files via vi + insert + wq, using the cat command to make sure the changes took, and re-running the notebook run commands listed above
- Using docker exec -it inside the mongo container and running mongo, then show dbs, to make sure the data has loaded and to check that the db name matches my DB_HOST environment variable

> However, when I tried this with the dev compose file, I got this error in edb (get_database.py), where config["url"] raises a KeyError.
> This is almost certainly because the directory that you are running from does not have a conf/storage/db.conf file. But I thought we were not going to use conf/storage/db.conf any more and were going to use environment variables instead. Has that change been implemented?
Yes, the changes have been implemented in e-mission-server so that environment variables are used instead of the config files. This issue is not observed in admin-dash, or in public-dash when loading using the production docker compose file.
The issue was seen only with the docker-compose.dev file, and only when running the notebooks manually to generate the plots.
When I connected directly to the container via the terminal and executed commands that invoked edb, the "url" value was printed correctly.
I've mentioned my debugging process in this comment above.
@MukuFlash03 I did see your debugging process. Other than investigating the code myself, my only hint is that this is almost certainly because the file is missing. Your question should then be: why is the dev version using the file instead of the environment variable?
Again, the only way for me to provide an answer to that would be to run it myself. There is not enough information in your debugging steps to provide any other expert insights.
Re-testing my local version of the public dashboard, using docker compose this time. Pre-run checks:
- Dockerfile is set to build my local server image with changes: consolidate-diff-server-localbuild
- docker-compose.yml has DB_HOST=mongodb://db/openpath_stage and STUDY_CONFIG=stage-program
- All ipynb notebooks have year and month set to None
- Plots folder emptied
- Other containers utilizing the same ports removed
Commands run:
$ docker compose -f docker-compose.yml build
$ docker compose -f docker-compose.yml up -d
$ docker ps
$ docker rename mukul-forked-em-public-dashboard-db-1 em-public-dashboard-db-1
$ bash viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-dec-20.tar.gz
$ docker exec -it mukul-forked-em-public-dashboard-notebook-server-1 /bin/bash
root@6a1431d35053:/usr/src/app# source setup/activate.sh
(emission) root@6a1431d35053:/usr/src/app# cd saved-notebooks/
(emission) root@6a1431d35053:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
(emission) root@6a1431d35053:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
Docker compose with the locally built server image from the consolidate-differences branch worked!
Reflecting on my previous attempts, this is what's different between this run and my previous test using docker build:
Since I will have to use docker build to test the internal build, I will likely find out if (1) was the issue all along.
> Your question should then be: why is the dev version using the file instead of the environment variable?
Found the reason for the bug and fixed it in this commit in the server repo. https://github.com/MukuFlash03/e-mission-server/commit/e778b3f3cf050eec33ef439a1b6763b6aaf533e0
The bug was triggered by a .gitignore-d conf/ directory in public-dash which had a db.conf file. This was being loaded when docker-compose.dev.yml was used to test the dev version of public-dash.
This was occurring because there is a nested dictionary in the db.conf.sample and db.conf files, while I had initially stored only the nested keys (url, result_limit). Since the code was still reading from the file, it got the nested dictionary format, with timeseries as the parent key and (url, result_limit) as its children.
Hence, the correct way to access the db.conf values is ["timeseries"]["url"] instead of just ["url"]. Fixed it by accessing these values correctly.
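A sketch of why the flat lookup fails. The file contents below mirror the nested structure described above (the values themselves are placeholders from db.conf.sample), and jq is assumed to be available:

```bash
cat conf/storage/db.conf
# {
#     "timeseries": {
#         "url": "localhost",
#         "result_limit": 250000
#     }
# }

# Nested access finds the value; flat access finds nothing, which is the
# KeyError seen in the notebooks.
jq '.timeseries.url' conf/storage/db.conf      # -> "localhost"
jq '.url' conf/storage/db.conf                 # -> null
```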
Testing the Internal repo. Pre-run checks:
These images should have the values in the ipynb notebooks set to None, since those changes are contained in the images.
Frontend:
$ docker build -t int-public-dash-frontend ./public-dashboard/frontend/
$ docker run --name int-public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission em-pub-dash-prod/frontend
Checked to make sure frontend built properly:
Viz-scripts:
$ docker build -t int-public-dash-notebook ./public-dashboard/viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath-stage -e WEB_SERVER_HOST=0.0.0.0 -e CRON_MODE=TRUE -e STUDY_CONFIG="stage-program" -p 47962:8888 -v ./plots:/plots --network emission em-pub-dash-prod/viz-scripts
$ docker exec -it int-public-dash-notebook-1 /bin/bash
root@3d727ce04e7d:/usr/src/app# source setup/activate.sh
(emission) root@3d727ce04e7d:/usr/src/app# cd saved-notebooks/
(emission) root@3d727ce04e7d:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
(emission) root@3d727ce04e7d:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
It did not load the images, just the default “unable to generate plot” images. I’ve spent a long time trying to figure out what could have gone wrong.
The root of the issue appears to be that the ipynb can’t extract data from the db. Noted differences between the internal and external testing processes:
```
CURRENT DATABASE URL: mongodb://db/openpath-stage
CURRENT_DB: Database(MongoClient(host=['db:27017'], document_class=dict, tz_aware=False, connect=True, uuidrepresentation=3), 'openpath-stage')
```
This has not been very helpful yet. I also tried exporting the DB_HOST and STUDY_CONFIG variables differently, and have not had luck yet. I've traced how the db name is retrieved in get_database.py, and how it interacts with the scaffolding.py script, to try to understand what could be happening along the way.
I want to discuss with Mukul how his process went, and whether he has successfully gotten the images to generate. If that isn't an option, I will simply keep pushing and trying different inputs/prints/container modifications.
Potential causes of this issue: path, parsing of name, cron schedule being removed for AWS
Today I’m going to try:
Testing the Internal repo again, using the containers for frontend and notebook that I generated and modified above.
I decided to delete the db container and re-load the data:
$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
$ bash public-dashboard/viz_scripts/docker/load_mongodump.sh ../../openpath_prod_cortezebikes_feb_6.tar.gz
And double checked that the environment variables in the container were set properly:
I cleared out the plots folder, and re-ran the notebooks.
I'm elated that it worked! I’m going to try the same for the stage data and see what happens.
Testing the Internal repo one more time, the same way as here.
Again, I deleted the db container and re-loaded the data:
$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
$ bash public-dashboard/viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-dec-20.tar.gz
And reset the environment variables (but note, I am using the same container):
I cleared out the plots folder, and re-ran the notebooks.
I had to empty the Safari cache to load the right images, but it finally worked!!!!
It seems like the db container has an issue finding the correct data when two data dumps are present; re-creating the container every time fixed it.
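In other words, the reliable sequence when switching datasets looks like this (container names and paths as used in the steps above):

```bash
# Drop the old db container (and its data) before restoring a different dump ...
docker rm -f db
docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0

# ... then load exactly one snapshot into the fresh container.
bash public-dashboard/viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-dec-20.tar.gz
```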
@nataliejschultz please also resolve the merge conflicts. Note also that this is still set to draft.
> @nataliejschultz please also resolve the merge conflicts. Note also that this is still set to draft.
Resolved conflicts for merge
I removed cert.sh and modified the viz_scripts Dockerfile to copy it every time, rather than conditionally based on the environment variable. All other tags/directories/etc. are prepared for merge with main.
Squash-merging again (similar to https://github.com/e-mission/e-mission-server/pull/961#issuecomment-2284918011) since otherwise we have 73 commits for 8 files.
@MukuFlash03 This push generated a build which failed: https://github.com/e-mission/em-public-dashboard/actions/runs/10393542152 Seems to be related to the recent changes around not building the frontend for workflow_dispatch. Did you test those changes?
> This push generated a build which failed: https://github.com/e-mission/em-public-dashboard/actions/runs/10393542152
We are using the non-dev version of the docker compose file to build the image. This names the image with the prod suffix:

```yaml
dashboard:
  image: em-pub-dash-prod/frontend
```

The fix should be to simply change the name for both frontend and notebook images to add the prod suffix.
@nataliejschultz had added the commit, but it was using the incorrect file name.
So I changed the file name in this commit. But I ended up changing the image name as well in that commit, which was wrong.
@MukuFlash03 Note that, with these changes, the frontend and notebook images will have different tags. While you are changing this, you may want to change the final step to upload an artifact with two tags (or two artifacts, one for each image). @nataliejschultz will then have to change the internal script to handle both tags
> @MukuFlash03 Note that, with these changes, the frontend and notebook images will have different tags. While you are changing this, you may want to change the final step to upload an artifact with two tags (or two artifacts, one for each image). @nataliejschultz will then have to change the internal script to handle both tags

We do not need changes for this, since the tag is the generated timestamp:
```yaml
- name: push docker images
  run: |
    if [ "${{ github.event_name }}" == "push" ]; then
      docker push $DOCKER_USER/${GITHUB_REPOSITORY#*/}_frontend:${GITHUB_REF##*/}_${{ steps.date.outputs.date }}
    fi
    docker push $DOCKER_USER/${GITHUB_REPOSITORY#*/}_notebook:${GITHUB_REF##*/}_${{ steps.date.outputs.date }}

- name: Create a text file
  run: |
    echo ${{ steps.date.outputs.date }} > public_dash_tag_file.txt
    echo "Created tag text file"
```
This is the same for all images in the workflow run. So internally, the image name without the timestamp differs for the frontend and notebook images, but it is hardcoded anyway since it is static. Only the timestamp tag changes based on the workflow run.
> This is the same for all images in the workflow run. So internally, the image name without the timestamp differs for the frontend and notebook images, but it is hardcoded anyway since it is static. Only the timestamp tag changes based on the workflow run.
@MukuFlash03 For the workflow_dispatch runs, it is not. Consider the following: a push on 2024-08-14 tags both the frontend and notebook images with 2024-08-14; a later workflow_dispatch run on 2024-08-16 rebuilds only the notebook image, so viz_script is tagged with 2024-08-16 while the frontend is still at 2024-08-14.
I see. In that case, @nataliejschultz can handle this in the internal script by checking for the type of trigger event. I did inspect the response from the GitHub REST API (URL for my branch workflows); the event type is available:
"run_number": 49,
"event": "push",
"status": "completed",
"conclusion": "success",
"workflow_id": 90180351,
....
....
....
"run_number": 31,
"event": "workflow_dispatch",
"status": "completed",
"conclusion": "success",
"workflow_id": 90180351,
``
@MukuFlash03 Just because something is possible doesn't mean that it is the elegant approach.
If you look at the script that @nataliejschultz wrote, I think it would be a lot easier to just download two archive files. Note that she has a common function to download the archive file, which is invoked once for each tag; having to look at the event type for only this repo would make that script a lot more ugly.
I tried implementing the logic for handling two different tags for frontend and notebook, but it's not correct yet. I'm trying to think it through, but we don't currently maintain separate tags for frontend / notebook in the workflow.
For push events, we can have two artifacts; they'll have the same timestamps as tags. For workflow_dispatch, we'll use the latest timestamp for the 1st artifact (the notebook image).
But what should be passed as the 2nd artifact, for the frontend image? We'll need to retrieve the last stored tag for the frontend from somewhere.
I tested the end-to-end flow with workflow dispatch from the server repo in these runs:
Well, two tags are generated in the public-dash workflow run, but the tags are wrong.
The notebook tag is correct, since it is fetched from the current timestamp in either case.
But the frontend tag is storing the server image tag; it should store the last tagged frontend image.
So, as mentioned in this comment, I'm thinking about how to retrieve the last tag for the frontend image. This will need to be stored / maintained somewhere as well.
I can think of tons of options - here are two off the top of my head:
- A .env file which is overridden on push but not on workflow_dispatch
- Simply re-pushing the frontend image on every run, since it is small, ~50 MB [Source: My dockerhub account]

But I've gone ahead and implemented it using a different .env file.
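A rough sketch of the idea as a shell step (the file name .env.repoTags is from the actual change; the other variable names are hypothetical, and GITHUB_EVENT_NAME is provided by GitHub Actions):

```bash
# On push, both images were just rebuilt, so record the new frontend tag.
# On workflow_dispatch, leave the file alone so the last pushed tag is reused.
if [ "$GITHUB_EVENT_NAME" == "push" ]; then
    echo "FRONTEND_IMAGE_TAG=$NEW_TIMESTAMP_TAG" > .env.repoTags
fi

# Either way, the artifact-upload step reads the frontend tag back out.
source .env.repoTags
echo "Frontend tag to publish: $FRONTEND_IMAGE_TAG"
```

This keeps the internal download script uniform: it always receives a valid tag for each image, regardless of which event triggered the run.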
This workflow run was triggered by a PUSH event. The tags were updated correctly and two artifacts were produced (FRONTEND, NOTEBOOK).
Next, I triggered a workflow run from e-mission-server via a Workflow Dispatch event. This run was able to read the current FRONTEND image tag from the .env.repoTags file and upload the existing value as an artifact. The NOTEBOOK tag, on the other hand, is updated normally, as it should be irrespective of the event type.
This PR serves as the implementation of this issue for redesigning our build and deployment processes, primarily across e-mission-server, join, admin-dash, and public-dash.
Collaborating on this along with @nataliejschultz .
All four redesign PRs for the external repos:
- E-mission-server: https://github.com/e-mission/e-mission-server/pull/961
- Join: https://github.com/e-mission/nrel-openpath-join-page/pull/29
- Admin-dash: https://github.com/e-mission/op-admin-dashboard/pull/113
- Public-dash: https://github.com/e-mission/em-public-dashboard/pull/125
A rough plan can be:
1st phase:
2nd phase: