Closed: MukuFlash03 closed this 1 month ago
Checking in the initial set of changes for consolidating the differences between the external and internal versions. These are based on the differences identified here
Current changes involve:
These changes are still pending:
- viz_scripts/docker/start_notebook.sh: Python notebooks are run directly internally instead of via a cron tab
- viz_scripts/Dockerfile: AWS certificates
I wanted to comment on the addition of cert.sh to the repo. This file is better kept internal-only, and it will be added to the internal container build by using the prod/jenkins variable. However, we can keep it external for testing.
For example, we can add our prod/jenkins variable to our GitHub action, and see what it looks like when the different portions of code are run according to our script. This way we can perfect it before modifying the internal repos.
Docker Volume usage
When testing initially, I was unable to see the generated plots in the frontend dashboard page. This is when I learned how volumes defined in docker-compose actually work.
So, I was testing with the internal repos:
- Created folders in the internal nrelopenpath multi-tier setup.
- Added -v flags to the docker run commands for both the frontend and viz_scripts containers.

This creates a two-way connection: the host directory (before the colon in the -v flag) and the container directory (after the colon) stay in sync, and changes in either are reflected in the other.
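A minimal sketch of this bind-mount behavior (image, container, and file names here are hypothetical):

```bash
mkdir -p plots

# Mount the host's ./plots directory at /plots inside a throwaway container.
docker run -d --name voltest -v "$(pwd)/plots:/plots" alpine sleep 300

# A file created on the host is visible inside the container ...
touch plots/host_file.png
docker exec voltest ls /plots          # -> host_file.png

# ... and a file created inside the container appears on the host.
docker exec voltest touch /plots/container_file.png
ls plots                               # -> container_file.png  host_file.png

docker rm -f voltest
```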
Great! It’s working. Plots visible.
The plots do carry the label "Unable to generate plots", though, and come up as empty tables with 0 (zero) values.
Note: run the docker run command inside the nrelopenpath/public_dash/ directory; otherwise, since the host path in -v is relative, plots/ is created outside public_dash, in the root nrelopenpath folder.
@iantei's documentation was really helpful:
I have been testing the external repo with the image-push branch, which contains the redesign code changes. The Dockerfile uses the base server image from my personal Dockerhub repository.
Earlier, while I was spinning up the DB container as well, I wasn't actually loading any data into it. Without data, I was seeing empty data tables with 0 (zero) or None values. Additionally, red text was present in the plot: "Unable to generate plot ; reason:"
Steps followed to test code changes:
A. Built images and ran containers using docker-compose:
Dev
$ docker compose -f docker-compose.dev.yml build
$ docker compose -f docker-compose.dev.yml up -d
Prod
$ docker compose -f docker-compose.yml build
$ docker compose -f docker-compose.yml up -d
B. This time, once the containers were up, I loaded stage data using the available stage snapshot dataset:
$ bash viz_scripts/docker/load_mongodump.sh <tar_dataset_file>
Then, the DB_HOST environment variable in the docker-compose files also needs to be changed to the MongoDB URL (a quick check that it matches the restored database name is sketched below):
DB_HOST=mongodb://db/openpath_stage
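To confirm that this value matches what load_mongodump.sh actually restored, the databases in the db container can be listed (a quick sketch; the container name is the compose default used later in this thread, and the legacy mongo 4.4 shell is assumed):

```bash
# DB_HOST should end with one of the database names reported here.
docker exec -it em-public-dashboard-db-1 mongo --quiet --eval 'printjson(db.adminCommand("listDatabases"))'
```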
C. Executing Jupyter notebooks
Before executing the notebooks, I had to change the year and month in the notebooks to None instead of the current hardcoded values (year = 2020 and month = 11).
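For reference, this is the kind of one-off edit involved. The grep/sed lines below are illustrative and assume the hardcoded values appear verbatim in the notebook JSON; editing the cells in Jupyter works just as well:

```bash
# Find the notebooks that still carry the hardcoded values ...
grep -l 'year = 2020' viz_scripts/*.ipynb

# ... and flip them to None in place (crude, but the notebook JSON stays
# valid because only the cell source text changes).
sed -i 's/year = 2020/year = None/; s/month = 11/month = None/' viz_scripts/*.ipynb
```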
Then executed Jupyter notebooks following steps in this public-dashboard execution documentation.
I was successfully able to load data and see plots with actual data like staging public-dash URL
Screenshots showing generated plots
Public-dash External
Public-dash Staging [for reference]
Part 1: Error stack trace
However, when I tried this with the dev compose file, I got this error in edb (get_database.py), where config["url"] raises a KeyError.
This is triggered when scaffolding is imported in the Python notebooks, which in turn imports edb, which ultimately triggers the calls to get_config() and get_config_data() in emission.core.config.py.
Part 2: Tried to debug
However, when I connect to the notebook-server docker container, run python, and then manually import edb and run any MongoDB commands, it works.
I added print statements to log the URL and result_limit in edb->get_database.py
```
(emission) root@056e2b1de88:/usr/src/app# cat emission/core/get_database.py | head -20
import emission.core.config as ecc

url = ecc.get_config()["url"]
result_limit = ecc.get_config()["result_limit"]
print("URL: %s" % url)
print("Result limit: %s" % result_limit)
```
I ran these python commands inside the docker container:

```
$ python
>>> import emission.core.get_database as edb
>>> edb.get_timeseries_db().count_documents({})
```

Output:

```
# Printed due to module import
URL: mongodb://db/openpath_stage
Result limit: 250000
Connecting to database URL mongodb://db/openpath_stage

# Output result of count_documents()
12355103
```
So I am unable to understand why running the notebooks manually leads to this error, and why it happens specifically with the docker-compose.dev file; the production docker-compose file works correctly.
I have spent many hours trying to test em-public-dashboard (from the current main branch, as a positive control) with openpath_prod_uue_snapshot_oct3.tar.gz.
I followed the procedure outlined in the documentation, as well as by @MukuFlash03 above, and still only got the empty result:
Mukul tested the procedure using this same dataset, and got the exact same result. His successes (above) were from using openpath-stage-snapshot-dec-20.tar.gz. I am going to seek out another dataset to test with.
Testing of the public dash has been really difficult. I have tried so, so many times, and have finally gotten the current published version of public dash to generate images.
On my local copy of the public dash published code, I did the following:
- Set all of the year and month values in each relevant ipynb to None (despite the documentation saying not to do this?)
- Changed DB_HOST=mongodb://db/openpath_stage in the docker compose + dev files
- Used STUDY_CONFIG=stage-program (I am using data that corresponds to this)
Running:
$ docker compose -f docker-compose.dev.yml build
$ docker compose -f docker-compose.dev.yml up -d
$ bash viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-***-**.tar.gz
$ docker exec -it em-public-dashboard-notebook-server-1 /bin/bash
Inside viz-scripts container:
source setup/activate.sh
cd saved-notebooks/
PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
Opening the localhost at port 3274:
On my local copy of Mukul’s image-push branch, I wanted to now try some program-specific data. Here is what I did:
Set all of the year and month values in each relevant ipynb to None
Create db container: $ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
Frontend:
$ docker build -t int-public-dash-frontend ./frontend/
$ docker run --name int-public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission int-public-dash-frontend
Viz-scripts: Changed to build from my local image of our changed server code:
Ran:
$ docker build -t viz-scripts ./viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath_stage -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=cortezebikes -p 47962:8888 -v ./plots:/plots --network emission viz-scripts
$ docker exec -it int-public-dash-notebook-1 /bin/bash
Inside viz-scripts container:
source setup/activate.sh
cd saved-notebooks
PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
This loaded some data!
Of course, I also wanted to try to recreate the stage data result from before.
I followed the exact same steps as above, using a different STUDY_CONFIG and DB_HOST:
$ docker build -t viz-scripts ./viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath_stage -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=stage-program -p 47962:8888 -v ./plots:/plots --network emission viz-scripts
It did not work. I am unsure as to why, and am at a bit of a loss for what to do from here.
> On my local copy of Mukul’s image-push branch, I wanted to now try some program-specific data. Here is what I did: ...
> $ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath_stage -e WEB_SERVER_HOST=0.0.0.0 -e STUDY_CONFIG=cortezebikes -p 47962:8888 -v ./plots:/plots --network emission viz-scripts
Why are you not using docker-compose as before? If you change both the image and the process to run, any error that you encounter might be the result of either change.
> This loaded some data!

Or it may just be showing results from the previous successful run. Did you clear out the plots directory between runs?
> It did not work. I am unsure as to why, and am at a bit of a loss for what to do from here.

What did you try to debug? Did you try to look at logs? Did you try to add print statements? Did you try to look at the documentation for the public dashboard?
> However, when I tried this with the dev compose file, I got this error in edb (get_database.py), where config["url"] raises a KeyError.

This is almost certainly because the directory that you are running from does not have a conf/storage/db.conf file. But I thought we were not going to use conf/storage/db.conf any more and were going to use environment variables instead. Has that change been implemented?
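A quick way to check this hypothesis (a sketch; the container name is the one used earlier in this thread, and the working directory is assumed to be /usr/src/app, as the shell prompts above suggest):

```bash
# If conf/storage/db.conf exists inside the notebook container, it can shadow
# the DB_HOST environment variable; check for both.
docker exec em-public-dashboard-notebook-server-1 ls -l conf/storage/
docker exec em-public-dashboard-notebook-server-1 printenv DB_HOST
```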
> Why are you not using docker-compose as before? If you change both the image and the process to run, any error that you encounter might be the result of either change.
In reference to this comment, I wanted to make sure that I was building everything off of locally generated images. I will try using this locally built image with the docker compose instead of docker run. However, in the internal repos, we have to test with docker run since the Jenkins pipeline cannot use compose.
> This loaded some data!
> Or it may just be showing results from the previous successful run. Did you clear out the plots directory between runs?
No, I thought these plots would be overwritten. Comparing the "Number of trips All data Default" graph for the stage data (left), the cortezebikes snapshot that I loaded into my locally built container (middle), and the current public cortezebikes public dashboard (right), you can see that my locally generated image in the middle is consistent with the cortezebikes trip data up to the recent snapshot date.
> What did you try to debug? Did you try to look at logs? Did you try to add print statements? Did you try to look at the documentation for the public dashboard?
In debugging, I have tried:
- Using docker exec -it to change various variables so the data is read in properly, i.e. export STUDY_CONFIG=stage-program and export DB_HOST=mongodb://db/openpath_stage, and using echo $STUDY_CONFIG to make sure exporting the variables took before re-running the notebooks
- Installing vim (apt-get install vim) and directly modifying the ipynb files via vi + insert + wq, using the cat command to make sure the changes took, and re-running the notebook run commands listed above
- Using docker exec -it inside the mongo container and running mongo, then show dbs, to make sure the data has loaded and to check that the db name matches my DB_HOST environment variable

> However, when I tried this with the dev compose file, I got this error in edb (get_database.py), where config["url"] raises a KeyError.
> This is almost certainly because the directory that you are running from does not have a conf/storage/db.conf file. But I thought we were not going to use conf/storage/db.conf any more and were going to use environment variables instead. Has that change been implemented?
Yes, the changes have been implemented in e-mission-server so that environment variables are used instead of the config files. This issue is not observed in admin-dash, or in public-dash when loading using the production docker compose file.
The issue was seen only with the docker-compose.dev file, and only when running the notebooks manually to generate the plots.
When I connected directly to the container via the terminal and executed commands that invoked edb, the "url" value was printed correctly.
I've mentioned my debugging process in this comment above.
@MukuFlash03 I did see your debugging process. Other than investigating the code myself, my only hint is that this is almost certainly because the file is missing. Your question should then be: why is the dev version using the file instead of the environment variable?
Again, the only way for me to provide an answer to that would be to run it myself. There is not enough information in your debugging steps to provide any other expert insights.
Re-testing my local version of the public dashboard, using docker compose this time. Pre-run checks:
- Dockerfile is set to build my local server image with changes: consolidate-diff-server-localbuild
- docker-compose.yml has DB_HOST=mongodb://db/openpath_stage and STUDY_CONFIG=stage-program
- All ipynb notebooks have year and month set to None
- Plots folder emptied
- Other containers utilizing the same ports removed
Commands run:
$ docker compose -f docker-compose.yml build
$ docker compose -f docker-compose.yml up -d
$ docker ps
$ docker rename mukul-forked-em-public-dashboard-db-1 em-public-dashboard-db-1
$ bash viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-dec-20.tar.gz
$ docker exec -it mukul-forked-em-public-dashboard-notebook-server-1 /bin/bash
root@6a1431d35053:/usr/src/app# source setup/activate.sh
(emission) root@6a1431d35053:/usr/src/app# cd saved-notebooks/
(emission) root@6a1431d35053:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
(emission) root@6a1431d35053:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
Docker compose with the locally built server image from the consolidate-differences branch worked!
Reflecting on my previous attempts, this is what's different between this run and my previous test using docker build:
Since I will have to use docker build to test the internal build, I will likely find out if (1) was the issue all along.
> Your question should then be: why is the dev version using the file instead of the environment variable?
Found the reason for the bug and fixed it in this commit in the server repo. https://github.com/MukuFlash03/e-mission-server/commit/e778b3f3cf050eec33ef439a1b6763b6aaf533e0
The bug was triggered by a .gitignore-d conf/ directory in public-dash which had a db.conf file. This was being loaded when docker-compose.dev.yml was used to test the dev version of public-dash.
This was occurring because there is a nested dictionary in the db.conf.sample and db.conf files, while I had initially stored only the nested keys (url, result_limit). Since the code was still reading from the file, it got the nested dictionary format, with timeseries as the parent key and (url, result_limit) as its children.
Hence, the correct way to access the db.conf values is ["timeseries"]["url"] instead of just ["url"]. Fixed it by accessing these values correctly.
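A sketch of why the flat lookup fails. The file contents below mirror the nested structure described above (the values themselves are placeholders from db.conf.sample), and jq is assumed to be available:

```bash
cat conf/storage/db.conf
# {
#     "timeseries": {
#         "url": "localhost",
#         "result_limit": 250000
#     }
# }

# Nested access finds the value; flat access finds nothing, which is the
# KeyError seen in the notebooks.
jq '.timeseries.url' conf/storage/db.conf      # -> "localhost"
jq '.url' conf/storage/db.conf                 # -> null
```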
Testing the Internal repo. Pre-run checks:
These images should have the values in the ipynb notebooks set to None, since those changes are contained in the images.
Frontend:
$ docker build -t int-public-dash-frontend ./public-dashboard/frontend/
$ docker run --name int-public-dash-frontend-1 -d -p 3274:6060 -v ./plots:/public/plots --network emission em-pub-dash-prod/frontend
Checked to make sure frontend built properly:
Viz-scripts:
$ docker build -t int-public-dash-notebook ./public-dashboard/viz_scripts/
$ docker run --name int-public-dash-notebook-1 -d -e DB_HOST=mongodb://db/openpath-stage -e WEB_SERVER_HOST=0.0.0.0 -e CRON_MODE=TRUE -e STUDY_CONFIG="stage-program" -p 47962:8888 -v ./plots:/plots --network emission em-pub-dash-prod/viz-scripts
$ docker exec -it int-public-dash-notebook-1 /bin/bash
root@3d727ce04e7d:/usr/src/app# source setup/activate.sh
(emission) root@3d727ce04e7d:/usr/src/app# cd saved-notebooks/
(emission) root@3d727ce04e7d:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
(emission) root@3d727ce04e7d:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
It did not load the images, just the default “unable to generate plot” images. I’ve spent a long time trying to figure out what could have gone wrong.
The root of the issue appears to be that the ipynb can’t extract data from the db. Noted differences between the internal and external testing processes:
```
CURRENT DATABASE URL: mongodb://db/openpath-stage
CURRENT_DB: Database(MongoClient(host=['db:27017'], document_class=dict, tz_aware=False, connect=True, uuidrepresentation=3), 'openpath-stage')
```
This has not been very helpful yet. I also tried exporting the DB_HOST and STUDY_CONFIG variables differently, and have not had luck yet. I've traced how the db name is retrieved in get_database.py, and how it interacts with the scaffolding.py script, to try to understand what could be happening along the way.
I want to discuss with Mukul how his process went, and whether he has successfully gotten the images to generate. If that isn't an option, I will simply keep pushing and trying different inputs/prints/container modifications.
Potential causes of this issue: path, parsing of name, cron schedule being removed for AWS
Today I’m going to try:
Testing the Internal repo again, using the containers for frontend and notebook that I generated and modified above.
I decided to delete the db container and re-load the data:
$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
$ bash public-dashboard/viz_scripts/docker/load_mongodump.sh ../../openpath_prod_cortezebikes_feb_6.tar.gz
And double checked that the environment variables in the container were set properly:
I cleared out the plots folder, and re-ran the notebooks.
I'm elated that it worked! I’m going to try the same for the stage data and see what happens.
Testing the Internal repo one more time, the same way as here.
Again, I deleted the db container and re-loaded the data:
$ docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0
$ bash public-dashboard/viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-dec-20.tar.gz
And reset the environment variables (but note, I am using the same container):
I cleared out the plots folder, and re-ran the notebooks.
I had to empty the Safari cache to load the right images, but it finally worked!!!!
It seems like the db container has an issue finding the correct data when two data dumps are present; re-creating the container every time fixed it.
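In other words, the reliable sequence when switching datasets looks like this (container names and paths as used in the steps above):

```bash
# Drop the old db container (and its data) before restoring a different dump ...
docker rm -f db
docker run --name db -d -p 27017:27017 --network emission mongo:4.4.0

# ... then load exactly one snapshot into the fresh container.
bash public-dashboard/viz_scripts/docker/load_mongodump.sh ../../openpath-stage-snapshot-dec-20.tar.gz
```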
@nataliejschultz please also resolve the merge conflicts. Note also that this is still set to draft.
> @nataliejschultz please also resolve the merge conflicts. Note also that this is still set to draft.
Resolved conflicts for merge
I removed cert.sh and modified the viz_scripts Dockerfile to copy it every time, rather than conditionally based on the environment variable. All other tags/directories/etc. are prepared for merge with main.
Squash-merging again (similar to https://github.com/e-mission/e-mission-server/pull/961#issuecomment-2284918011) since otherwise we have 73 commits for 8 files.
@MukuFlash03 This push generated a build which failed: https://github.com/e-mission/em-public-dashboard/actions/runs/10393542152 Seems to be related to the recent changes around not building the frontend for workflow_dispatch. Did you test those changes?
> This push generated a build which failed: https://github.com/e-mission/em-public-dashboard/actions/runs/10393542152
We are using the non-dev version of the docker compose file to build the image. This names the image with the prod suffix:

```yaml
dashboard:
  image: em-pub-dash-prod/frontend
```

The fix should be to simply change the name for both frontend and notebook images to add the prod suffix.
@nataliejschultz had added the commit, but it was using the incorrect file name.
So I changed the file name in this commit. But I ended up changing the image name as well in that commit, which was wrong.
@MukuFlash03 Note that, with these changes, the frontend and notebook images will have different tags. While you are changing this, you may want to change the final step to upload an artifact with two tags (or two artifacts, one for each image). @nataliejschultz will then have to change the internal script to handle both tags
> @MukuFlash03 Note that, with these changes, the frontend and notebook images will have different tags. While you are changing this, you may want to change the final step to upload an artifact with two tags (or two artifacts, one for each image). @nataliejschultz will then have to change the internal script to handle both tags

We do not need changes for this, since the tag is the generated timestamp:
```yaml
- name: push docker images
  run: |
    if [ "${{ github.event_name }}" == "push" ]; then
      docker push $DOCKER_USER/${GITHUB_REPOSITORY#*/}_frontend:${GITHUB_REF##*/}_${{ steps.date.outputs.date }}
    fi
    docker push $DOCKER_USER/${GITHUB_REPOSITORY#*/}_notebook:${GITHUB_REF##*/}_${{ steps.date.outputs.date }}

- name: Create a text file
  run: |
    echo ${{ steps.date.outputs.date }} > public_dash_tag_file.txt
    echo "Created tag text file"
```
This is the same for all images in the workflow run. So internally, the image name without the timestamp differs for the frontend and notebook images, but it is hardcoded anyway since it is static. Only the timestamp tag changes based on the workflow run.
> This is the same for all images in the workflow run. So internally, the image name without the timestamp differs for the frontend and notebook images, but it is hardcoded anyway since it is static. Only the timestamp tag changes based on the workflow run.
@MukuFlash03 For the workflow_dispatch runs, it is not. Consider the following: a push on 2024-08-14 tags both the frontend and notebook images with 2024-08-14; a later workflow_dispatch run on 2024-08-16 rebuilds only the notebook image, so viz_script is tagged with 2024-08-16 while the frontend is still at 2024-08-14.
I see. In that case, @nataliejschultz can handle this in the internal script by checking for the type of trigger event. I did inspect the response from the GitHub REST API (URL for my branch workflows); the event type is available:
"run_number": 49,
"event": "push",
"status": "completed",
"conclusion": "success",
"workflow_id": 90180351,
....
....
....
"run_number": 31,
"event": "workflow_dispatch",
"status": "completed",
"conclusion": "success",
"workflow_id": 90180351,
``
@MukuFlash03 Just because something is possible doesn't mean that it is the elegant approach.
If you look at the script that @nataliejschultz wrote, I think it would be a lot easier to just download two archive files. Note that she has a common function to download the archive file, which is invoked once for each tag; having to look at the event type for only this repo would make that script a lot more ugly.
I tried implementing the logic for handling two different tags for frontend and notebook, but it's not correct yet. I'm trying to think it through, but we don't currently maintain separate tags for frontend / notebook in the workflow.
For push events, we can have two artifacts; they'll have the same timestamps as tags. For workflow_dispatch, we'll use the latest timestamp for the 1st artifact (the notebook image).
But what should be passed as the 2nd artifact, for the frontend image? We'll need to retrieve the last stored tag for the frontend from somewhere.
I tested the end-to-end flow with workflow dispatch from the server repo in these runs:
Well, two tags are generated in the public-dash workflow run, but the tags are wrong.
The notebook tag is correct, since it is fetched from the current timestamp in either case.
But the frontend tag is storing the server image tag; it should store the last tagged frontend image.
So, as mentioned in this comment, I'm thinking about how to retrieve the last tag for the frontend image. This will need to be stored / maintained somewhere as well.
I can think of tons of options - here are two off the top of my head:
- A .env file which is overridden on push but not on workflow_dispatch
- Simply re-pushing the frontend image on every run, since it is small, ~50 MB [Source: My dockerhub account]

But I've gone ahead and implemented it using a different .env file.
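A rough sketch of the idea as a shell step (the file name .env.repoTags is from the actual change; the other variable names are hypothetical, and GITHUB_EVENT_NAME is provided by GitHub Actions):

```bash
# On push, both images were just rebuilt, so record the new frontend tag.
# On workflow_dispatch, leave the file alone so the last pushed tag is reused.
if [ "$GITHUB_EVENT_NAME" == "push" ]; then
    echo "FRONTEND_IMAGE_TAG=$NEW_TIMESTAMP_TAG" > .env.repoTags
fi

# Either way, the artifact-upload step reads the frontend tag back out.
source .env.repoTags
echo "Frontend tag to publish: $FRONTEND_IMAGE_TAG"
```

This keeps the internal download script uniform: it always receives a valid tag for each image, regardless of which event triggered the run.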
This workflow run was triggered by a PUSH event. The tags were updated correctly and two artifacts were produced (FRONTEND, NOTEBOOK).
Next, I triggered a workflow run from e-mission-server via a Workflow Dispatch event. This run was able to read the current FRONTEND image tag from the .env.repoTags file and upload the existing value as an artifact. The NOTEBOOK tag, on the other hand, is updated normally, as it should be irrespective of the event type.
This PR serves as the implementation of this issue for redesigning our build and deployment processes, primarily across e-mission-server, join, admin-dash, and public-dash.
Collaborating on this along with @nataliejschultz .
All four redesign PRs for the external repos:
- E-mission-server: https://github.com/e-mission/e-mission-server/pull/961
- Join: https://github.com/e-mission/nrel-openpath-join-page/pull/29
- Admin-dash: https://github.com/e-mission/op-admin-dashboard/pull/113
- Public-dash: https://github.com/e-mission/em-public-dashboard/pull/125
A rough plan can be:
1st phase:
2nd phase: