reports.calitp.org exists to provide a snapshot of transit service information and transit data quality information for agencies across California. It also serves to connect agencies and other stakeholders with technical assistance from Cal-ITP and Caltrans.
We manually update the reports site on a monthly cycle, usually in the first or second week of a calendar month. This generates content for the previous month; for example, updating the site in the first week of December adds content for November.
After the site is updated, we manually send emails using a list provided by the Cal-ITP Customer Success team.
This is the easiest option for generating the reports site. Simply open the Build and deploy to Netlify Action and use the "Run workflow" drop-down to generate the development or production Reports site.

You must generate the development site before the production site. The development site can be reviewed at https://test--cal-itp-reports.netlify.app (the URL may change; you can confirm it by viewing the Deploy development to Netlify section in an Action run). Check that the development site looks good before running a production build. The development Action also stages the data necessary to build the production site with the latest information.
This step is not currently included in the GitHub Action. After obtaining an updated email list in the form of a csv file, the script can be run via JupyterHub or another platform.

Navigate to the subdirectory (i.e. `cd reports`), and:
1. Set the `postmark_server_token` environment variable.
2. Update `year` (if needed), `month` and `month_name`.
3. `test_emails.csv` is a test csv that mimics the production data without any actual client emails. With `test_emails.csv` specified in both development and production, run `poetry run python generate_report_emails.py development` (you may need to run `poetry install` first), then run `npx mjml`.
4. Set `email_csv_path` to your updated email list. Rerun the production script. It will prompt you to confirm the list of recipients. Once those emails are sent, the process is complete!

It is possible to generate the reports site outside the GitHub Action. Follow the instructions below, but note that it's not currently possible to generate the site locally on a Caltrans computer. It is possible to generate the reports and, importantly, send the reports emails on JupyterHub.
If running on JupyterHub and you've completed the usual analyst onboarding, there is no need to repeat this step. Otherwise, set up Google Cloud authentication credentials.
Specifically, download the SDK/CLI at the above link, install it, open a new terminal (or source your `.zshrc`), and be sure to run both:

```
gcloud init
gcloud auth application-default login
```

Note that with user account authentication, the environment variable `CALITP_SERVICE_KEY_PATH` should be unset.
The Makefile located in the `reports/` subdirectory includes the necessary commands to generate the reports; poetry handles the required Python dependencies and environment.

Navigate to the subdirectory (i.e. `cd reports`) and run:

```
poetry install
poetry run make parameters
poetry run make data
```
If a clean start is necessary, first run:

```
poetry run make clean
```
Once the report data has been generated, navigate to the website subfolder (i.e. `cd ../website`), install the npm dependencies if you haven't done so already, and build the website:

```
poetry run npm install
poetry run npm run build
```
These commands perform the following: `website/generate.py` loads JSON from the `reports/outputs/YYYY/MM/ITPID/data` directory and applies it to the template files in `/templates`.
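As a rough illustration of that flow (a simplified sketch using the stdlib's `string.Template` in place of the real Jinja2 templates; the data, template, and `render_report` helper here are made up, not part of `generate.py`):

```python
import json
from string import Template

# Hypothetical, simplified stand-in for website/generate.py:
# the real script walks reports/outputs/YYYY/MM/ITPID/data and
# renders Jinja2 templates from /templates.
def render_report(data_json: str, template_text: str) -> str:
    """Apply report data (parsed from JSON) to a page template."""
    data = json.loads(data_json)
    return Template(template_text).substitute(data)

page = render_report(
    '{"agency_name": "Example Transit", "month_name": "November"}',
    "Report for $agency_name ($month_name)",
)
print(page)  # Report for Example Transit (November)
```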
It is worth mentioning that `npm run build` will currently only execute if you have data from previous months. Run `npm run dev` for verbose output and to see which month is failing, which can help with troubleshooting.
Note that the error:

```
jinja2.exceptions.UndefinedError: 'feed_info' is undefined
```

is often due to a lack of generated reports. This can be remedied for prior months by rsyncing the reports from the upstream source (see Fetching report data), and by ensuring every single ITPID has a corresponding generated report for the current month (see Generating reports).
To check that everything is rendered appropriately, go into the `website/build` directory (i.e. `cd build`) and run:

```
python -m http.server
```

then open a web browser and navigate to localhost:8000.
Unfortunately, it's not possible to do this if running on JupyterHub. You can view the finished site changes by including them on the development site, for example by first merging a PR and viewing the development site generated by the GitHub Action.
This repository is set up in two pieces:

- `reports/` subfolder - generates the underlying GTFS data for each report.
- `website/` subfolder - uses `generate.py` and `../templates/` to create the static reports website.

Execute `make parameters` to generate the following artifacts:

- `outputs/index_report.json` - a file that lists every agency name and `outputs/YYYY/MM` folder
- `outputs/rt_feed_ids.json` - labels agencies by whether they have an RT feed
- `outputs/speedmap_urls.json` - labels agencies with their RT speedmap URL, if one exists
- an `outputs/YYYY/MM/AGENCY_ITP_ID/` folder for each agency

This is rarely required since, by default, the commands above will quickly generate reports data for all months. However, it remains possible to run gsutil rsync to update all the locally stored reports.
Note that `test-calitp-reports-data` can be replaced with `calitp-reports-data` for testing on production data:

```
gsutil -m rsync -r gs://test-calitp-reports-data/report_gtfs_schedule outputs
```
Also unnecessary if using the default commands, it is possible to selectively run a single month's reports. Execute `poetry run python generate_reports_data.py --year=2023 --month=02` to populate the output folders for a given month with the following files (per folder). These files are used to generate the static HTML (see below).

NOTE that `--month` refers to the month of the folders that will be generated, NOT the month in which the reports are published (typically on the first day of the following month). For example, `poetry run python generate_reports_data.py --year=2023 --month=02` will populate the `outputs/2023/02/*` folders, whereas the `publish_date` for the data in those folders is `2023-03-01`.
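That month-to-publish-date relationship can be expressed as a small helper (illustrative only; `publish_date_for` is not part of the actual scripts):

```python
from datetime import date

def publish_date_for(year: int, month: int) -> date:
    """Reports generated with --year/--month are published on the
    first day of the *following* month."""
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)

print(publish_date_for(2023, 2))  # 2023-03-01
```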
After generation, the reports can be validated by running `poetry run python validate_reports.py`. This examines all of the output folders and ensures that the generated files are present and that they follow the same schema.
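The kind of check it performs can be sketched as follows (a hypothetical simplification, assuming every `outputs/YYYY/MM/AGENCY_ITP_ID/` folder should contain the same set of file names; `find_incomplete_folders` is not the actual implementation):

```python
from pathlib import Path

def find_incomplete_folders(outputs_dir: str) -> list[Path]:
    """Flag agency folders whose file names differ from the fullest set seen."""
    # outputs/YYYY/MM/AGENCY_ITP_ID -> three levels below the outputs dir
    folders = [p for p in Path(outputs_dir).glob("*/*/*") if p.is_dir()]
    file_sets = {f: {c.name for c in f.iterdir() if c.is_file()} for f in folders}
    if not file_sets:
        return []
    # Treat the largest observed file set as the expected "schema".
    expected = max(file_sets.values(), key=len)
    return sorted(f for f, names in file_sets.items() if names != expected)
```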
Additionally, you can list any agency folders that are missing a file by running, for example:

```
find ./outputs/2023 -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/1_feed_info.json" ';' -print
```

This prints each `outputs/2023/MM/AGENCY_ITP_ID` folder that lacks a `1_feed_info.json`.
If there is a missing month, an individual month can be run with the following command:

```
python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_feed_info.json
```
Tests can be run locally from the `tests` directory by running `python test_report_data.py`. These tests are run on commits through a GitHub Action.
Since this is part of the development GitHub Action, it's not necessary to run this manually. The info below is for reference.
The next step is to update the development bucket in Google Cloud with the new data.
In the case where data must be overwritten (please use caution!), a `-d` flag can be added to the command to "mirror" the buckets, i.e. delete destination data that isn't being copied from the source:

```
gsutil -m rsync -r [-d] outputs/ gs://test-calitp-reports-data/report_gtfs_schedule/
```
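The effect of `-d` can be illustrated with a toy sketch of rsync's mirror semantics over file-name sets (illustrative only; real gsutil also compares object content and metadata, and `mirror_plan` is a made-up helper):

```python
def mirror_plan(source: set[str], dest: set[str]) -> tuple[set[str], set[str]]:
    """Return (names to copy, names to delete from dest) under rsync -d."""
    to_copy = source - dest    # names present in source but not yet in dest
    to_delete = dest - source  # -d additionally deletes these from dest
    return to_copy, to_delete

copy, delete = mirror_plan({"a.json", "b.json"}, {"b.json", "stale.json"})
print(sorted(copy), sorted(delete))  # ['a.json'] ['stale.json']
```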
Assuming that all the data is correct in development, you can sync the test data to production:

```
gsutil -m rsync -r gs://test-calitp-reports-data/report_gtfs_schedule/ gs://calitp-reports-data/report_gtfs_schedule/
```
Note that the folder also contains a `docker-compose.yml`, so it is possible to run the build inside docker by running these commands first. In this case, docker first needs to be installed locally, with resources set as desired (e.g. enable 6 cores if you have an 8-core machine).

Open a terminal, navigate to the root folder of a locally cloned repo, and enter:

```
docker-compose run --rm --service-ports calitp_reports /bin/bash
```

If Google credentials are already configured on the host, the local credential files should already be mounted in the container, and it may only be necessary to run `gcloud auth application-default login` from within the container.