Analytics API

The Analytics API publishes data that the Analytics Reporter retrieves from the Google Analytics API. In response to HTTP requests, it serves the data that the Analytics Reporter has written to a PostgreSQL database.

This project's data is provided by the Analytics Reporter using the Google Analytics Data API v1. The analytics data is processed into a flat data structure by the reporter and stored in the database which is then served by this API.

The project previously used the Google Analytics Core Reporting API v3 and the Google Analytics Real Time API v3, also known as Universal Analytics, which has slightly different data points.

Analytics API v1 serves the Universal Analytics data and Analytics API v2 serves the new GA4 data. See Migrating from API V1 to API V2 for more details. The Universal Analytics API will be deprecated on July 1, 2024 and the Analytics API v1 will no longer receive new data after that date.

The process for adding features to this project is described in Development and deployment process.

Setup

This Analytics API maintains the schema for the database that the Analytics Reporter writes to. Thus, the Analytics API must be set up and configured before the Analytics Reporter starts writing data.

Prerequisites

Clone the code and install dependencies

git clone git@github.com:18F/analytics-reporter-api.git
cd analytics-reporter-api
npm install

Linting

This repo uses ESLint and Prettier for static code analysis and formatting. Run the linter with:

npm run lint

Automatically fix lint issues with:

npm run lint:fix

Install git hooks

Some git hooks are provided in the ./hooks directory to help with common development tasks. They check out the current npm packages on branch change events and run the linter on pre-commit.

Install the provided hooks with the following command:

npm run install-git-hooks

Running the unit tests

The unit tests for this repo require a local PostgreSQL database.

Start the test DB in Docker:

docker compose -f docker-compose.test.yml up

The test DB connection in knexfile.js has a default configuration that works out of the box with the docker-compose test DB.

Run the tests (pre-test hook runs DB migrations):

npm test

Running the unit tests with code coverage reporting

If you wish to see a code coverage report after running the tests, use the following command. This runs the DB migrations, tests, and the NYC code coverage tool:

npm run coverage

Run the application

Start up the dev database and run the migrations.

# Start database in Docker
docker compose up -d

# Run migrations
npm run migrate

Now the app can be started.

npm start

The API should now be available at http://localhost:4444/. Note that the API will not render any data because the database is empty.

Load data with analytics-reporter

Data for the API is loaded into the database by the Analytics Reporter. For dev environments, the default database configurations for the analytics-reporter repo and the analytics-reporter-api repo both point to the same database.

Follow the instructions in the analytics-reporter README to set up the reporter and configure an agency to collect data for. Ignore any instructions about starting up a database - you'll use the database you already have running. Once setup is done, run the reporter with the --write-to-database option.

Now you should be able to retrieve data for the agency you selected. For instance, if you configured the reporter to load data for general-services-administration, then JSON data about browser demographics should be available at http://localhost:4444/v2.0.0/agencies/general-services-administration/reports/browsers/data.
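As a small illustration, the example URL above can be assembled from its parts. This is only a sketch: `agencyReportUrl` is a hypothetical helper that is not part of this repo, and the path shape is taken from the example URL in this section.

```javascript
// Hypothetical helper (not part of this repo): builds the URL for an agency
// report, following the path shape of the example URL in this section.
// baseUrl should end with "/" so the relative path resolves beneath it.
function agencyReportUrl(baseUrl, version, agency, report) {
  const path = [version, "agencies", agency, "reports", report, "data"]
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return new URL(path, baseUrl).toString();
}

console.log(
  agencyReportUrl(
    "http://localhost:4444/",
    "v2.0.0",
    "general-services-administration",
    "browsers",
  ),
);
```

In Node 18 or later, the resulting URL can be passed to the global `fetch` to retrieve the JSON array.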

Using the API

Full API docs can be found here: https://open.gsa.gov/api/dap/

Environments

The base URLs for the 3 API environments:

Overview

The Analytics API exposes 3 API endpoints:

Each endpoint renders a JSON array with the most recent 1000 records that the Analytics Reporter has generated for the given agency and report. If no records are found, an empty array is returned.

Records are sorted by their associated date, most recent first.

Limit query parameter

To request a different number of records, set the limit query parameter to the desired number.

/v2/reports/realtime/data?limit=500

The maximum number of records that can be rendered for any given request is 10,000.
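A client can guard against asking for more than the maximum before sending a request. The sketch below is a hypothetical client-side helper (`withLimit` is not part of this repo). It clamps the value to the 10,000 maximum stated above; how the server itself treats larger values is not described here, so the clamp is an assumption about what a polite client might do.

```javascript
// Maximum number of records per request, as stated in this README.
const MAX_LIMIT = 10000;

// Hypothetical helper: returns the given absolute URL with a `limit` query
// parameter, clamped on the client side to MAX_LIMIT.
function withLimit(url, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const result = new URL(url);
  result.searchParams.set("limit", String(Math.min(limit, MAX_LIMIT)));
  return result.toString();
}

console.log(withLimit("http://localhost:4444/v2/reports/realtime/data", 500));
```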

Page query parameter

If the desired record does not appear for the current request, the page query parameter can be used to get the next series of data points. Since the data is ordered by date, this parameter effectively allows older data to be queried.

/v2/reports/realtime/data?page=2
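Walking through pages until the data runs out might look like the sketch below. It assumes that pages are numbered from 1 and that a page past the end of the data returns an empty array, as described for requests with no records. `collectPages` and `fetchRealtimePage` are hypothetical names, and the URL is the local dev address used earlier in this README.

```javascript
// Hypothetical helper: collects records page by page. `fetchPage` is an
// assumed caller-supplied async function that takes a 1-based page number
// and resolves to the parsed JSON array for that page.
async function collectPages(fetchPage, maxPages) {
  const records = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const batch = await fetchPage(page);
    // An empty array is treated as "no more records".
    if (batch.length === 0) break;
    records.push(...batch);
  }
  return records;
}

// Example fetchPage built on the global fetch available in Node 18+.
function fetchRealtimePage(page) {
  const url = new URL("http://localhost:4444/v2/reports/realtime/data");
  url.searchParams.set("page", String(page));
  return fetch(url).then((response) => response.json());
}
```

Calling `collectPages(fetchRealtimePage, 5)` would then read at most five pages. The `maxPages` cap keeps a client from looping indefinitely if the stopping assumption does not hold.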

Migrating from API V1 to API V2

Background

Analytics API V1 returns data from Google Analytics V3, also known as Universal Analytics (UA).

Google is retiring UA and is encouraging users to move to its new version, Google Analytics V4 (GA4), in 2024.

Analytics API V2 returns data from GA4.

Migration details

Requests

The Analytics API endpoints are the same in V1 and V2; the only difference for API requests is the API version string.
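Because only the version string changes, an existing request URL can be migrated mechanically. The following is a sketch of one way to do that, not an official migration tool: `withApiVersion` is a hypothetical helper, and the `v1.1` string in the example is an assumption, since this README does not spell out the V1 version string.

```javascript
// Hypothetical helper: replaces the first path segment that looks like an
// API version (for example "v1.1" or "v2.0.0") with the given version.
function withApiVersion(url, version) {
  const versionSegment = /\/v\d+(?:\.\d+)*(?=\/)/;
  const result = new URL(url);
  if (!versionSegment.test(result.pathname)) {
    throw new Error("no version segment found in " + result.pathname);
  }
  result.pathname = result.pathname.replace(versionSegment, () => "/" + version);
  return result.toString();
}

// "v1.1" is an assumed V1 version string, used here for illustration only.
console.log(
  withApiVersion("http://localhost:4444/v1.1/reports/realtime/data?limit=500", "v2.0.0"),
);
```

The query string is left untouched, so parameters such as limit and page carry over unchanged.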

Responses

Response data is slightly different in Analytics API V2. This change is due to the data provided by Google Analytics. Some data fields were retired in GA4, and some other useful data fields were added. The changes follow:

Deprecated fields

New fields

bounce_rate

The percentage of sessions that were not engaged. GA4 defines an engaged session as one that lasts longer than 10 seconds, has a key event (conversion), or has at least two pageviews or screenviews.

file_name

The page path of a downloaded file.

language_code

The ISO 639 language setting of the user's device, e.g. 'en-us'.

session_default_channel_group

An enum which describes the session. Possible values:

'Direct', 'Organic Search', 'Paid Social', 'Organic Social', 'Email', 'Affiliates', 'Referral', 'Paid Search', 'Video', and 'Display'
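A consumer of the API might tally records by this field. The sketch below assumes that each record in the response array is an object with a `session_default_channel_group` property; the record shape is not shown in this README, so treat that as an assumption. `countByChannelGroup` is a hypothetical helper. Values outside the list above are counted under a separate key rather than rejected.

```javascript
// Values listed in this README for session_default_channel_group.
const CHANNEL_GROUPS = new Set([
  "Direct",
  "Organic Search",
  "Paid Social",
  "Organic Social",
  "Email",
  "Affiliates",
  "Referral",
  "Paid Search",
  "Video",
  "Display",
]);

// Hypothetical helper: counts records per channel group. Assumes each record
// is an object with a session_default_channel_group property.
function countByChannelGroup(records) {
  const counts = new Map();
  for (const record of records) {
    const group = record.session_default_channel_group;
    const key = CHANNEL_GROUPS.has(group) ? group : "(unlisted)";
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```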

Creating a new database migration

If you need to migrate the database, you can create a new migration via knex, which will create the migration file for you based in part on the migration name you provide. From the root of this repo, run:

npx knex migrate:make <the name of your migration>

See knex documentation for more details.

Running database migrations

Locally

npm run migrate

In production

In production, you can run database migrations via cf run-task. As with anything in production, be careful when doing this! First, check the current status of migrations using the migrate:status command:

cf run-task analytics-reporter-api --command "knex migrate:status" --name check_migration_status

This will kick off a task - you can see the output by running:

cf logs analytics-reporter-api --recent
# the output will look something like...
2021-07-19T14:31:39.89-0400 [APP/TASK/check_migration_status/0] OUT Using environment: production
2021-07-19T14:31:40.16-0400 [APP/TASK/check_migration_status/0] OUT Found 3 Completed Migration file/files.
2021-07-19T14:31:40.16-0400 [APP/TASK/check_migration_status/0] OUT 20170308164751_create_analytics_data.js
2021-07-19T14:31:40.16-0400 [APP/TASK/check_migration_status/0] OUT 20170316115145_add_analytics_data_indexes.js
2021-07-19T14:31:40.16-0400 [APP/TASK/check_migration_status/0] OUT 20170522094056_rename_date_time_to_date.js
2021-07-19T14:31:40.16-0400 [APP/TASK/check_migration_status/0] OUT No Pending Migration files Found.
2021-07-19T14:31:40.17-0400 [APP/TASK/check_migration_status/0] OUT Exit status 0

To actually run the migration, you would run:

cf run-task analytics-reporter-api --command "knex migrate:latest" --name run_db_migrations

See knex documentation for more details and options on the migrate command.

Public domain

This project is in the worldwide public domain. As stated in CONTRIBUTING:

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.