cpicpgx / cpic-data

CPIC Data definitions and code to use that data

CPIC Data

This repo contains all the code used to create and maintain the CPIC data model. This powers the API and the database.

If you're looking to use the REST API or get a copy of the database, go read the documentation.

Important Links for Everyone

Read the Docs

If you want more information about using the API or database, read the documentation.

Bugs/Discussion

If you found a bug or need to discuss something, submit an issue. (Requires GitHub account)

Get Data/Code

If you want to get a copy of the raw data or code, check the releases.

Database Setup

You probably don't need to read the rest of this.

This section and the next apply only if you want to build the database from scratch. If you're importing a pre-built database export or using the API, you don't need to do any of this. However, if you're interested in seeing an example of how to work with the database in Java code, follow along.

This project assumes you're running a Postgres 11+ database for loading/querying data.

Configuration happens with environment variables. Here's what needs to be set:

For local development you won't need to specify these. Set them if you're running in a different environment like the production or staging servers.
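As a rough illustration of the "environment variable with a local-development fallback" pattern described above, here is a minimal POSIX shell sketch. The variable names (`EXAMPLE_DB_HOST`, `EXAMPLE_DB_USER`) and the fallback values are hypothetical placeholders, not the project's actual settings, and `resolve_setting` is a helper invented for this example.

```shell
# Sketch only: EXAMPLE_DB_HOST and EXAMPLE_DB_USER are hypothetical names,
# not the variables this project actually reads.

# Print the value of the variable named in $1, or the fallback in $2
# when that variable is unset or empty.
resolve_setting() {
  eval "_value=\${$1:-}"
  if [ -n "$_value" ]; then
    printf '%s\n' "$_value"
  else
    printf '%s\n' "$2"
  fi
}

# Local development: nothing is set, so the fallbacks are used.
# On another server: export the variables before running.
DB_HOST="$(resolve_setting EXAMPLE_DB_HOST localhost)"
DB_USER="$(resolve_setting EXAMPLE_DB_USER dev_user)"
```

Keeping the fallback next to the lookup means a local checkout runs without any setup, while other environments only export the values that differ.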

Running

Some steps below will require a compiled version (jar) of this project. Use gradle to build the jar file:

./gradlew jar

or, if you're on Windows:

gradlew.bat jar

This will place a compiled "fat" jar (includes all dependencies) in the build/libs directory.

Bootstrapping the DB

If you have an export of the database you do not need to do this. The export has all structure and data already. This section is for creating a bootstrap, mostly-empty version of the database.

This project uses Flyway to set up the DB. Schema definition files are in the src/resources/db/migration directory. Run the following to build the db:

java -cp build/libs/CpicData.jar org.cpicpgx.db.FlywayMigrate

Bootstrapping Information

There are multiple entity-specific data files, each with their own importer class. The entry points to load gene-specific data are in the org.cpicpgx.importer package. Check the javadocs on the individual importer classes for command-line parameters.

To load all data at once, use the DataImport class. This takes a -d parameter: the path to a directory with the following sub-folders containing Excel files:

Then put the jar on the classpath and run the org.cpicpgx.DataImport class:

java -cp build/libs/CpicData.jar org.cpicpgx.DataImport -d <PATH_TO_DATA_DIRECTORY>

Exporting Data Artifacts

To export file artifacts of compiled data in the database use the DataArtifactArchive class. It expects a command line argument of a directory to write to. By default, it will write to a subdirectory with a datestamped name. Inside that folder will be subfolders for the different types of exported data.

java -cp build/libs/CpicData.jar org.cpicpgx.DataArtifactArchive -d <PATH_TO_EXISTING_DIRECTORY>
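Because the export needs both the compiled jar and an existing target directory, a small pre-flight check can save a failed run. The following is a minimal sketch, not part of the project: `prepare_export` is a hypothetical helper, and the `exports` directory name in the usage note is an assumption.

```shell
# Sketch only: prepare_export is a hypothetical helper, not project code.

# $1 = path to the jar, $2 = directory the export should write into.
# Returns 0 when the jar exists and the directory is ready, 1 otherwise.
prepare_export() {
  jar="$1"
  out_dir="$2"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar (build it with ./gradlew jar)" >&2
    return 1
  fi
  # Create the target directory if it is missing; existing ones are left alone.
  mkdir -p "$out_dir"
}
```

It could be chained in front of the export command, for example `prepare_export build/libs/CpicData.jar exports && java -cp build/libs/CpicData.jar org.cpicpgx.DataArtifactArchive -d exports`.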

Running the API

This system relies on PostgREST to run the API. The executable can be downloaded from the PostgREST website or installed through a package manager. To run the API, use the make target:

make api

This assumes two things:

  1. postgrest is in your $PATH
  2. you have all the required configurations set up as environment variables as outlined in the docs.
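The first assumption can be verified before calling make. This is a minimal sketch using the standard `command -v` lookup; `check_on_path` is a hypothetical helper name, not something the project provides.

```shell
# Sketch only: check_on_path is a hypothetical helper, not project code.

# Succeeds when the executable named in $1 can be found in $PATH,
# otherwise prints a message to stderr and returns 1.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 was not found in PATH" >&2
  return 1
}
```

For example, `check_on_path postgrest && make api` starts the API only when the executable is available.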

Maintenance

Java dependencies

To check for dependencies that require updates due to registered vulnerabilities:

./gradlew dependencyCheckAnalyze

You'll see terminal output after a couple of minutes and an HTML report will be generated in build/reports.

To check for all dependency updates:

./gradlew dependencyUpdates -DoutputFormatter=html

You'll see terminal output and an HTML report will be in build/dependencyUpdates.

Gradle wrapper update

To update the Gradle wrapper for the project:

./gradlew wrapper --gradle-version <new version>