CurFu: Curating fusions with the VICC Gene Fusion Guidelines

CurFu is an interactive curation tool for describing and representing gene fusions in a computable manner. It's developed to support the VICC Fusion Guidelines project.

Development

Installation

Clone the repo:

git clone https://github.com/cancervariants/fusion-curation
cd fusion-curation

Ensure that the following data sources are available:

The VICC Gene Normalization database, accessible from a DynamoDB-compliant service. Set the endpoint address with the environment variable GENE_NORM_DB_URL; the default value is http://localhost:8000.
The Biocommons SeqRepo database, used by Cool Seq Tool. The precise file location is configurable via the SEQREPO_ROOT_DIR variable, per the documentation.
The Biocommons Universal Transcript Archive (UTA), accessed through Genomic Med Lab's Cool Seq Tool package. Connection parameters for the Postgres database are most easily set as a libpq-compliant URL in the environment variable UTA_DB_URL.
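
Before starting the server, it can help to confirm which of these variables are visible to your shell. The following is a minimal sketch, not part of CurFu itself: the function names are hypothetical, and only the GENE_NORM_DB_URL default comes from the text above. No default is stated here for the other two variables, although the underlying libraries may apply their own.

```python
import os

# Default taken from the text above; no default is documented here for the others.
GENE_NORM_DEFAULT = "http://localhost:8000"


def data_source_settings(environ=None):
    """Return the three data source settings, with None for any unset value."""
    env = os.environ if environ is None else environ
    return {
        "GENE_NORM_DB_URL": env.get("GENE_NORM_DB_URL", GENE_NORM_DEFAULT),
        "SEQREPO_ROOT_DIR": env.get("SEQREPO_ROOT_DIR"),
        "UTA_DB_URL": env.get("UTA_DB_URL"),
    }


def unset_settings(settings):
    """List the names of settings that have no value."""
    return [name for name, value in settings.items() if not value]


if __name__ == "__main__":
    # Report only the names, since UTA_DB_URL may contain credentials.
    missing = unset_settings(data_source_settings())
    print("Unset variables:", ", ".join(missing) if missing else "none")
```

An unset variable is not necessarily an error, so treat the output as a prompt to check your configuration rather than a hard failure.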

Create a virtual environment for the server and install the package. Note: a Pipfile is also provided, so you can use Pipenv instead of virtualenv/venv and skip the virtualenv steps. Pipenv doesn't play well with entry points in development, so virtualenv/venv is the safer choice if you expect to edit the entry points in setup.cfg; otherwise, Pipenv should be fine.

cd server  # regardless of your environment decision, build it in server/
virtualenv venv
source venv/bin/activate
python3 -m pip install -e ".[dev,tests]"  # make sure to include the extra dependencies!

Acquire two sets of static assets and place all of them within the server/src/curfu/data directory:

Gene autocomplete files, which provide legal gene search terms to the client autocomplete component. One file is used for each of the entity types aliases, assoc_with, xrefs, prev_symbols, labels, and symbols. Each should be named according to the pattern gene_<type>_suggest_<YYYYMMDD>.tsv. These can be regenerated with the shell command curfu_devtools genes.
Domain lookup file, which provides possible functional domains for user-selected genes in the client. This should be named according to the pattern domain_lookup_<YYYYMMDD>.tsv. It can be regenerated with the shell command curfu_devtools domains, although this is an extremely time- and storage-intensive process.

Your data/ directory should look something like this:

server/src/curfu/data
├── domain_lookup_2022-01-20.tsv
├── gene_aliases_suggest_20211025.tsv
├── gene_assoc_with_suggest_20211025.tsv
├── gene_labels_suggest_20211025.tsv
├── gene_prev_symbols_suggest_20211025.tsv
├── gene_symbols_suggest_20211025.tsv
└── gene_xrefs_suggest_20211025.tsv
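
To check that nothing is missing from the data directory, a small script along these lines may be useful. It is only a sketch based on the file names in the listing above, not CurFu's own loading logic: the function name and regular expressions are assumptions, and because the listing shows dates both with and without hyphens, the pattern accepts either form.

```python
import re
from pathlib import Path

# Entity types named in the text above.
GENE_TYPES = ("aliases", "assoc_with", "xrefs", "prev_symbols", "labels", "symbols")

# Accept dates written as YYYYMMDD or YYYY-MM-DD (both appear in the listing).
DATE = r"\d{4}-?\d{2}-?\d{2}"
GENE_FILE = re.compile(rf"gene_(?P<type>[a-z_]+?)_suggest_{DATE}\.tsv")
DOMAIN_FILE = re.compile(rf"domain_lookup_{DATE}\.tsv")


def missing_assets(filenames):
    """Return the asset name prefixes that have no matching file."""
    found_types = set()
    has_domain = False
    for name in filenames:
        match = GENE_FILE.fullmatch(name)
        if match:
            found_types.add(match.group("type"))
        elif DOMAIN_FILE.fullmatch(name):
            has_domain = True
    missing = [f"gene_{t}_suggest" for t in GENE_TYPES if t not in found_types]
    if not has_domain:
        missing.append("domain_lookup")
    return missing


if __name__ == "__main__":
    data_dir = Path("server/src/curfu/data")  # run from the repository root
    if data_dir.is_dir():
        print(missing_assets(p.name for p in data_dir.iterdir()) or "all assets present")
    else:
        print(f"{data_dir} not found")
```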

Finally, start the backend service with curfu, which listens on port 5000 by default. To use a different port, pass the number to the -p option:

curfu -p 5000

In another shell, navigate to the repo client/ directory and install frontend dependencies:

cd client
yarn install

If you get the following error:

error api@3.4.2: The engine "node" is incompatible with this module. Expected version "^12 || ^14 || ^16". Got "18.0.0"
error Found incompatible module.
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

you can rerun the install with the engine check disabled:

yarn install --ignore-engines
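
To tell in advance whether you will hit this error, you can compare your Node version against the majors listed in the message above. This is a rough sketch with hypothetical function names; it hard-codes the range "^12 || ^14 || ^16" from the quoted error, which may differ for other versions of the dependency.

```python
# Majors taken from the error message above: "^12 || ^14 || ^16".
# A caret range such as ^12 accepts any 12.x.y release, so only the major matters.
SUPPORTED_NODE_MAJORS = {12, 14, 16}


def node_major(version):
    """Extract the major version from strings like '18.0.0' or 'v16.14.2'."""
    return int(version.strip().lstrip("v").split(".")[0])


def engine_check_passes(version):
    """Return True if the given Node version satisfies the quoted range."""
    return node_major(version) in SUPPORTED_NODE_MAJORS
```

Pass in the output of node --version. For the version in the error message, engine_check_passes("18.0.0") is False, which is the case where --ignore-engines is needed.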

Then start the development server:

yarn start

Shared type definitions

The frontend uses TypeScript definitions generated from the backend Pydantic schema. To refresh them, run the command curfu_devtools client-types from the server environment. This only works if json2ts has been installed in the client's node_modules binary directory.
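
If the refresh fails, a quick way to rule out a missing json2ts is to look for it in the client's binary directory. The sketch below assumes the conventional node_modules/.bin location and, on Windows, a json2ts.cmd shim; the function names are hypothetical and this is not how curfu_devtools itself locates the tool.

```python
from pathlib import Path


def json2ts_candidates(client_dir):
    """Return the paths where a locally installed json2ts is expected."""
    bin_dir = Path(client_dir) / "node_modules" / ".bin"
    # Assumption: package managers create a .cmd shim on Windows.
    return [bin_dir / "json2ts", bin_dir / "json2ts.cmd"]


def json2ts_installed(client_dir):
    """Return True if any expected json2ts executable exists."""
    return any(path.exists() for path in json2ts_candidates(client_dir))


if __name__ == "__main__":
    # Run from the repository root.
    print("json2ts found" if json2ts_installed("client") else "json2ts not found")
```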

Style

Python code style is enforced by flake8 and Black, and frontend style is enforced by ESLint and Prettier. Conformance is ensured by pre-commit. Before your first commit, run

pre-commit install

This will require installation of dev dependencies on the server side.

In practice, Prettier and Black do most of the formatting work needed to satisfy ESLint and flake8. To autoformat a file, run python3 -m black path/to/file in the backend or yarn run prettier --write path/to/file in the frontend.

Tests

Backend tests require the tests dependencies to be installed. Run them with pytest.

Generating requirements

requirements.txt is used by Elastic Beanstalk to install dependencies. Whenever you update package requirements, be sure to update requirements.txt as well. To generate it, run the following command from the server directory (with the virtual environment activated):

pip freeze --exclude-editable > ../requirements.txt
