WGLab / CancerVar

Clinical interpretation of somatic mutations in cancer

CancerVar & OPAI

Clinical interpretation of Cancer somatic Variants (CancerVar) and Oncogenic Prioritization by Artificial Intelligence (OPAI)


CancerVar takes either pre-annotated files or unannotated input files in VCF format or ANNOVAR input format, where each line corresponds to one genetic variant; CancerVar will call ANNOVAR to generate the necessary annotations. In the output, based on all 12 pieces of evidence, each variant is assigned one of four tiers, "Tier_I_strong", "Tier_II_potential", "Tier_III_Uncertain", or "Tier_IV_benign", by the rules specified in the AMP/ASCO/CAP 2017 guidelines.

OPAI takes 12 clinical evidence scores from CancerVar and 23 pre-computed in silico scores predicted by other computational tools from ANNOVAR as input, and predicts oncogenicity by a semi-supervised deep-learning model.

CancerVar and OPAI are Python-based scripts. Users need to run CancerVar first (step 1) to get the clinical evidence-based interpretation results, and then run OPAI (step 2) if they want the deep-learning-based oncogenicity prediction.

## CancerVar (step 1)


CancerVar.py [options]


CancerVar is a Python script for interpreting the clinical significance of cancer variants.


  1. You need to install Python >= 3.6.
  2. You need to install [ANNOVAR](http://annovar.openbioinformatics.org/en/latest/) version >= 2016-02-01.
  3. Most of the datasets can be downloaded automatically.
  4. Some updated datasets (COSMIC and ICGC) for ANNOVAR are available at https://cancervar.wglab.org/databases/ (download, gunzip, and put them in the ANNOVAR db folder).
  5. Please use the updated files; outdated files can cause problems when running CancerVar.
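
The "download, gunzip, put in the ANNOVAR db folder" step can be scripted. The following is a minimal sketch that assumes the archive has already been downloaded; the helper name `gunzip_into` and the paths in the usage note are placeholders for illustration, not names taken from CancerVar or its database site.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_into(archive: Path, db_dir: Path) -> Path:
    """Decompress a .gz file into db_dir and return the path of the result."""
    if archive.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {archive.name}")
    db_dir.mkdir(parents=True, exist_ok=True)
    target = db_dir / archive.stem  # file name without the trailing ".gz"
    # Stream the data so large database files are not loaded into memory at once.
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `gunzip_into(Path("downloaded_db.txt.gz"), Path("humandb"))` would write `humandb/downloaded_db.txt`; substitute the file you downloaded and your own ANNOVAR db folder.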

OPTIONS of CancerVar script

If you use this option, you can ignore all the other options below.

EXAMPLE of CancerVar

    python3.6 ./CancerVar.py -c config.ini  # Run the examples in config.ini
    python3.6 ./CancerVar.py  -b hg19 -i your_input  --input_type=VCF  -o your_output
    python3.6 ./CancerVar.py  -b hg19 -i example/FDA_hg19.av -o example/FDA
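
When CancerVar is one step in a larger pipeline, the same command lines can be assembled from Python. This sketch uses only the flags that appear in the examples above (`-b`, `-i`, `-o`, `--input_type`); the helper names `cancervar_command` and `run_cancervar` are illustrative and not part of CancerVar.

```python
import subprocess
import sys
from typing import List, Optional


def cancervar_command(build: str, input_path: str, output_prefix: str,
                      input_type: Optional[str] = None,
                      script: str = "./CancerVar.py") -> List[str]:
    """Build the argument list for one CancerVar run."""
    # sys.executable runs the script with the interpreter that runs this code.
    cmd = [sys.executable, script, "-b", build, "-i", input_path]
    if input_type is not None:
        cmd.append(f"--input_type={input_type}")
    cmd += ["-o", output_prefix]
    return cmd


def run_cancervar(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Calling `run_cancervar(cancervar_command("hg19", "example/FDA_hg19.av", "example/FDA"))` corresponds to the third example command.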

The clinical interpretation results are in the output file ending in ".cancervar"; the column **"CancerVar: CancerVar and Evidence"** contains the evidence and the final interpretation.
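
To get a quick summary of a run, the tier labels in that column can be tallied. The sketch below assumes the ".cancervar" file is tab-separated with a header line and that each cell in the evidence column contains one of the four tier names; both are assumptions about the file layout, so check them against your own output first.

```python
import csv
from collections import Counter
from typing import Iterable

TIERS = ("Tier_I_strong", "Tier_II_potential", "Tier_III_Uncertain", "Tier_IV_benign")
EVIDENCE_COLUMN = "CancerVar: CancerVar and Evidence"


def count_tiers(lines: Iterable[str]) -> Counter:
    """Count tier labels found in the evidence column.

    Assumes tab-separated text whose first line is a header (an assumption).
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.strip() for name in next(reader)]
    col = header.index(EVIDENCE_COLUMN)  # raises ValueError if the column is absent
    counts: Counter = Counter()
    for row in reader:
        if len(row) <= col:
            continue  # skip short or empty rows
        cell = row[col].lower()
        for tier in TIERS:
            if tier.lower() in cell:
                counts[tier] += 1
                break
    return counts
```

A typical call would be `with open(path, newline="") as fh: print(count_tiers(fh))`, where `path` points at your ".cancervar" output file.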

## OPAI (step 2)

After CancerVar has run correctly and produced the ".cancervar" and ".grl_p" output files, we are ready to run Oncogenic Prioritization by Artificial Intelligence (OPAI).


OPAI is a Python script for Oncogenic Prioritization by Artificial Intelligence, run after CancerVar. OPAI first calls feature_preprocess.py to process the features encoded from the CancerVar and ANNOVAR output, then calls opai_predictor.py to predict oncogenicity.

The OPAI scripts are in the scripts folder of “OPAI”:


OPAI has currently only been tested with Python 3.6+ and requires four Python modules to be installed and on the path: numpy (https://numpy.org), pandas (https://pandas.pydata.org), scikit-learn (https://scikit-learn.org), and pytorch (https://pytorch.org).
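
Before running OPAI, it can help to confirm that the four modules are importable from the interpreter you plan to use. This is a small sketch using only the standard library; the helper name `missing_modules` is illustrative. Note that scikit-learn is imported as `sklearn` and PyTorch as `torch`.

```python
import importlib.util
from typing import Iterable, List

# Import names, which differ from the package names for scikit-learn and PyTorch.
REQUIRED = ("numpy", "pandas", "sklearn", "torch")


def missing_modules(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level modules in names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means all four modules were found.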

There are two ways to install these modules:

The predicted oncogenicity scores are in the last column, **"evs_score"**, of the file `example/FDA.hg19_multianno.txt.cancervar.evs.pred`.
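
The score column can be used to rank variants for review. The sketch below assumes the prediction file is tab-separated with a header line that contains `evs_score`, and it sorts from highest to lowest score; whether a higher score means "more likely oncogenic" is not stated here, so treat the sort direction as an assumption as well.

```python
import csv
from typing import Iterable, List, Tuple


def top_scores(lines: Iterable[str], n: int = 10,
               column: str = "evs_score") -> List[Tuple[float, List[str]]]:
    """Return the n rows with the highest value in the score column."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.strip() for name in next(reader)]
    col = header.index(column)  # raises ValueError if the column is absent
    scored = []
    for row in reader:
        try:
            scored.append((float(row[col]), row))
        except (IndexError, ValueError):
            continue  # skip rows without a numeric score
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n]
```

Each returned item is a `(score, row)` pair, so the original columns stay available alongside the score.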

- Feature process using `feature_preprocess.py`

      python3.6 OPAI/scripts/feature_preprocess.py -h
      usage: feature_preprocess.py [-h] -a ANNOVAR_PATH -c CANCERVAR_PATH [-m METHOD] [-n MISSING_COUNT] -d DATABASE -o OUTPUT

      feature creator from cancervar output

      optional arguments:
        -h, --help            show this help message and exit
        -a ANNOVAR_PATH, --annovar_path ANNOVAR_PATH
                              the path to annovar file
        -c CANCERVAR_PATH, --cancervar_path CANCERVAR_PATH
                              the path to cancervar file
        -m METHOD, --method METHOD
                              output evs features or ensemble features (option: evs, ensemble)
        -n MISSING_COUNT, --missing_count MISSING_COUNT
                              variant with more than N missing features will be discarded, (default: 5)
        -d DATABASE, --database DATABASE
                              database for feature normalization
        -o OUTPUT, --output OUTPUT
                              the path to output
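
The `-n` threshold discards any variant with more than N missing features. The following sketch illustrates that rule on plain dictionaries; it is not the code used inside `feature_preprocess.py`, and the representation of a variant as a feature-name-to-value mapping with `None` for missing values is an assumption made for the example.

```python
from typing import Dict, List, Optional

Variant = Dict[str, Optional[float]]


def drop_sparse_variants(variants: List[Variant], max_missing: int = 5) -> List[Variant]:
    """Keep variants that have at most max_missing missing (None) features."""
    kept = []
    for features in variants:
        n_missing = sum(value is None for value in features.values())
        if n_missing <= max_missing:  # "more than N missing" is discarded
            kept.append(features)
    return kept
```

With the default of 5, a variant with exactly five missing features is kept and one with six is dropped.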

- Oncogenicity prediction using `opai_predictor.py`

      optional arguments:
        -h, --help            show this help message and exit
        -i INPUT, --input INPUT
                              the path to input feature
        -v CANCERVAR_PATH, --cancervar_path CANCERVAR_PATH
                              the path to cancervar file
        -m METHOD, --method METHOD
                              use evs features or ensemble features (option: evs, ensemble)
        -d DEVICE, --device DEVICE
                              device used for dl-based predicting (option: cpu, cuda)
        -c CONFIG, --config CONFIG
                              the path to trained model file
        -o OUTPUT, --output OUTPUT
                              the path to output
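
The two option lists above reuse some short flags with different meanings: `-c` is the CancerVar file for feature preprocessing but the trained model file for prediction, and `-d` is the normalization database in the first case but the device in the second. The sketch below builds both command lines with named parameters to keep these apart. It assumes that the second option list belongs to `opai_predictor.py`, that both scripts live in `OPAI/scripts`, and that the preprocessing output is the predictor's `-i` input; the helper names and the `evs` default are illustrative.

```python
import sys
from typing import List

SCRIPTS = "OPAI/scripts"  # assumed location of both OPAI scripts


def preprocess_command(annovar_file: str, cancervar_file: str, database: str,
                       output: str, method: str = "evs",
                       missing_count: int = 5) -> List[str]:
    """Arguments for feature_preprocess.py (here -c is the CancerVar file)."""
    return [sys.executable, f"{SCRIPTS}/feature_preprocess.py",
            "-a", annovar_file, "-c", cancervar_file, "-m", method,
            "-n", str(missing_count), "-d", database, "-o", output]


def predict_command(features: str, cancervar_file: str, model_file: str,
                    output: str, method: str = "evs",
                    device: str = "cpu") -> List[str]:
    """Arguments for opai_predictor.py (here -c is the trained model file)."""
    return [sys.executable, f"{SCRIPTS}/opai_predictor.py",
            "-i", features, "-v", cancervar_file, "-m", method,
            "-d", device, "-c", model_file, "-o", output]
```

Either list can be passed to `subprocess.run(cmd, check=True)`; the database and model file paths depend on your installation and are not specified here.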

## Web server
We also developed a web server [http://cancervar.wglab.org](http://cancervar.wglab.org), which offers a graphical user interface for CancerVar and OPAI scores. 

This web server provides pre-compiled annotation results and OPAI scores for 13M mutations. Users can directly search for their exonic variants by chromosomal position, by dbSNP identifier, or by gene name with the nucleic acid/amino acid change. The web server provides full details on the variants, including all automatically generated criteria, most of the supporting evidence, and the OPAI scores.


CancerVar and OPAI are free for non-commercial use, without warranty. Users need to obtain licenses for third-party tools such as ANNOVAR themselves. Please contact the authors for commercial use.


Quan Li, Zilin Ren, Kajia Cao, Marilyn M. Li, Yunyun Zhou and Kai Wang. CancerVar: an Artificial Intelligence empowered platform for clinical interpretation of somatic mutations in cancer. Science Advances, 2022, [https://www.science.org/doi/10.1126/sciadv.abj1624](https://www.science.org/doi/10.1126/sciadv.abj1624)

Quan Li and Kai Wang. InterVar: Clinical interpretation of genetic variants by ACMG-AMP 2015 guideline. The American Journal of Human Genetics 100, 1-14, February 2, 2017, [http://dx.doi.org/10.1016/j.ajhg.2017.01.004](http://dx.doi.org/10.1016/j.ajhg.2017.01.004)

[The AMP/ASCO/CAP 2017 guidelines](https://www.ncbi.nlm.nih.gov/pubmed/27993330)
Li MM, Datto M, Duncavage EJ, Kulkarni S, Lindeman NI, Roy S, Tsimberidou AM, Vnencak-Jones CL, Wolff DJ, Younes A, Nikiforova MN.
Standards and Guidelines for the Interpretation and Reporting of Sequence Variants in Cancer: A Joint Consensus Recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists.

[The ACMG/CGC 2019 guidelines](https://www.ncbi.nlm.nih.gov/pubmed/31138931)
Mikhail FM, et al. Technical laboratory standards for interpretation and reporting of acquired copy-number abnormalities and copy-neutral loss of heterozygosity in neoplastic disorders: a joint consensus recommendation from the American College of Medical Genetics and Genomics (ACMG) and the Cancer Genomics Consortium (CGC). Genet Med. 2019 Sep;21(9):1903-1916. doi: 10.1038/s41436-019-0545-7.

## Acknowledgements

Thanks to all who provided bug reports.