This repository contains the code used to get information required for analysis of Reactome failed queries.
Extraction of MeSH terms requires a UMLS license/account. If you do not have an account, register at https://utslogin.nlm.nih.gov/cas/login and set the credentials in the configuration YAML file.
The notebooks produce two files: reactome_pmid_metadata.tsv, which contains metadata of the PMIDs present in Reactome, and failed_query_analysis_output.tsv, which contains details regarding the failed query terms.
MTI WebAPI is used to get MeSH terms through its batch processing. Its code is written in Java, so pyjnius is used to run the JAR files, which are present in /lib.
These JAR files can be found in the ziy/skr-webapi repository.
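
As a rough illustration of how pyjnius can load JAR files from a directory such as /lib, the sketch below collects the JARs and puts them on the JVM classpath. The helper names `collect_jars` and `configure_jvm` are hypothetical and are not taken from the notebooks; the notebooks may set up the classpath differently.

```python
from pathlib import Path


def collect_jars(lib_dir):
    """Return the JAR files found directly in lib_dir as sorted path strings."""
    return sorted(str(p) for p in Path(lib_dir).glob("*.jar"))


def configure_jvm(lib_dir="lib"):
    """Put every JAR in lib_dir on the classpath and return pyjnius' autoclass.

    Hypothetical helper: the directory default and the function name are
    assumptions made for this sketch.
    """
    jars = collect_jars(lib_dir)
    if not jars:
        raise FileNotFoundError(f"no JAR files found in {lib_dir!r}")
    # The classpath must be set before `jnius` is imported for the first
    # time, because that import is what starts the JVM.
    import jnius_config
    jnius_config.set_classpath(*jars)
    from jnius import autoclass
    return autoclass
```

After `autoclass = configure_jvm("lib")`, a Java class can be looked up by its fully qualified name, for example `autoclass("java.lang.String")`. A working Java installation and the pyjnius package are required for that step.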
The following files are generated by the Python notebooks. If you only want to perform the analysis using the R code, they will be downloaded automatically from the links:
File | Generated by | Source |
---|---|---|
reactome_pmid_metadata.tsv | Reactome_PMID_Metadata_Extraction.ipynb | Link |
failed_query_analysis_output.tsv | Reactome_Failed_Query_Analysis.ipynb | Link |
Make a copy of parameters_sample.yml named parameters.yml and set the configurations in it. The following parameters in the YML file are mandatory:
MTI Credentials, register at https://utslogin.nlm.nih.gov/cas/login
mti:
email_id : "example@example.com"
username : "username"
password : "password"
INDRA Database REST URL
indra_db_rest_url : "SET_INDRA_DB_URL"
Reactome Parameters
reactome_organism: "Homo sapiens"
User Query
query: "MATN2"
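
Before running the notebooks, it can help to confirm that the mandatory parameters were actually edited. The sketch below is one possible check, not part of the repository: `unset_parameters` and `load_config` are hypothetical names, the placeholder values are the ones shown in the sample above, and reading the file assumes PyYAML is installed.

```python
# Sample values that must be replaced; None means "only has to be present".
MANDATORY = {
    ("mti", "email_id"): "example@example.com",
    ("mti", "username"): "username",
    ("mti", "password"): "password",
    ("indra_db_rest_url",): "SET_INDRA_DB_URL",
    ("reactome_organism",): None,
    ("query",): None,
}


def unset_parameters(config):
    """Return dotted names of parameters that are missing, empty, or still a placeholder."""
    problems = []
    for path, placeholder in MANDATORY.items():
        value = config
        for key in path:
            # Walk into nested mappings such as mti -> email_id.
            value = value.get(key) if isinstance(value, dict) else None
        if value in (None, "", placeholder):
            problems.append(".".join(path))
    return problems


def load_config(path="parameters.yml"):
    """Read the YAML configuration (PyYAML is imported lazily)."""
    import yaml
    with open(path) as handle:
        return yaml.safe_load(handle) or {}
```

Calling `unset_parameters(load_config("parameters.yml"))` returns an empty list when every mandatory parameter has been set.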
Please note: To skip creation of the metadata files and run only the analysis, skip steps 3 and 4 and continue from step 5; the required files will then be downloaded automatically.
Execute Reactome_PMID_Metadata_Extraction.ipynb. This generates the reactome_pmid_metadata.tsv file, which is used in step 5.
Execute Reactome_Failed_Query_Analysis.ipynb. This generates the failed_query_analysis_output.tsv file, which is required in step 5.
Do NOT perform step 5 with partially generated output files from steps 3 and 4. If you have partial files, delete them; the Rmd code will download the missing files in pre-processed form if required.
*Please note:* This step requires the complete TSV files generated by steps 3 and 4. If these files are not present in your directory, or you skipped steps 3 and 4, they will be downloaded.
In the RStudio console, enter the following:
rmarkdown::render('Reactome_Analysis.Rmd', output_file = 'analysis_output.nb.html')
OR
Open [Reactome_Analysis.Rmd](./Reactome_Analysis.Rmd) in RStudio and run all the chunks with Ctrl + Alt + R to generate the analysis.
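
The "use the local TSV file if present, otherwise download it" behaviour described for this step is implemented in the Rmd code. The Python sketch below only illustrates the idea. `ensure_file` is a hypothetical helper, and the download URL is a parameter because the actual links are not reproduced here.

```python
import os
import urllib.request


def ensure_file(path, url):
    """Return path, downloading it from url first if it is missing or empty.

    A partially written TSV cannot be detected here, which is why partial
    files should be deleted by hand before running the analysis.
    """
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path
    # Download to a temporary name so an interrupted transfer never leaves
    # a truncated file under the final name.
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return path
```

The temporary `.part` name is a design choice of this sketch: it keeps a failed download from being mistaken for a complete file on the next run.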
Output Files:
pip install --no-cache-dir -r ./dependencies/requirements.txt
R -e 'source("./dependencies/installPackages.R")'
Make a copy of parameters_sample.yml named parameters.yml and set the configurations in it. The following parameters in the YML file are mandatory:
MTI Credentials, register at https://utslogin.nlm.nih.gov/cas/login
mti:
email_id : "example@example.com"
username : "username"
password : "password"
INDRA Database REST URL
indra_db_rest_url : "SET_INDRA_DB_URL"
Reactome Parameters
reactome_organism: "Homo sapiens"
User Query
query: "MATN2"
bash startup.sh path/to/parameters.yml
Output Files:
docker run --name reactome-failed-query-analysis pritishaw/reactome-failed-query-analysis:latest
docker pull pritishaw/reactome-failed-query-analysis:latest
pip install jupyter-repo2docker
jupyter-repo2docker https://github.com/cannin/enhance_nlp_interaction_network_gsoc2020
/notebooks
/rstudio to open RStudio
A sample file can be found here: parameters_sample.yml. The following configurations can be made using the file. For testing the Python notebooks, you can use the template parameters_test.yml, which is configured to process a small subset of the query terms.
# PYTHON NOTEBOOK PARAMETERS ----
# Register at https://utslogin.nlm.nih.gov/cas/login for MTI credentials
mti:
email_id : "example@example.com"
username : "username"
password : "password"
pmid_threshold : 20
indra_db_rest_url : "SET_INDRA_DB_URL"
reactome_failed_terms_link : "https://gist.githubusercontent.com/PritiShaw/03ce10747835390ec8a755fed9ea813d/raw/cc72cb5479f09b574e03ed22c8d4e3147e09aa0c/Reactome.csv"
failed_query_threshold : null # null Indicates all terms will be processed
failed_query_hits_threshold : 10
reactome_pmid_url : "https://reactome.org/download/current/ReactionPMIDS.txt"
failed_query_output_file_path : "failed_query_analysis_output.tsv"
pmid_chunk_limit : 0
pmid_metadata_output_path : "reactome_pmid_metadata.tsv"
# R NOTEBOOK (Rmd) PARAMETERS ----
# Notebook
max_dt_table_display : 100
# Python environment
python_virtualenv : "/srv/venv"
# General
min_failed_search_hits : 10
# Rank Terms
top_n_reactome_journals : 10
min_indra_query_term_count : 0
min_indra_statement_count : 0
min_pmc_citation_count : 0
min_oc_citation_count : 0
# Reactome Parameters
reactome_organism: "Homo sapiens"
# User Query
query: "MATN2"
# Output
all_mesh_by_top_level_pathways_file : "all_mesh_by_top_level_pathways_full.txt"
top_level_pathways_file : "top_level_pathways.txt"
indra_stmt_html_file : "indra_output.html"
indra_stmt_json_file : "indra_output.json"
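
The two failed-query thresholds can be read as a filter followed by an optional cap. The sketch below shows that reading and nothing more: it assumes that failed_query_hits_threshold is a minimum hit count and that failed_query_threshold limits how many terms are processed (with null, loaded as `None`, meaning no limit). The notebooks may apply these parameters differently, and `select_terms` is a hypothetical name.

```python
def select_terms(term_hits, failed_query_threshold=None, failed_query_hits_threshold=10):
    """Filter (term, hits) pairs by minimum hits, then cap the number of terms.

    Assumed semantics, see the note above: a threshold of None keeps
    every term that passes the hit filter.
    """
    kept = [(term, hits) for term, hits in term_hits
            if hits >= failed_query_hits_threshold]
    if failed_query_threshold is None:
        return kept
    return kept[:failed_query_threshold]


# Illustrative input only; these are not real Reactome search statistics.
example = [("a", 5), ("b", 50), ("c", 12), ("d", 30)]
all_terms = select_terms(example)
first_two = select_terms(example, failed_query_threshold=2)
```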
Papermill is used to parameterize the Python notebooks. To use it, follow the steps below:
Install from requirements.txt
pip install --no-cache-dir -r ./dependencies/requirements.txt
Setup Config YAML file
Create a copy of parameters_sample.yml and make the changes.
To Run the Notebooks
papermill Reactome_Failed_Query_Analysis.ipynb failed_query_analysis.ipynb --log-output -k python3 -f PATH/TO/CONFIG/FILE.yml
papermill Reactome_PMID_Metadata_Extraction.ipynb pmid_metadata.ipynb --log-output -k python3 -f PATH/TO/CONFIG/FILE.yml
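
The two commands above can also be driven from Python. The following sketch builds the same command lines, with the same flags and output names, and runs them one after the other. `papermill_command` and `run_all` are hypothetical helper names, and the papermill executable is assumed to be on the PATH.

```python
import subprocess

# Input notebook -> executed output notebook, as in the commands above.
NOTEBOOKS = {
    "Reactome_Failed_Query_Analysis.ipynb": "failed_query_analysis.ipynb",
    "Reactome_PMID_Metadata_Extraction.ipynb": "pmid_metadata.ipynb",
}


def papermill_command(notebook, output, config_path):
    """Build the papermill command line for one notebook."""
    return ["papermill", notebook, output,
            "--log-output", "-k", "python3", "-f", config_path]


def run_all(config_path):
    """Run every notebook in turn and stop at the first failure."""
    for notebook, output in NOTEBOOKS.items():
        subprocess.run(papermill_command(notebook, output, config_path), check=True)
```

`run_all("parameters.yml")` then executes both notebooks with the same configuration file.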