Biomarker Knowledge Graph on AWS

DaniallMasood commented 3 months ago

Documentation for setting up instances of the KG on AWS: https://github.com/clinical-biomarkers/biomarker-partnership/tree/main/supplementary_files/documentation

Goal: Create knowledge graphs and Cypher queries based on biological questions and use cases.

DaniallMasood commented 2 months ago

https://gwu0.sharepoint.com/:w:/r/sites/GlyGenTeam-GRP/_layouts/15/Doc.aspx?sourcedoc=%7B104E0A45-4E26-4F51-AB7B-496B19A7E660%7D&file=GlyGen-CFDE_Data_Distillery_Partnership_Proposal_Y2_full_proposal.docx&fromShare=true&action=default&mobileredirect=true

@rykahsay, here is the proposal. I will get some example queries from @MiguelMazumder, or he can put them in this ticket.

rykahsay commented 2 months ago

Specific Aim 1: Create a capable and secure API for executing queries and accessing query results on the DD KG. Develop API endpoints that address the most common use cases. Test the API's support for queries and data extraction for specific use cases.
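One common way to make such an API "secure" is to expose only pre-approved, parameterized Cypher queries, so user input travels as query parameters and never as query text. The sketch below illustrates that pattern under stated assumptions: the endpoint registry, the names Endpoint, ENDPOINTS, MAX_LIMIT and build_request, and the UBKG-style schema (Concept nodes linked to Code nodes by CODE edges) are all hypothetical and would need to be checked against the actual DD KG.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A pre-approved, parameterized Cypher query exposed by the API."""
    cypher: str
    params: frozenset[str]


# Hypothetical endpoint registry. The Cypher ASSUMES a UBKG-style schema
# (Concept nodes linked to Code nodes by CODE edges); verify before use.
ENDPOINTS = {
    "xrefs": Endpoint(
        cypher=(
            "MATCH (:Code {CodeID: $code_id})<-[:CODE]-(:Concept)-[:CODE]->(x:Code) "
            "RETURN DISTINCT x.CodeID AS xref LIMIT $limit"
        ),
        params=frozenset({"code_id"}),
    ),
}

MAX_LIMIT = 1000  # arbitrary cap so one request cannot dump the whole graph


def build_request(name: str, args: dict[str, str], limit: int = 100) -> tuple[str, dict]:
    """Validate a request and return (cypher, parameters) for the driver.

    User input is only ever passed as Cypher parameters, never spliced
    into the query text.
    """
    if name not in ENDPOINTS:
        raise KeyError(f"unknown endpoint: {name}")
    endpoint = ENDPOINTS[name]
    supplied = set(args)
    if supplied != endpoint.params:
        raise ValueError(
            f"expected parameters {sorted(endpoint.params)}, got {sorted(supplied)}"
        )
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return endpoint.cypher, {**args, "limit": limit}
```

For example, build_request("xrefs", {"code_id": "EXAMPLE:0001"}, limit=5) returns the fixed query text plus {"code_id": "EXAMPLE:0001", "limit": 5}; an unknown endpoint name, an unexpected argument, or an out-of-range limit is rejected before anything reaches Neo4j. The identifier EXAMPLE:0001 is a placeholder.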

Specific Aim 2: Develop and implement a community-accessible website to query data in the DD graph that supports selected use cases from the DCCs. Tune the DD database to be ready for website data delivery, including importing new datasets to support website-driven use cases. This will include ingestion of necessary supporting data to extend the current KG functionality.

Specific Aim 3: Create algorithms and protocols to show how machine learning can be applied to the KG database, including link prediction, community detection, and knowledge cross-validation.
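As a concrete baseline for link prediction, the sketch below ranks unconnected node pairs by the Jaccard coefficient of their neighborhoods, score(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. This is a standard neighborhood heuristic chosen here for illustration, not the method in the proposal. It runs on a plain edge list; with the real KG the edges would first be exported via Cypher. The protein IDs P1..P5 are a hypothetical toy graph.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def jaccard_link_scores(edges):
    """Rank unconnected node pairs by the Jaccard coefficient of their neighborhoods.

    Pairs with no shared neighbor are skipped because their score is 0.
    """
    adj = build_adjacency(edges)
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = adj[u] & adj[v]
        if shared:
            scores.append(((u, v), len(shared) / len(adj[u] | adj[v])))
    # Highest score first; ties broken by node names for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy interaction graph with hypothetical protein IDs P1..P5.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
             ("P2", "P4"), ("P3", "P4"), ("P4", "P5")]
for pair, score in jaccard_link_scores(toy_edges):
    print(pair, round(score, 3))
# prints:
# ('P1', 'P4') 0.667
# ('P2', 'P5') 0.333
# ('P3', 'P5') 0.333
```

P1 and P4 share both of P1's partners, so they are the strongest candidate for a missing interaction. More elaborate approaches (embeddings, supervised pipelines) would typically be compared against a simple baseline like this one.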

jeet-vora commented 2 months ago

The above aims are good, but they are more for the future. Right now, what Raja wants you to do is:

Play with the already installed Knowledge Graph in AWS and see how it works (Miguel has installed it; see the Documentation)
From the existing data in the knowledge graph, come up with some use cases and new queries
Evaluate whether we can implement a knowledge graph in GlyGen and how it compares with the supersearch. Can we also have Cypher queries, like SPARQL?
The LINCS team (Avi's team) is working on an interface that can be implemented in Biomarker

Below is the email sent by Raja:

Hi Robel,

Jeet is coordinating our overall involvement with Data Distillery, which is a Knowledge Graph (KG) project. We have downloaded the KG, and we have tutorials on how to query it using the Cypher query language. The KG is accessible from AWS. We need the following, where we can use your help/input:

Devise some queries and outputs that can be included in our biomarker paper or a GlyGen paper (Daniall is leading this effort with help from others)
Integrate the KG app from Avi into the GlyGen or biomarker interface (Sujeet and Sean are leading this)

Your work on API structure and data sites has already been very helpful. Once you have some familiarity with the KG, maybe we can meet. The project ends soon, so we would like to have some work done by the end of August.

Jeet and Daniall - can you please meet with Robel and discuss? Also, send Robel the proposal.

MiguelMazumder commented 2 months ago

The Knowledge Graph has been set up on the AWS server. Running the Docker container requires docker group access and VPN access. Navigate to the data/KnowledgeGraph directory and run bash ./run_container.sh. The Neo4j user interface will then be available at aws.glygen.org/neo4j. I will create more detailed documentation about this process.

Use case and query ideas:

Gene-Disease Associations. Query: Find all diseases associated with a particular gene.
Protein-Protein Interactions. Query: Find all proteins that interact with a particular protein.
Disease-Chemical Compound Relationships. Query: Find all chemical compounds associated with a particular disease.
Hierarchical Relationships within Body Structures. Query: Find all body structures that are subparts of a particular structure.
Gene-Gene Product Relationships. Query: Find all gene products produced by a particular gene.
Body Structure-Disease Relationships. Query: Find diseases that affect specific body structures.
Synonym Relationships (Cross-references). Query: Find all equivalent identifiers for a specific entity.
Multi-domain Relationships. Query: Find relationships that span multiple domains, such as genes in one ontology linked to diseases in another.
Entity-Annotation Relationships. Query: Find all annotations associated with a particular entity.
@DaniallMasood Biomarker-drug relationships (if IDG has any specific information on entity-drug relationships)
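A few of the query ideas above could be written as parameterized Cypher and run from Python roughly as follows. This is a sketch under assumptions, not tested against the DD KG: it assumes a UBKG-style schema ((:Concept)-[:CODE]->(:Code {CodeID, SAB}), with concept-to-concept edges whose type names the relationship). The relationship name part_of, the 1..5 hop bound, and the names QUERIES, required_params and run_query are hypothetical placeholders to confirm in the Neo4j browser. Running a query needs the official neo4j Python driver and VPN access.

```python
from __future__ import annotations

import re

QUERIES = {
    # Gene-disease associations: concepts linked to the gene's concept that
    # carry a code from the chosen disease vocabulary ($disease_sab).
    "gene_diseases": """
        MATCH (:Code {CodeID: $gene_code_id})<-[:CODE]-(:Concept)-[r]-(d:Concept)
              -[:CODE]->(dc:Code {SAB: $disease_sab})
        RETURN DISTINCT dc.CodeID AS disease, type(r) AS relationship
        LIMIT $limit
    """,
    # Protein-protein interactions: the edge type is compared as a string
    # because relationship types cannot be passed as Cypher parameters.
    "protein_interactions": """
        MATCH (:Code {CodeID: $protein_code_id})<-[:CODE]-(:Concept)-[r]-(p:Concept)
              -[:CODE]->(pc:Code {SAB: $protein_sab})
        WHERE type(r) = $relationship
        RETURN DISTINCT pc.CodeID AS partner
        LIMIT $limit
    """,
    # Body-structure hierarchy: variable-length walk over a hypothetical
    # part_of relationship (1..5 hops is an arbitrary bound).
    "body_subparts": """
        MATCH (root:Concept)-[:CODE]->(:Code {CodeID: $structure_code_id})
        MATCH (part:Concept)-[:part_of*1..5]->(root)
        MATCH (part)-[:CODE]->(pc:Code {SAB: $structure_sab})
        RETURN DISTINCT pc.CodeID AS subpart
        LIMIT $limit
    """,
}


def required_params(name: str) -> list[str]:
    """List the $parameters a named query expects, sorted alphabetically."""
    return sorted(set(re.findall(r"\$(\w+)", QUERIES[name])))


def run_query(uri: str, auth: tuple[str, str], name: str, **params) -> list[dict]:
    """Run one named query and return its rows as dictionaries."""
    params.setdefault("limit", 100)
    missing = set(required_params(name)) - set(params)
    if missing:
        raise ValueError(f"missing parameters for {name}: {sorted(missing)}")
    # Lazy import so the module loads (and validates) without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            # Consume the result while the session is still open.
            return [record.data() for record in session.run(QUERIES[name], params)]
    finally:
        driver.close()
```

A call would look like run_query("bolt://HOST:7687", ("neo4j", "PASSWORD"), "gene_diseases", gene_code_id=..., disease_sab=...). 7687 is Neo4j's default Bolt port, but the host, port and credentials actually exposed by the container started with run_container.sh are assumptions to confirm.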

clinical-biomarkers / biomarker-partnership, issue #149