clinical-biomarkers / biomarker-partnership

CFDE Biomarker Partnership
https://hivelab.biochemistry.gwu.edu/biomarker-partnership
MIT License

Biomarker Knowledge Graph on AWS #149

Open DaniallMasood opened 3 months ago

DaniallMasood commented 3 months ago

Documentation for setting up instances of the KG on AWS: https://github.com/clinical-biomarkers/biomarker-partnership/tree/main/supplementary_files/documentation

Goal: Create knowledge graphs and Cypher queries based on biological questions and use cases.

DaniallMasood commented 2 months ago

https://gwu0.sharepoint.com/:w:/r/sites/GlyGenTeam-GRP/_layouts/15/Doc.aspx?sourcedoc=%7B104E0A45-4E26-4F51-AB7B-496B19A7E660%7D&file=GlyGen-CFDE_Data_Distillery_Partnership_Proposal_Y2_full_proposal.docx&fromShare=true&action=default&mobileredirect=true

@rykahsay, here is the proposal. I will get some example queries from @MiguelMazumder, or he can add them to this ticket.

rykahsay commented 2 months ago

Specific Aim 1: Create a capable and secure API for executing queries and accessing query results on the DD KG. Develop API endpoints that address the most common use cases. Test the API's support for queries and data extraction for specific use cases.

Specific Aim 2: Develop and implement a community-accessible website to query data in the DD graph that supports selected use cases from the DCCs. Tune the DD database to be ready for website data delivery, including importing new datasets to support website-driven use cases. This will include ingestion of necessary supporting data to extend the current KG functionality.

Specific Aim 3: Create algorithms and protocols to show how machine learning can be applied to the KG database, including link prediction, community detection, and knowledge cross-validation.

jeet-vora commented 2 months ago

The above aims are good, but they are more for the future. Right now, what Raja wants you to do is:

Below is the email sent by Raja:

Hi Robel, Jeet is coordinating our overall involvement with Data Distillery, which is a Knowledge Graph (KG) project. We have downloaded the KG, and we have tutorials on how to query it using the Cypher query language. The KG is accessible from AWS. We need the following, where we can use your help/input:

Jeet and Daniall, can you please meet with Robel and discuss? Also, send Robel the proposal.

MiguelMazumder commented 2 months ago

The Knowledge Graph has been set up on the AWS server. Running the Docker container requires docker group membership and VPN access. Navigate to the data/KnowledgeGraph directory and run bash ./run_container.sh. The Neo4j user interface will then be available at aws.glygen.org/neo4j. I will create more detailed documentation about this process.
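Besides the browser UI, the running instance can be checked from a script. The following is a minimal sketch, assuming the container exposes Neo4j's default Bolt port (7687) on the host you are logged into and that the official neo4j Python driver (5.x) is installed. The environment variable names, the localhost default, and the function names are hypothetical; the real port and credentials depend on how run_container.sh starts Neo4j.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection settings for the KG's Neo4j instance (all values are assumptions)."""
    uri: str
    user: str
    password: str


def load_settings(env=None):
    """Read connection settings from hypothetical environment variables.

    The bolt://localhost:7687 default assumes you run this on the AWS host
    (over VPN) and that the container publishes Neo4j's default Bolt port.
    """
    env = os.environ if env is None else env
    return Neo4jSettings(
        uri=env.get("KG_NEO4J_URI", "bolt://localhost:7687"),
        user=env.get("KG_NEO4J_USER", "neo4j"),
        password=env.get("KG_NEO4J_PASSWORD", ""),
    )


def count_nodes(settings):
    """Connect with the official neo4j driver and return the total node count."""
    # Imported lazily so the settings helper works without the driver installed
    # (pip install neo4j).
    from neo4j import GraphDatabase

    with GraphDatabase.driver(settings.uri, auth=(settings.user, settings.password)) as driver:
        driver.verify_connectivity()  # fails fast if the container is not reachable
        records, _, _ = driver.execute_query("MATCH (n) RETURN count(n) AS nodes")
        return records[0]["nodes"]
```

On the AWS host, calling count_nodes(load_settings()) should return the number of nodes in the graph if the container is up and the credentials are correct.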

Use case and query ideas:

  1. Gene-Disease Associations Query: Find all diseases associated with a particular gene.

  2. Protein-Protein Interactions Query: Find all proteins that interact with a particular protein.

  3. Disease-Chemical Compound Relationships Query: Find all chemical compounds associated with a particular disease.

  4. Hierarchical Relationships within Body Structures Query: Find all body structures that are subparts of a particular structure.

  5. Gene-Gene Product Relationships Query: Find all gene products produced by a particular gene.

  6. Body Structure-Disease Relationships Query: Find diseases that affect specific body structures.

  7. Synonym Relationships (Cross-references) Query: Find all equivalent identifiers for a specific entity.

  8. Multi-domain Relationships Query: Find relationships that span multiple domains, such as genes in one ontology linked to diseases in another.

  9. Entity-Annotation Relationships Query: Find all annotations associated with a particular entity.

  10. @DaniallMasood Biomarker-Drug Relationships: if IDG has any specific information on entity-drug relationships.
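Most of the ideas above share one shape: start from a single node and follow one relationship type (or a chain of it, for hierarchies such as use case 4) to nodes of another type. A minimal sketch of a parameterized query builder for that pattern follows. Every label, relationship type, and property name (id, name) in it is hypothetical, since the actual DD KG schema has to be read from the graph itself (for example with CALL db.labels() and CALL db.relationshipTypes()). Because Cypher does not accept labels or relationship types as $parameters, the sketch validates and backtick-quotes them, and passes only the node identifier and row limit as parameters.

```python
import re

# Labels and relationship types must be interpolated into the query text,
# so restrict them to plain identifiers before quoting them.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _quote(name):
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe Cypher identifier: {name!r}")
    return f"`{name}`"


def related_query(source_label, rel_type, target_label, max_hops=1, directed=True):
    """Build Cypher that finds target nodes related to one source node.

    max_hops > 1 yields a variable-length pattern (e.g. subparts at any depth);
    directed=False matches the relationship in either direction.
    """
    if max_hops < 1:
        raise ValueError("max_hops must be >= 1")
    hops = "" if max_hops == 1 else f"*1..{max_hops}"
    arrow = "->" if directed else "-"
    return (
        f"MATCH (s:{_quote(source_label)} {{id: $source_id}})"
        f"-[:{_quote(rel_type)}{hops}]{arrow}(t:{_quote(target_label)})\n"
        "RETURN DISTINCT t.id AS id, t.name AS name\n"
        "LIMIT $limit"
    )


# Hypothetical mapping of use cases 1-6 to schema names; replace these with
# the labels and relationship types that actually exist in the DD KG.
USE_CASES = {
    "gene_disease": dict(source_label="Gene", rel_type="ASSOCIATED_WITH", target_label="Disease"),
    "protein_protein": dict(source_label="Protein", rel_type="INTERACTS_WITH",
                            target_label="Protein", directed=False),
    "disease_compound": dict(source_label="Disease", rel_type="ASSOCIATED_WITH", target_label="Compound"),
    "body_structure_parts": dict(source_label="BodyStructure", rel_type="HAS_PART",
                                 target_label="BodyStructure", max_hops=5),
    "gene_products": dict(source_label="Gene", rel_type="HAS_GENE_PRODUCT", target_label="Protein"),
    "body_structure_diseases": dict(source_label="BodyStructure", rel_type="AFFECTS",
                                    target_label="Disease", directed=False),
}
```

With the neo4j Python driver, a query can then be run as driver.execute_query(related_query(**USE_CASES["gene_disease"]), source_id=..., limit=25), where extra keyword arguments become the $source_id and $limit parameters. Keeping values as parameters lets Neo4j reuse the query plan and avoids injecting user input into the query text.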