DaniallMasood opened 3 months ago
@rykahsay Here is the proposal. I will get some example queries from @MiguelMazumder, or he can add them to this ticket.
Specific Aim 1: Create a capable and secure API for executing queries and accessing their results on the DD KG. Develop API endpoints that address the most common use cases, and test how well the API supports queries and data extraction for specific use cases.
Specific Aim 2: Develop and implement a community-accessible website for querying data in the DD graph that supports selected use cases from the DCCs. Tune the DD database for website data delivery, including importing new datasets to support website-driven use cases. This will include ingesting the supporting data needed to extend the current KG functionality.
Specific Aim 3: Create algorithms and protocols that show how machine learning can be applied to the KG database, including link prediction, community detection, and knowledge cross-validation.
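For Specific Aim 1, one way to keep the API secure is to expose only named, parameterized queries instead of accepting raw Cypher from clients. The sketch below is hypothetical: the query names, the MAX_LIMIT cap, the identifier pattern, and the Concept/Code labels with CUI/CodeID properties are assumptions (loosely modeled on a UBKG-style schema), not the DD KG's confirmed layout or a planned endpoint list.

```python
import re
from typing import Any

# Hypothetical endpoint catalogue: each name maps to a fixed Cypher template
# and the exact set of parameters it accepts. Labels and properties assume a
# UBKG-style schema and must be checked against the real DD KG.
NAMED_QUERIES: dict[str, tuple[str, frozenset[str]]] = {
    "neighbors_of_code": (
        "MATCH (:Code {CodeID: $code_id})<-[:CODE]-(:Concept)-[r]-(n:Concept) "
        "RETURN DISTINCT n.CUI AS cui, type(r) AS relation LIMIT $limit",
        frozenset({"code_id", "limit"}),
    ),
    "codes_for_concept": (
        "MATCH (:Concept {CUI: $cui})-[:CODE]->(k:Code) "
        "RETURN k.CodeID AS code_id, k.SAB AS source LIMIT $limit",
        frozenset({"cui", "limit"}),
    ),
}

MAX_LIMIT = 500  # hypothetical cap to protect the shared database
# Hypothetical allow-list for identifier values: letters, digits, _ . : - and space.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_.: -]{1,64}")


def build_request(name: str, params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Validate a client request and return (cypher, parameters) for the driver."""
    if name not in NAMED_QUERIES:
        raise KeyError(f"unknown query: {name}")
    template, allowed = NAMED_QUERIES[name]
    if set(params) != allowed:
        raise ValueError(f"expected exactly these parameters: {sorted(allowed)}")
    for key, value in params.items():
        if key == "limit":
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= MAX_LIMIT:
                raise ValueError(f"limit must be an int in 1..{MAX_LIMIT}")
        elif not isinstance(value, str) or not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"invalid identifier for {key!r}")
    # Values are returned separately and sent as driver parameters;
    # they are never spliced into the query text.
    return template, dict(params)
```

Keeping values out of the query text means a malformed or malicious identifier can at worst fail validation; it cannot change what the query does.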
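For the database-tuning part of Specific Aim 2, a common Neo4j pattern for loading new datasets is to create an index on the lookup key first and then import rows in batches with UNWIND and MERGE. This is a sketch under assumptions: the Code label, the CodeID/SAB/CODE properties, the index name code_codeid and the default batch size are illustrative, not taken from the DD KG.

```python
from typing import Iterable, Iterator

# Hypothetical index on the key used by MERGE, so lookups do not scan every node.
INDEX_STATEMENT = "CREATE INDEX code_codeid IF NOT EXISTS FOR (k:Code) ON (k.CodeID)"

# MERGE keeps the import idempotent: re-running a batch does not duplicate nodes.
IMPORT_STATEMENT = (
    "UNWIND $rows AS row "
    "MERGE (k:Code {CodeID: row.code_id}) "
    "SET k.SAB = row.sab, k.CODE = row.code"
)


def batches(rows: Iterable[dict], size: int = 1000) -> Iterator[list[dict]]:
    """Yield rows in chunks of at most `size`, one transaction per chunk."""
    if size < 1:
        raise ValueError("size must be positive")
    chunk: list[dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly smaller, chunk
        yield chunk
```

Each chunk would then be sent as one transaction, for example session.run(IMPORT_STATEMENT, rows=chunk) with the neo4j Python driver, after running INDEX_STATEMENT once so that the MERGE lookups can use the index.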
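For Specific Aim 3, two of the named tasks can be illustrated on a toy graph: Jaccard similarity of neighborhoods as a simple link-prediction score, and asynchronous label propagation for community detection. These are classic baseline heuristics swapped in for illustration, not the project's chosen methods; on the full KG, Neo4j's Graph Data Science library offers scalable community detection algorithms such as Louvain and label propagation.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def jaccard_link_scores(graph):
    """Score each non-adjacent pair by |N(u) & N(v)| / |N(u) | N(v)|, best first."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, nothing to predict
        union = graph[u] | graph[v]
        if union:
            scores[(u, v)] = len(graph[u] & graph[v]) / len(union)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def label_propagation(graph, max_rounds=20):
    """Asynchronous label propagation.

    Nodes are visited in sorted order; each adopts the label most common among
    its neighbors (ties broken by the smallest label) until no label changes
    or max_rounds is reached. Nodes sharing a label form one community.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            counts = defaultdict(int)
            for neighbor in graph[node]:
                counts[labels[neighbor]] += 1
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels
```

On a KG, the top-scoring unlinked pairs would be candidate relationships to review, and each label group would be a candidate community of related entities.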
The above aims are good, but they are more for the future. Right now, this is what Raja wants you to do:
Below is the email sent by Raja:
Hi Robel, Jeet is coordinating our overall involvement with Data Distillery, a Knowledge Graph (KG) project. We have downloaded the KG and have tutorials on how to query it using the Cypher query language. The KG is accessible from AWS. We need the following, and this is where we can use your help/input:
Jeet and Daniall, can you please meet with Robel to discuss? Also, please send Robel the proposal.
The Knowledge Graph has been set up on the AWS server. Running the Docker container requires docker group access and a VPN connection. Navigate to the data/KnowledgeGraph directory and run bash ./run_container.sh. The Neo4j user interface will then be available at aws.glygen.org/neo4j. I will create more detailed documentation about this process.
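Once the container is running, the KG can also be queried from code rather than only through the Neo4j user interface. A minimal sketch using the official neo4j Python driver follows; the Bolt address, user name, password and database name are assumptions, since the connection details for the AWS instance are not documented here.

```python
from typing import Any


def connect(uri: str, user: str, password: str):
    """Create a driver; neo4j is imported lazily so this module loads without it."""
    from neo4j import GraphDatabase  # official driver: pip install neo4j

    return GraphDatabase.driver(uri, auth=(user, password))


def fetch_records(driver: Any, query: str, database: str = "neo4j", **params: Any) -> list[dict]:
    """Run a query in a new session and return each record as a plain dict.

    Records are materialized inside the session, because a result cannot be
    consumed after its session has closed. Keyword arguments become Cypher
    parameters ($name) instead of being pasted into the query text.
    """
    with driver.session(database=database) as session:
        result = session.run(query, params)
        return [record.data() for record in result]
```

Usage, with placeholder values (bolt://localhost:7687 is only Neo4j's default Bolt address): driver = connect("bolt://localhost:7687", "neo4j", "<password>"), then fetch_records(driver, "CALL db.labels() YIELD label RETURN label") lists the node labels, which is a quick way to confirm the schema before writing use-case queries. Call driver.close() when finished.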
Use Case and Query Ideas:
Gene-Disease Associations Query: Find all diseases associated with a particular gene.
Protein-Protein Interactions Query: Find all proteins that interact with a particular protein.
Disease-Chemical Compound Relationships Query: Find all chemical compounds associated with a particular disease.
Hierarchical Relationships within Body Structures Query: Find all body structures that are subparts of a particular structure.
Gene-Gene Product Relationships Query: Find all gene products produced by a particular gene.
Body Structure-Disease Relationships Query: Find diseases that affect specific body structures.
Synonym Relationships (Cross-references) Query: Find all equivalent identifiers for a specific entity.
Multi-domain Relationships Query: Find relationships that span multiple domains, such as genes in one ontology linked to diseases in another.
Entity-Annotation Relationships Query: Find all annotations associated with a particular entity.
@DaniallMasood Biomarker-Drug Relationships: include this only if IDG has specific information on entity-drug relationships.
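To make a few of the ideas above concrete, here are parameterized Cypher templates for three of them (gene-disease associations, body-structure hierarchy, and cross-references), kept as Python strings so they can be passed to the driver. The sketch assumes a UBKG-style schema in which Concept nodes link to source-specific Code nodes through CODE relationships; the part_of relationship name and its direction, the CodeID/SAB/CUI properties, and the template names are all assumptions to verify against the DD KG in the Neo4j browser before use.

```python
import re

# Hypothetical templates: a Concept node links to source-specific Code nodes
# via CODE relationships, and Concept-to-Concept edges carry the semantic
# relation. Relationship names such as part_of are placeholders.
USE_CASE_QUERIES = {
    # Gene-Disease Associations: concepts related to a gene, restricted to
    # the disease vocabularies (SABs) passed in $disease_sabs.
    "gene_disease": (
        "MATCH (:Code {CodeID: $code_id})<-[:CODE]-(:Concept)-[r]-(:Concept)"
        "-[:CODE]->(d:Code) WHERE d.SAB IN $disease_sabs "
        "RETURN DISTINCT d.CodeID AS disease, type(r) AS relation LIMIT $limit"
    ),
    # Hierarchical Relationships within Body Structures: 1 to 5 part_of hops.
    "body_subparts": (
        "MATCH (:Code {CodeID: $code_id})<-[:CODE]-(:Concept)"
        "<-[:part_of*1..5]-(sub:Concept) "
        "RETURN DISTINCT sub.CUI AS subpart LIMIT $limit"
    ),
    # Synonym Relationships (Cross-references): other codes on the same concept.
    "cross_references": (
        "MATCH (:Code {CodeID: $code_id})<-[:CODE]-(:Concept)-[:CODE]->(x:Code) "
        "WHERE x.CodeID <> $code_id "
        "RETURN DISTINCT x.CodeID AS code_id, x.SAB AS source LIMIT $limit"
    ),
}

_PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def required_params(name: str) -> set[str]:
    """Return the $parameters a template expects, so callers can validate input."""
    return set(_PARAM.findall(USE_CASE_QUERIES[name]))
```

Passing values such as the disease vocabularies as parameters, rather than hard-coding them, lets one template serve different DCC use cases.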
Documentation for setting up KG instances on AWS: https://github.com/clinical-biomarkers/biomarker-partnership/tree/main/supplementary_files/documentation
Goal: Create knowledge graphs and Cypher queries based on biological questions and use cases.