Onariaginosa closed this issue 9 months ago.
@Onariaginosa, we are actually not going to use the "complex" annotations. There is a different query for "physical interactions" that we will use. See this example https://yeastmine.yeastgenome.org/yeastmine/results.do?trail=%257Cquery
Retain the "Gene" and "Protein" tables, but instead of "Complex" and "Protein-Complex", we will have a table called "Physical-Interactions".
The Physical Interactions table would be a two-column table with protein1 and protein2.
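A minimal sketch of the proposed layout, assuming hypothetical table and column names (`gene`, `protein`, `physical_interactions`, `gene_id`, `standard_name`); SQLite stands in for the project's real database only so the example runs anywhere. The interaction table holds nothing but the two protein columns, each pointing back at the protein table.

```python
import sqlite3

# Hypothetical DDL: table and column names are illustrative only, and SQLite
# stands in for the project's real database so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY,              -- e.g. a systematic name
    display_gene_id TEXT                   -- e.g. a standard name
);
CREATE TABLE protein (
    standard_name TEXT PRIMARY KEY,
    gene_id TEXT NOT NULL REFERENCES gene (gene_id)
);
-- One row per interacting pair: just the two protein columns.
CREATE TABLE physical_interactions (
    protein1 TEXT NOT NULL REFERENCES protein (standard_name),
    protein2 TEXT NOT NULL REFERENCES protein (standard_name),
    PRIMARY KEY (protein1, protein2)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables; foreign keys are enforced for this connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO gene VALUES ('G1', 'GENE1')")
    conn.executemany("INSERT INTO protein VALUES (?, 'G1')", [("ProtA",), ("ProtB",)])
    conn.execute("INSERT INTO physical_interactions VALUES ('ProtA', 'ProtB')")
    print(conn.execute("SELECT * FROM physical_interactions").fetchall())
```

With foreign keys on, an interaction that names a protein missing from the protein table is rejected at insert time rather than silently loaded.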
Change schema names to be more informative:
Here is the revised schema:
This looks fine. I think that if you are doing the physical interactions query, you won't get additional information about the protein, like length, molecular weight, or P1 (I don't know what that is anyway).
We could consider bringing down the other physical interactions data that comes with the query such as:
and put them in the "Physical_Interactions" table so that we preserve the metadata about the interaction.
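Keeping interaction metadata changes what counts as a unique row: the same protein pair can be reported by several experiments or publications. A rough sketch, assuming hypothetical metadata columns (`experiment_type`, `annotation_type`, `pubmed_id`) that would need to be checked against the fields the YeastMine query actually returns; SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical metadata columns: check them against the fields the YeastMine
# physical-interactions query actually returns before adopting any of them.
DDL = """
CREATE TABLE physical_interactions (
    protein1 TEXT NOT NULL,
    protein2 TEXT NOT NULL,
    experiment_type TEXT,
    annotation_type TEXT,
    pubmed_id TEXT,
    -- Once metadata is kept, one pair can appear once per experiment or
    -- publication, so the pair alone can no longer be the key.
    UNIQUE (protein1, protein2, experiment_type, pubmed_id)
);
"""

COLUMNS = ("protein1", "protein2", "experiment_type", "annotation_type", "pubmed_id")


def insert_interactions(conn: sqlite3.Connection, rows: list) -> None:
    """Insert dict rows, skipping exact repeats of an already-stored record.

    Missing keys are stored as NULL. SQLite treats NULLs as distinct in UNIQUE
    constraints, so rows lacking experiment_type or pubmed_id are never
    deduplicated.
    """
    records = [tuple(row.get(col) for col in COLUMNS) for row in rows]
    conn.executemany(
        "INSERT OR IGNORE INTO physical_interactions VALUES (?, ?, ?, ?, ?)",
        records,
    )
```

Which columns belong in the uniqueness constraint is a design decision; the choice here (pair plus experiment plus publication) is only one option.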
We get the length, molecular weight, etc. when we get all proteins using the proteins query. Should I still keep them? Attached is the new proposed schema.
Go ahead and keep the data.
@Onariaginosa has created the schema, it loads properly, and she is working on getting the data from YeastMine.
Afterward, she will shift over to working on the controls and the DAL.
Working on changing the "Spring 2022" names to more descriptive ones. Wants to deploy during Spring Break.
Progress Report: I renamed the schemas as requested. I also created the generator scripts for the protein interactions database. Note: it takes approximately an hour and 45 minutes (give or take) to get all of the physical interactions between the proteins. I will focus on the loader scripts and DAL next, before working on the UI.
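Because the full query takes so long, one option is for the generator to save its results to a file once, so the loader can be rerun without querying YeastMine again. The sketch below covers only the post-processing step (the fetch is omitted): it collapses repeated and reversed pairs and writes a tab-separated file. The function names and the TSV layout are hypothetical, and collapsing A-B with B-A assumes the network treats physical interactions as undirected.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple

Pair = Tuple[str, str]


def unique_pairs(rows: Iterable[Pair]) -> List[Pair]:
    """Collapse repeated and reversed pairs (A-B vs. B-A) into one sorted pair.

    Assumes the network treats physical interactions as undirected;
    self-interactions (A-A) are kept.
    """
    seen = set()
    for a, b in rows:
        seen.add((a, b) if a <= b else (b, a))
    return sorted(seen)


def write_pairs_tsv(pairs: Iterable[Pair], out: TextIO) -> None:
    """Write a header plus one tab-separated line per pair, for a loader to read."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["protein1", "protein2"])
    writer.writerows(pairs)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_pairs_tsv(unique_pairs([("B", "A"), ("A", "B"), ("C", "C")]), buffer)
    print(buffer.getvalue(), end="")
```

If bait/prey direction turns out to matter, `unique_pairs` should keep the original order and drop only exact repeats.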
Add source to the schema so that the generate network modal can behave the same way as the GRN network generator.
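A rough sketch of what "adding source" could look like, assuming a hypothetical `source` table keyed by load time stamp and source name, with each interaction row pointing at the load it came from. The query function returns the interactions from one chosen source among a user-supplied list of proteins, which is an assumption about what a "generate network" modal needs. SQLite stands in for the real database.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical schema: each load of YeastMine data is recorded in `source`,
# and every interaction row points at the load it came from.
DDL = """
CREATE TABLE source (
    time_stamp TEXT NOT NULL,
    source TEXT NOT NULL,
    display_name TEXT,
    PRIMARY KEY (time_stamp, source)
);
CREATE TABLE physical_interactions (
    protein1 TEXT NOT NULL,
    protein2 TEXT NOT NULL,
    time_stamp TEXT NOT NULL,
    source TEXT NOT NULL,
    FOREIGN KEY (time_stamp, source) REFERENCES source (time_stamp, source)
);
"""


def network_for(
    conn: sqlite3.Connection, time_stamp: str, source: str, proteins: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pairs from one source whose two proteins are both in `proteins`."""
    if not proteins:
        return []
    # Only "?" placeholders are formatted into the SQL; values stay parameterized.
    marks = ",".join("?" for _ in proteins)
    sql = (
        "SELECT protein1, protein2 FROM physical_interactions "
        "WHERE time_stamp = ? AND source = ? "
        f"AND protein1 IN ({marks}) AND protein2 IN ({marks}) "
        "ORDER BY protein1, protein2"
    )
    return conn.execute(sql, [time_stamp, source, *proteins, *proteins]).fetchall()
```

Keying on both time stamp and source lets several loads of the same database coexist, so the modal can offer the user a choice of which one to build from.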
@dondi saw that this is done in PR #1039.
I went on SGD/YeastMine to see if there was anything related to querying for protein-protein interactions. In the proteins tab, you can query for all proteins. The following screenshot captures all of the information you can receive for each protein result.
In the interactions tab, I found that you can query by complex, and it returns the complex participants. You can restrict the query so that it returns only complexes that contain a given protein. The following screenshot captures all of the information you can receive for each complex-participant pairing (the data seem to be stored as a many-to-many relationship).
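A many-to-many relationship like the complex/participant one is usually stored with a junction table: one row per (protein, complex) membership. The sketch below uses hypothetical names that mirror the "Complex" and "Protein-Complex" tables mentioned in the thread, and shows both directions of lookup, including the "complexes that contain a given protein" restriction. SQLite is used only so it runs standalone.

```python
import sqlite3
from typing import List

# Hypothetical table names mirroring the "Complex" / "Protein-Complex" idea:
# a complex has many participants and a protein can belong to many complexes,
# so membership lives in its own junction table.
DDL = """
CREATE TABLE protein (standard_name TEXT PRIMARY KEY);
CREATE TABLE complex (complex_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE protein_complex (
    protein TEXT NOT NULL REFERENCES protein (standard_name),
    complex_id TEXT NOT NULL REFERENCES complex (complex_id),
    PRIMARY KEY (protein, complex_id)
);
"""


def complexes_containing(conn: sqlite3.Connection, protein: str) -> List[str]:
    """IDs of complexes that list `protein` as a participant, sorted."""
    rows = conn.execute(
        "SELECT complex_id FROM protein_complex WHERE protein = ? ORDER BY complex_id",
        (protein,),
    )
    return [complex_id for (complex_id,) in rows]


def participants_of(conn: sqlite3.Connection, complex_id: str) -> List[str]:
    """Proteins that participate in `complex_id`, sorted."""
    rows = conn.execute(
        "SELECT protein FROM protein_complex WHERE complex_id = ? ORDER BY protein",
        (complex_id,),
    )
    return [protein for (protein,) in rows]
```

The composite primary key on the junction table prevents the same membership from being recorded twice.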
My preliminary inclination is to store the protein data in the following schema:
![image](https://user-images.githubusercontent.com/21343072/219086087-00332708-b347-43f7-821d-eeb0c6c7a4d0.png)
Is there anything that should be revised?
Side note: I didn't know the terminology of protein-protein interactions, so I read this article for preliminary knowledge.