Goals and approach of the proposal

leilaicruz commented 3 years ago

Here let's discuss your goals and the approach you are thinking of to tackle your research question(s).

T-Wisse commented 3 years ago

I am struggling with this. The main question is how genetic interactions in a specific module (mainly cell polarity?) change when a mutation is introduced. Costanzo et al. (2016) show that essential genes involved in cell polarity cluster together suggesting genes involved are well connected. Also, for all essential genes, negative interactions are enriched mostly in protein-protein interactions and shared protein complexes. However, studying essential genes is difficult experimentally. Then we have the non-essential genes, which have fewer interactions but are similarly enriched for positive interactions. However, the issue is that just looking at this will not give us enough power to predict genetic interactions accurately.

Therefore, we will either need more data that can be used for this or, by limiting ourselves to a specific module, get more out of what we have, like the GO terms. To identify which modules might be easiest to work with, I will look at correlations between attributes and genetic interaction type within the modules, as Leila did here https://github.com/leilaicruz/python-modules-for-bioinformatic-analyses/blob/master/docs/Bioinformatics-analyses.md . I think with this my idea of a classifier is not necessary.

Since I am not sure how well we can do and this will be useful either way, I first want to make predictions based on common GO terms and protein interactions. However, I do not know how to do this. We talked about using machine learning. I will have to look into that.

Once we have predictions, they have to be verified experimentally. By choosing genes for which we already know the genetic interaction changes, this does not have to be a lot of work. However, I am not sure how many we would need to get a proper estimate of how good the predictions are, and as far as I have seen there is not a lot of GI data for mutants. We may also want to see how valid this is outside of the functional module.

leilaicruz commented 3 years ago

> I am struggling with this. The main question is how genetic interactions in a specific module (mainly cell polarity?) change when a mutation is introduced. Costanzo et al. (2016) show that essential genes involved in cell polarity cluster together suggesting genes involved are well connected. Also, for all essential genes, negative interactions are enriched mostly in protein-protein interactions and shared protein complexes. However, studying essential genes is difficult experimentally. Then we have the non-essential genes, which have fewer interactions but are similarly enriched for positive interactions. However, the issue is that just looking at this will not give us enough power to predict genetic interactions accurately.

Explain more what you mean by "essential genes involved in cell polarity cluster together suggesting genes involved are well connected"; it is not clear now. You should also explain how the interactors of essential genes are actually measured, because if you follow the usual procedure you realise it does not make sense: cells lacking that gene are not viable, by the definition of essentiality. So they used a trick to compute the interactions of essential genes, but it is not as straightforward as for the others. So I would not focus on checking the interactors of essential genes, but rather on non-essential genes.

About the prediction: what exactly do you want to predict? If you want to predict the probability that gene A and gene B lose or gain an interaction given that gene C is missing, then you would need SATAY data in the WT, dgeneC, dgeneAdgeneC and dgeneBdgeneC backgrounds to validate it. I tried to depict this idea here 👇

Remember that the data you have so far is in the WT background (from the SGA method). That is still useful, and it is what we should use to validate our method for getting GIs from SATAY data, in this case in the WT background. This dataset is also incomplete, in the sense that it does not map all genes in the genome when investigating the interactions of, for example, gene A. That is the advantage of SATAY: it is a complete library in which each gene has to be disrupted by a transposon in a given background.

> Therefore, we will either need more data that can be used for this or, by limiting ourselves to a specific module, get more out of what we have, like the GO terms. To identify which modules might be easiest to work with, I will look at correlations between attributes and genetic interaction type within the modules, as Leila did here https://github.com/leilaicruz/python-modules-for-bioinformatic-analyses/blob/master/docs/Bioinformatics-analyses.md . I think with this my idea of a classifier is not necessary.

So the classifier comes into play when you set up your machine learning algorithm. If you have a classification problem, that is, you want to assign, let's say, one of two classes to the output, then you can use different methods such as naive Bayes, regression, SVM, etc. The most important thing is your model, that is, what you are going to use to learn from.
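
To make the two-class setup concrete, here is a minimal sketch of one such method (logistic regression, chosen only as an example and written in plain Python). The feature rows and labels are synthetic toy values, not real measurements, and the names `fit_logistic` and `predict` are hypothetical helpers, not something from the repo.

```python
import math

# Synthetic toy data (not real measurements). Each row is one gene pair:
# [fraction of shared GO slim terms, physical interaction flag (0 or 1)].
X = [[0.9, 1], [0.8, 1], [0.7, 0], [0.6, 1], [0.2, 0], [0.1, 0], [0.3, 0], [0.0, 0]]
# Hypothetical class labels: 1 = pair interacts genetically, 0 = it does not.
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(row, weights, bias):
    """Return class 1 when the modelled probability is at least 0.5."""
    prob = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
    return int(prob >= 0.5)


weights, bias = fit_logistic(X, y)
predictions = [predict(row, weights, bias) for row in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on eight made-up pairs says nothing about real performance; with real data you would hold out gene pairs for testing before trusting any prediction.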

> Since I am not sure how well we can do and this will be useful either way, I first want to make predictions based on common GO terms and protein interactions. However, I do not know how to do this. We talked about using machine learning. I will have to look into that.

So you have the data file of the GO slim terms per gene. For each gene pair you can list the GO slim terms, check how many are common out of the total, and also put 0 or 1 (as an example) depending on whether the pair also shares a physical interaction (protein interaction). You can find this in the BioGRID dataset from my repo and the GO slim terms HERE. From these files you should be able to get that info. We can also do a session together if you have further questions about how to implement this. In any case, my repo has an example of how I did it HERE.
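
As a rough illustration of that feature table, here is a minimal sketch. It assumes the GO slim file has already been parsed into a dict of gene to set of terms and the BioGRID file into a set of interacting pairs; the gene names, annotations and the helper `pair_features` are hypothetical placeholders, and "total" is taken to mean the union of both genes' terms.

```python
from itertools import combinations

# Hypothetical toy inputs standing in for the parsed GO slim and BioGRID files.
go_slim = {
    "geneA": {"cell polarity", "cytoskeleton organization", "signaling"},
    "geneB": {"cell polarity", "signaling"},
    "geneC": {"RNA binding"},
}
# Each physical interaction is stored as an unordered pair.
physical_pairs = {frozenset(("geneA", "geneB"))}


def pair_features(gene_a, gene_b, go_terms, physical):
    """Return the shared-GO counts and a 0/1 physical-interaction flag."""
    terms_a = go_terms.get(gene_a, set())
    terms_b = go_terms.get(gene_b, set())
    union = terms_a | terms_b
    shared = terms_a & terms_b
    # Assumption: "common from the total" means shared terms over the union.
    fraction = len(shared) / len(union) if union else 0.0
    return {
        "pair": (gene_a, gene_b),
        "n_common_go": len(shared),
        "frac_common_go": fraction,
        "physical": int(frozenset((gene_a, gene_b)) in physical),
    }


# Build one feature row per unordered gene pair.
features = [
    pair_features(a, b, go_slim, physical_pairs)
    for a, b in combinations(sorted(go_slim), 2)
]
for row in features:
    print(row)
```

Storing pairs as `frozenset` makes the lookup independent of gene order, which matters if the interaction file lists each pair only once.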

> Once we have predictions, they have to be verified experimentally. By choosing genes for which we already know the genetic interaction changes, this does not have to be a lot of work. However, I am not sure how many we would need to get a proper estimate of how good the predictions are, and as far as I have seen there is not a lot of GI data for mutants. We may also want to see how valid this is outside of the functional module.

This would come after we validate that the machine learning model is accurate enough for us to trust its predictions. So for now, because we don't have a big collection of SATAY data, you should focus on which attributes (features) are best for predicting genetic interactions, e.g. common GO terms, common GIs, physical interactors, etc.
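
One simple way to get a first impression of which features are informative is to rank them by their correlation with the interaction label. The sketch below does that with a hand-written Pearson coefficient; the feature names and all values are synthetic and only illustrate the mechanics, and correlation is just one possible criterion.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot be correlated with anything
    return cov / math.sqrt(var_x * var_y)


# Synthetic toy table, one value per gene pair (all numbers are made up).
candidate_features = {
    "frac_common_go": [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.3, 0.0],
    "physical": [1, 1, 0, 1, 0, 0, 0, 0],
    "n_common_gi": [3, 5, 1, 4, 2, 4, 1, 3],
}
# Hypothetical labels: 1 = the pair has a genetic interaction, 0 = it does not.
interacts = [1, 1, 1, 1, 0, 0, 0, 0]

# Sort features by the absolute value of their correlation with the label.
ranking = sorted(
    ((name, pearson(values, interacts)) for name, values in candidate_features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

With a 0/1 label this is the point-biserial correlation, so it only captures linear, one-feature-at-a-time effects; features that matter only in combination would need a different check.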

leilaicruz commented 3 years ago

What is the outcome of this issue? Did you start a draft of your proposal/thesis project? Please post here where I can find it, or where you plan to write it, so we can start making progress on it.

T-Wisse commented 3 years ago

I have started with it but I have not worked on it for a while. You should be able to find it here: https://www.overleaf.com/9465291539vnkdxbjphfgj . This link should remain valid.

leilaicruz commented 3 years ago

I cannot edit the document... Maybe it is better if you use the wiki in GitHub: just copy and paste what you have there, keeping the same organization as in your LaTeX document. I can contribute better there, and you can then copy it back to your LaTeX file to render a nice PDF (although you can also use Markdown + Pandoc for that). I can show you another time, because it lets you reuse your Markdown notes to produce a nice PDF output.

T-Wisse commented 3 years ago

Ah right, I gave you the wrong link so it was read only. I have updated it above. I will give the wiki a try

leilaicruz commented 3 years ago

Yes, give it a try :)

On Thu, 28 Jan 2021 at 14:17, Thomas notifications@github.com wrote:

> Ah right, I gave you the wrong link so it was read only. I have updated it above. I will give the wiki a try

-- Leila M. Iñigo de la Cruz

leilaicruz commented 3 years ago

Here, put a link to where you are writing your project methodology, report, proposal or whatever it is called :)

T-Wisse / MEP_Thomas

Goals and approach of the proposal #3