From SATAY data to the prediction of the interaction map

leilaicruz commented 4 years ago

This issue collects ideas on how to quantify, from the SATAY output data, the changes in the interaction map of budding yeast after a single mutation is made.

leilaicruz commented 4 years ago

Idea 1: See the figure below 👇, where I sketch a feature matrix for the wild-type strain. It would be used to train and test a regression model that predicts, in this case, the number of synthetic lethals (SL) per gene as a proxy for the interaction map.

The idea is then to build a good model that reaches more than 75% accuracy, so that it can acceptably predict the number of SL of the genes in the analysed mutants. Every mutant (genetic background) has its own feature matrix, which is fed into the model to make the prediction.

In this model I take the number of SL per gene as the output variable representing the interaction map. But it could also be:

total number of interactions
???
???
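To make Idea 1 concrete, here is a minimal sketch of a per-gene feature matrix and a regression fit. Everything in it is an assumption for illustration: the data are synthetic, the two features (insertions and reads per gene) are just the ones mentioned in this thread, ordinary least squares stands in for whatever model ends up being used, and the "75% accuracy" target is read here as a held-out R² of 0.75, which is only one possible interpretation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-gene SATAY features for the WT strain (synthetic numbers).
n_genes = 200
insertions = rng.poisson(30, n_genes).astype(float)  # transposon insertions per gene
reads = insertions * rng.uniform(5, 15, n_genes)      # reads per gene

# Placeholder target: number of SL partners per gene (synthetic, NOT real data).
n_sl = np.clip(20 - 0.4 * insertions + rng.normal(0, 1, n_genes), 0, None)

# Feature matrix: one row per gene, one column per feature, plus an intercept.
X = np.column_stack([np.ones(n_genes), insertions, np.log1p(reads)])

# Hold out the last 25% of genes for testing.
split = int(0.75 * n_genes)
X_train, X_test = X[:split], X[split:]
y_train, y_test = n_sl[:split], n_sl[split:]

# Ordinary least squares fit on the training genes.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


r2_test = r_squared(y_test, X_test @ coef)
print(f"held-out R^2 = {r2_test:.2f}")
```

With real data, the synthetic `n_sl` would be replaced by known SL counts per gene, and the columns of `X` by whatever the feature matrix in the figure contains.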

EKingma commented 4 years ago

Nice idea! I was wondering, though: what data would you use to build your model in this case?

leilaicruz commented 4 years ago

In principle, I would generate 20-30 single mutants together with the student and run SATAY on them to obtain this type of data 🙏🤞

EKingma commented 4 years ago

Still, I don't really understand what you mean by "this type of data". What data would you use to train/test the model?

leilaicruz commented 4 years ago

The idea is to validate how well the model built from the WT data can be extended to predict the number of SL of the genes (or another proxy for the interaction map) in a different genetic background. More importantly, we want to know how well we can predict the number of SL of genes in WT using SATAY data.
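The two validation questions can be separated in a small sketch: fit on WT genes, then compare the error on held-out WT genes with the error on a mutant background. This assumes a reference SL count is available for the mutant, which this thread has not yet established. The data are synthetic, `synthetic_background` is a hypothetical helper, and the single feature and the least-squares fit are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)


def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def synthetic_background(slope, n_genes=150):
    """Synthetic (insertions -> SL count) data for one genetic background."""
    insertions = rng.uniform(0, 50, size=(n_genes, 1))
    n_sl = 25 + slope * insertions[:, 0] + rng.normal(0, 1, n_genes)
    return insertions, n_sl


X_wt, y_wt = synthetic_background(slope=-0.4)
# Assumption: the feature-to-SL relationship is shifted in the mutant.
X_mut, y_mut = synthetic_background(slope=-0.2)

# Train on the first 100 WT genes only.
coef = fit_ols(X_wt[:100], y_wt[:100])

rmse_wt = rmse(y_wt[100:], predict(coef, X_wt[100:]))  # within-WT generalisation
rmse_mut = rmse(y_mut, predict(coef, X_mut))           # transfer to the mutant
print(f"WT held-out RMSE: {rmse_wt:.2f}, mutant RMSE: {rmse_mut:.2f}")
```

A large gap between the two errors would suggest that the WT model does not transfer to that background, even if it predicts WT genes well.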

leilaicruz commented 4 years ago

One important thing to take into account is that the feature matrix should be built so that every feature carries information about, in this case, the different genetic backgrounds.

In this case, the features related to the functional properties of every gene will only be available for the WT background. What I mean is that for the mutants we don't actually know (there is no database with that info) how the "new" functions of the rest of the genes change, so those columns will remain constant in the mutants and hence won't contribute any new insight from the data.
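A quick way to spot such columns is to check which features take the same value in every background. The sketch below does this for a single gene. The background names (`mut_A`, `mut_B`), the feature names (including `n_go_terms` as a stand-in for a functional annotation) and all values are made up for illustration.

```python
# One gene, three genetic backgrounds. All names and values are hypothetical.
feature_table = {
    "WT":    {"insertions": 42, "reads": 510, "n_go_terms": 7},
    "mut_A": {"insertions": 18, "reads": 230, "n_go_terms": 7},
    "mut_B": {"insertions": 55, "reads": 640, "n_go_terms": 7},
}


def uninformative_features(table):
    """Return the feature names whose value is identical in every background."""
    names = next(iter(table.values())).keys()
    return sorted(
        name
        for name in names
        if len({row[name] for row in table.values()}) == 1
    )


constant = uninformative_features(feature_table)
print(constant)  # ['n_go_terms']
```

Features flagged this way cannot help a model distinguish backgrounds, so they would either need a background-specific estimate or be left out of the cross-background comparison.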

To guide the thinking, I will reflect on:

What else from the SATAY experiment can we extract that changes with the background, like the insertions and the reads?
How can we model the functional analysis of every gene in a different genetic background, so that it can be integrated into the features as information that changes per genetic background?
What else can we use as an output for the model that is known and relevant for connecting to the interaction map?

SATAY-LL / LaanLab-SATAY-DataAnalysis

From SATAY data to the prediction of the interaction map #18