Open leilaicruz opened 4 years ago
Idea 1: See the figure below 👇, where I sketch a feature matrix for the wild-type (WT) strain. It would be used to train and test a regression model that predicts, in this case, the number of synthetic lethals (SL) per gene as a proxy for the interaction map.
The goal is a model with more than 75% accuracy, so that it can acceptably predict the number of SL per gene in the analysed mutants. Every mutant (genetic background) has its own feature matrix, which is fed into the model to make the prediction.
In this model I take the number of SL per gene as the output variable representing the interaction map. Other candidate outputs could be:
- total number of interactions
- ???
- ???
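A minimal sketch of the train/test idea above, on purely synthetic data. Everything here is an assumption for illustration: the feature matrix `X`, the target `y` (standing in for SL per gene), the weights used to generate it, and the choice of plain least squares. "75% accuracy" is not defined for a regression model in the thread, so the sketch reads it as a held-out R² of at least 0.75, which is only one possible interpretation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the WT feature matrix: rows are genes, columns are features.
n_genes, n_features = 200, 4
X = rng.normal(size=(n_genes, n_features))

# Synthetic stand-in for the target (number of SL per gene); the weights are made up.
true_w = np.array([3.0, -2.0, 0.5, 0.0])
y = X @ true_w + 10.0 + rng.normal(scale=0.5, size=n_genes)

# Hold out 25% of the genes for testing.
idx = rng.permutation(n_genes)
test_idx, train_idx = idx[:50], idx[50:]


def add_intercept(M):
    """Prepend a column of ones so the fit includes an intercept term."""
    return np.column_stack([np.ones(len(M)), M])


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Ordinary least-squares fit on the training genes only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Evaluate on the held-out genes.
y_pred = add_intercept(X[test_idx]) @ coef
r2_test = r_squared(y[test_idx], y_pred)
meets_threshold = r2_test >= 0.75  # assumed reading of "75% accuracy"
print(f"held-out R^2: {r2_test:.2f}, meets threshold: {meets_threshold}")
```

With real data, `X` would be the WT feature matrix from the figure and `y` the known SL counts. The same fitted `coef` could then be applied to a mutant's feature matrix to get its predictions.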
Nice idea! I was wondering, though: what data would you use to build your model in this case?
In principle, I would generate 20-30 single mutants together with the student and run SATAY on them to obtain this type of data 🙏🤞
Still, I don't really understand what you mean by "this type of data". What data would you use to train/test the model?
The idea is to validate how well the model built from the WT data can be extended to predict the number of SL per gene (or another proxy that represents the interaction map) in a different genetic background. More importantly, we want to know how well we can predict the number of SL per gene in WT using SATAY data.
One important thing to take into account is that the feature matrix should be built so that every feature carries information about the different genetic backgrounds.
In this case, the features related to the functional properties of each gene will only be available for the WT background. What I mean is that for the mutants we don't actually know (there is no database with that information) how the functions of the remaining genes change, so those columns will stay constant across the mutants and hence won't contribute any new insight from the data.
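To make the point about constant columns concrete, here is a small sketch with made-up numbers. The feature names (`insertions_per_gene`, `reads_per_gene`, `is_kinase`) and all values are hypothetical. The only claim illustrated is that a feature copied unchanged from WT into every mutant has zero variance across backgrounds, so it cannot help distinguish them.

```python
import numpy as np

# Hypothetical feature names: two SATAY-like measurements and one functional annotation.
feature_names = ["insertions_per_gene", "reads_per_gene", "is_kinase"]

# Made-up values with shape (backgrounds, genes, features).
data = np.array([
    [[12, 340, 1], [3, 45, 0]],    # WT
    [[9, 280, 1], [7, 120, 0]],    # mutant A
    [[15, 400, 1], [1, 10, 0]],    # mutant B
], dtype=float)

# Variance of each feature across backgrounds, computed separately for every gene.
var_across_backgrounds = data.var(axis=0)  # shape (genes, features)

# A feature says nothing about the background if it never changes for any gene.
uninformative = np.all(var_across_backgrounds == 0, axis=0)
constant_features = [name for name, flag in zip(feature_names, uninformative) if flag]

print(constant_features)  # ['is_kinase']
```

Such a column can still be useful for predicting SL counts within one background. It just adds nothing when comparing backgrounds with each other.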
To guide the thinking, I will reflect on:
This issue is for collecting ideas on possible ways to quantify, from the SATAY data output, the changes in the interaction map of budding yeast once a single mutation is made.