SATAY-LL / LaanLab-SATAY-DataAnalysis

This repository contains code and workflows for analyzing data from SATAY experiments.
Apache License 2.0

Try using regression models to predict transposition insertion depending on the genomic length and position #12

Closed leilaicruz closed 3 years ago

leilaicruz commented 4 years ago

It would be nice to apply statistical inference to our large dataset to try to make meaningful predictions from the data. One example is to predict a continuous variable, such as the number of transposon insertions per ORF, given the ORF's length and position. Regression models are the recommended tool for this. Whether we need linear or nonlinear models is something we have to discover by looking at the relationships between the variables. For inspiration, you can look in this folder LINK, where I have some example notebooks to try things out.
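To make the idea concrete, here is a minimal sketch of a linear regression of insertions per ORF on ORF length and position. Everything in it is an assumption for illustration: the data are synthetic (not SATAY measurements), the "true" coefficients are made up, and `fit_linear` is a hypothetical helper written with the standard library only. In practice a library such as scikit-learn or statsmodels would be the more usual choice.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations.

    X is a list of feature rows (without intercept); returns
    [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    n = len(rows[0])
    # Build A = X^T X and b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
            b[k] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef

# Synthetic stand-in data (assumption): ORF length in kb and relative
# position along the chromosome (0 to 1). The generating relation below
# is invented purely to have something to fit.
rng = random.Random(0)
lengths_kb = [rng.uniform(0.3, 5.0) for _ in range(500)]
positions = [rng.random() for _ in range(500)]
insertions = [5.0 + 20.0 * l - 3.0 * p + rng.gauss(0.0, 2.0)
              for l, p in zip(lengths_kb, positions)]

coef = fit_linear(list(zip(lengths_kb, positions)), insertions)

# Coefficient of determination (R^2) as a first check on the linear fit
pred = [coef[0] + coef[1] * l + coef[2] * p
        for l, p in zip(lengths_kb, positions)]
mean_y = sum(insertions) / len(insertions)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(insertions, pred))
ss_tot = sum((yi - mean_y) ** 2 for yi in insertions)
r2 = 1.0 - ss_res / ss_tot

print("intercept, length coef, position coef:", coef)
print("R^2:", r2)
```

With real data, a low R^2 or a clear pattern in the residuals would be a hint that a nonlinear model is needed.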

Wteunisse commented 3 years ago

I don't think I understand this completely. Why would you want to predict the number of transposons, and what would you base your prediction on? Is this about what we discussed with the Poisson distribution?

leilaicruz commented 3 years ago

This was an idea to predict the SATAY outcome for new strains, if possible, without actually measuring them. However, the experimental outcome of this technique depends on experimental factors such as incubation times, incubation media, number of cells to reseed, etc.

However, what may still be interesting as an outlook is to simulate the SATAY experiment on a computer, with all those factors as parameters of a model. Then we can see the effects of changing different parameters on the transposition efficiency.
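As a rough illustration of what such a simulation could look like, here is a toy sketch. It is not a validated model of SATAY. It assumes that each transposition lands in an ORF with probability proportional to the ORF's length, and it models reseeding as random subsampling of the library. The function `simulate_satay`, its parameters, and the ORF names and lengths are all hypothetical.

```python
import random
from collections import Counter

def simulate_satay(orf_lengths, n_transpositions, reseed_fraction, rng):
    """Toy model: return insertion counts per ORF after one reseeding step.

    orf_lengths: dict mapping ORF name to length in bp (made-up values here)
    n_transpositions: number of independent transposition events
    reseed_fraction: fraction of the library carried over when reseeding
    """
    names = list(orf_lengths)
    weights = [orf_lengths[name] for name in names]
    # Assumption: insertion probability is proportional to ORF length
    library = rng.choices(names, weights=weights, k=n_transpositions)
    # Assumption: reseeding keeps a uniformly random subset of the library
    n_kept = int(len(library) * reseed_fraction)
    kept = rng.sample(library, n_kept)
    return Counter(kept)

rng = random.Random(1)
orfs = {"ORF_short": 300, "ORF_medium": 1200, "ORF_long": 3000}  # hypothetical
counts = simulate_satay(orfs, n_transpositions=10000,
                        reseed_fraction=0.1, rng=rng)
print(counts)
```

Rerunning with different values of `n_transpositions` or `reseed_fraction` shows how those parameters change the counts per ORF. Factors such as incubation time or medium would need their own modeling assumptions.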

I will close this issue since it does not have a clear follow-up.