Using regression to estimate the probabilities for each gene to be essential or not given the SATAY data

Go HERE to see the details of the Python program.

If we plot the reads and insertions per gene and highlight which genes are essential according to published data, we see this 👇

Since the essential and non-essential genes largely overlap in this plot (even after truncating the datasets and removing outliers), the regression model cannot predict essential genes with a probability above 0.5.
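To make the setup concrete, here is a minimal sketch of a two-feature logistic regression, assuming that is the kind of regression meant here. It is not the program linked above: the function names, the learning rate, and the toy feature values (rescaled reads and insertions per gene) are all hypothetical and only illustrate how a per-gene probability of being essential could be obtained.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_proba(features, weights, bias):
    """Return the predicted probability of label 1 (essential) per gene."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for x in features
    ]


# Hypothetical toy data: (scaled reads, scaled insertions) per gene.
# Label 1 = essential, 0 = non-essential. One non-essential gene is placed
# among the essential ones to mimic the overlap between the two groups.
toy_features = [
    (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),              # essential
    (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.3, 0.2),  # non-essential
]
toy_labels = [1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(toy_features, toy_labels)
probabilities = predict_proba(toy_features, weights, bias)
```

The more the two groups overlap in the reads/insertions plane, the closer the fitted probabilities stay to the overall fraction of essential genes, which is one way to read the behaviour described above.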

However, if we look more closely at the probabilities, we can see that a threshold of 0.3 on the predicted probability of being essential already captures 76% of all essential genes.
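The threshold check can be sketched as follows. The function name and the toy numbers are hypothetical and are not the SATAY results; the sketch only shows how the fraction of known essential genes above a given probability threshold could be computed.

```python
def fraction_of_essentials_above(probabilities, is_essential, threshold):
    """Fraction of known essential genes whose predicted probability
    of being essential is strictly greater than `threshold`."""
    essential_probs = [p for p, e in zip(probabilities, is_essential) if e]
    if not essential_probs:
        raise ValueError("no essential genes in the input")
    above = sum(1 for p in essential_probs if p > threshold)
    return above / len(essential_probs)


# Hypothetical predicted probabilities and published essentiality labels.
toy_probabilities = [0.45, 0.35, 0.31, 0.10, 0.20, 0.32]
toy_is_essential = [True, True, True, True, False, False]

captured_at_03 = fraction_of_essentials_above(
    toy_probabilities, toy_is_essential, threshold=0.3
)
captured_at_05 = fraction_of_essentials_above(
    toy_probabilities, toy_is_essential, threshold=0.5
)
```

Lowering the threshold will in general also let more non-essential genes through, so the same calculation on the non-essential genes is worth reporting next to this fraction.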
