Identity analysis

It may not be possible to run an extensive test like this. The idea is to check whether a given protein, across all the pairs in which it appears, gives a characteristic fingerprint in the channel data. I would compute this for every protein. The result could tell us whether there is a signal the algorithm can learn that lets it default its predictions based on protein “identity”.
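One way to make the fingerprint idea concrete is sketched below. It rests on assumptions the issue does not state. Each pair is a hypothetical `(protein_a, protein_b, channel_scores)` tuple. A protein's fingerprint is taken to be the mean channel vector over every pair it appears in. "Characteristic" is measured as the mean squared distance of those pairs to the fingerprint, divided by the overall spread of all pairs around the global mean. The function names are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def _sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def protein_fingerprints(pairs):
    """Mean channel vector over every pair a protein appears in.

    pairs: list of (protein_a, protein_b, channel_scores) tuples (hypothetical format).
    """
    vectors = defaultdict(list)
    for a, b, channels in pairs:
        for p in {a, b}:  # a self-pair is counted once
            vectors[p].append(channels)
    # zip(*vs) walks channel by channel, so each entry is a per-channel mean.
    return {p: [mean(col) for col in zip(*vs)] for p, vs in vectors.items()}


def fingerprint_tightness(pairs):
    """Within-protein spread around its fingerprint, relative to the global spread.

    Values well below 1 mean the pairs containing that protein sit close together
    in channel space, i.e. the protein has a distinctive fingerprint.
    Assumes not all pairs share an identical channel vector (global spread > 0).
    """
    pairs = list(pairs)
    fingerprints = protein_fingerprints(pairs)
    all_vecs = [channels for _, _, channels in pairs]
    centre = [mean(col) for col in zip(*all_vecs)]
    global_spread = mean(_sq_dist(v, centre) for v in all_vecs)
    tightness = {}
    for p, fp in fingerprints.items():
        own = [channels for a, b, channels in pairs if p in (a, b)]
        tightness[p] = mean(_sq_dist(v, fp) for v in own) / global_spread
    return tightness
```

A protein that appears in only one pair trivially scores 0, so in practice the ratio is only meaningful for proteins with a reasonable number of partners.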

At the very least, I can check the distribution of class labels for “group A” and quantify how skewed the outcome is. For example, for all pairs containing a given protein, how often is the class label positive versus negative?
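The label check could look like the following minimal sketch. It assumes each pair is a hypothetical `(protein_a, protein_b, label)` tuple with a binary label. Skew is expressed here as the difference between a protein's positive rate and the global positive rate. That choice of metric is an assumption, and a proportion test or an entropy measure could be substituted.

```python
from collections import Counter


def label_skew(pairs):
    """Per-protein label counts and skew relative to the global positive rate.

    pairs: iterable of (protein_a, protein_b, label) with label in {0, 1}
    (hypothetical format).
    """
    pairs = list(pairs)
    global_rate = sum(label for _, _, label in pairs) / len(pairs)
    positives, totals = Counter(), Counter()
    for a, b, label in pairs:
        for p in {a, b}:  # a self-pair is counted once
            totals[p] += 1
            positives[p] += label
    report = {}
    for p, n in totals.items():
        rate = positives[p] / n
        report[p] = {
            "n": n,
            "pos_rate": rate,
            # > 0: pairs with this protein are positive more often than average
            "skew": rate - global_rate,
        }
    return report
```

Restricting the report to the group-A proteins then shows whether their pairs lean heavily towards one class.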

If we find a correlation between the group-A proteins (those the ANOVA test shows to be statistically different from group B) and high predictive accuracy on the pairs that contain a group-A protein, it would increase our suspicion of the identity problem.
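Both conditions can be sketched with the standard library. The sketch assumes group A is supplied as a set of protein IDs and predictions as hypothetical `(protein_a, protein_b, y_true, y_pred)` tuples. `one_way_anova_f` computes only the classical one-way ANOVA F statistic (between-group mean square over within-group mean square). For a p-value, `scipy.stats.f_oneway` returns both the statistic and the p-value. The issue does not specify which values go into each group (for example, one channel's scores for pairs involving group-A versus group-B proteins), so that choice is left to the caller.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Classical one-way ANOVA F statistic (no p-value).

    F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k groups and n values.
    """
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def accuracy_by_group(predictions, group_a):
    """Accuracy on pairs containing at least one group-A protein vs. all other pairs.

    predictions: iterable of (protein_a, protein_b, y_true, y_pred) (hypothetical format).
    group_a: set of protein IDs assumed to belong to group A.
    """
    correct = {"with_group_a": [], "without_group_a": []}
    for a, b, y_true, y_pred in predictions:
        key = "with_group_a" if (a in group_a or b in group_a) else "without_group_a"
        correct[key].append(y_true == y_pred)
    # None when a bucket is empty, so an absent group is not reported as 0% accuracy.
    return {k: (sum(v) / len(v) if v else None) for k, v in correct.items()}
```

A large F together with noticeably higher accuracy on `with_group_a` than on `without_group_a` would match the pattern described above as suspicious.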

Sum02dean / STRINGSCORE

Identity analysis #42