Ordinal regression - Githubissues

gitter-lab / active-learning-drug-discovery

End-to-end active learning pipeline for virtual screening and drug discovery

MIT License


This is somewhat related to computing a consensus fingerprint for a cluster. The cluster-based-selector now supports an option for computing cluster dissimilarity using consensus fingerprints rather than comparing every instance within each cluster. The consensus fingerprint of cluster ci is computed as:

    ci_instances = np.where(clusters == ci)[0]
    X_consensus = ((np.sum(X[ci_instances, :], axis=0) / ci_instances.shape[0]) >= 0.5).astype(float)

In words, we set the bit at position i if at least half of the instances in the cluster have that bit set. Applying this dissimilarity computation to 20 randomly chosen dense clusters gives results that are mostly similar to the instance-by-instance method (within ±0.04 in most cases; a few cases differed by about ±0.1).

Because each cluster is represented by a single fingerprint instead of all of its instances, this consensus method should reduce overall memory costs.
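As a rough illustration of the comparison described above, the sketch below computes consensus fingerprints for two toy clusters and compares the consensus-based dissimilarity with an instance-by-instance one. The issue does not state which dissimilarity metric or aggregation the selector uses, so the Tanimoto dissimilarity and the mean over all cross-cluster pairs are assumptions, and the function names are hypothetical rather than taken from the repository.

```python
import numpy as np


def consensus_fingerprint(X, clusters, ci):
    """Bit i is set if at least half of the instances in cluster ci set it."""
    ci_instances = np.where(clusters == ci)[0]
    return (X[ci_instances, :].mean(axis=0) >= 0.5).astype(float)


def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto similarity of two binary vectors (assumed metric)."""
    both = np.sum(np.logical_and(a, b))
    either = np.sum(np.logical_or(a, b))
    return 1.0 - both / either if either > 0 else 0.0


def instance_dissimilarity(X, clusters, ci, cj):
    """Mean dissimilarity over all cross-cluster pairs (assumed aggregation)."""
    A = X[clusters == ci]
    B = X[clusters == cj]
    return float(np.mean([tanimoto_dissimilarity(a, b) for a in A for b in B]))


# Toy binary fingerprints: two clusters of three instances each.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
clusters = np.array([0, 0, 0, 1, 1, 1])

c0 = consensus_fingerprint(X, clusters, 0)
c1 = consensus_fingerprint(X, clusters, 1)

# One comparison between two consensus vectors ...
consensus_d = tanimoto_dissimilarity(c0, c1)
# ... versus len(A) * len(B) comparisons between individual instances.
pairwise_d = instance_dissimilarity(X, clusters, 0, 1)
print(consensus_d, pairwise_d)
```

The consensus route needs only one fingerprint per cluster for each comparison, whereas the instance-by-instance route touches every cross-cluster pair, which is where the memory saving would be expected to come from.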

Ordinal regression #7