Closed — darcyabjones closed this issue 4 years ago
Essentially we need a way of deciding what looks like an effector and what doesn't.
There are three methods that we've discussed to do this:

1. A manual ranking of candidates.
2. A manually weighted score, where each feature is given a weight.
3. A trained ML classifier.
Note that with the ML classifier, we wouldn't be able to include user-supplied data (e.g. positive selection) unless we also include that analysis in the pipeline. For the manual weights method, users would have to supply the weights and normalise their own data.
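To make the manual weights idea concrete, here is a minimal sketch of what that scoring could look like. The function names and the min-max normalisation choice are hypothetical illustrations, not code from the pipeline:

```python
import numpy as np

def minmax(col):
    """Normalise one feature column to [0, 1]; the user would apply
    this (or a similar transform) to their data beforehand."""
    col = np.asarray(col, dtype=float)
    span = col.max() - col.min()
    return (col - col.min()) / span if span > 0 else np.zeros_like(col)

def weighted_score(features, weights):
    """Combine per-protein feature columns into a single score.

    features: (n_proteins, n_features) array, already normalised to [0, 1]
    weights:  (n_features,) user-supplied importance weights
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return features @ weights
```

The burden here is entirely on the user: both the weights and the normalisation have to be sensible for the final score to mean anything, which is part of why a learned ranking is attractive.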
I think what we'll go for is a combination of all three methods.
For now we're only progressing with the ranking method, which James and I are currently finalising.
The ML method that I'd like to implement is a learning-to-rank solution, specifically the LambdaMART implementation in XGBoost. Boosted trees have a few properties that work nicely for us here, especially their ease of interpretability and the ability to weight samples, which we can use to overcome the class imbalance.
This has now been done. We use a learning-to-rank method. It's much more reliable than the manual scores, particularly the manual score that doesn't use homology information.
This is required for the version 1 release. Progress is being tracked in the project classifier and ranking....