allie-walker / Natural-product-function

scripts for predicting natural product activity from biosynthetic gene cluster sequences
MIT License

Results_Interpretation #10

Open Oelsakha opened 1 year ago

Oelsakha commented 1 year ago

Hi, thank you so much for your nice tool and your hard work. I have two questions:

  1. I used antiSMASH version 7. Does the tool work well with it?
  2. How should I choose the cutoff value for the tree classifier, logistic regression classifier, and SVM classifier? Thank you.

allie-walker commented 1 year ago

It will run with antiSMASH versions that it was not trained on, but the predictions may be slightly less accurate because the gene annotations differ slightly between versions. We are almost done adding support for antiSMASH 6 and will start working on support for version 7 once it is out of beta. So far we have seen that a mismatch between the training set and the input antiSMASH version doesn't affect the predictions much, but we haven't tested this rigorously.

For the cutoffs, it depends on the application and how tolerant you are of false positives. The values represent a probability of activity, so anything >50% means that the ML classifier thinks the cluster is more likely to be active than not. That does not mean that everything <50% will be inactive. If you are limited in how much you can screen, you can use a higher cutoff for a better chance of success. Also, if all three classifiers give similar probabilities, the prediction is likely more reliable; if they disagree (e.g. one says 20% active and another says 60%), it could indicate a cluster that is difficult to predict because it is too dissimilar to the training set. We are still testing how the tool works on novel gene clusters, but generally we look for all three classifiers to give probabilities >50%.
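
As an illustration of the rule of thumb above, here is a minimal Python sketch that checks whether all three classifiers exceed a cutoff and how far apart their probabilities are. It is not part of the tool: the `Prediction` class, the `assess` function, and the `max_spread` threshold of 0.3 are hypothetical names and values chosen for this example, and the probabilities are assumed to have been copied by hand from the tool's output as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Activity probabilities (0 to 1) from the three classifiers for one cluster."""
    tree: float
    logistic: float
    svm: float


def assess(pred: Prediction, cutoff: float = 0.5, max_spread: float = 0.3) -> dict:
    """Summarize one prediction.

    cutoff:     probability every classifier must exceed (raise it if you can
                only screen a few clusters and want fewer false positives).
    max_spread: hypothetical limit on the gap between the highest and lowest
                probability; a larger gap is treated as disagreement.
    """
    probs = [pred.tree, pred.logistic, pred.svm]
    spread = max(probs) - min(probs)
    return {
        "all_above_cutoff": all(p > cutoff for p in probs),
        "spread": spread,
        "classifiers_agree": spread <= max_spread,
    }


if __name__ == "__main__":
    # Example values are made up for illustration only.
    consistent = Prediction(tree=0.70, logistic=0.65, svm=0.80)
    conflicting = Prediction(tree=0.20, logistic=0.60, svm=0.55)
    for name, pred in [("consistent", consistent), ("conflicting", conflicting)]:
        print(name, assess(pred))
```

A cluster where `all_above_cutoff` and `classifiers_agree` are both true would be a stronger screening candidate under this heuristic, while a large `spread` flags a cluster that may be too dissimilar to the training set.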

Oelsakha commented 1 year ago

Thank you so much!

liangly1 commented 1 month ago

Hi, thank you so much for your nice tool and your hard work. I have two questions: Can this method currently make predictions from antiSMASH 7.0 results? If not, approximately when will it be updated?