albertma-evotec closed this issue 2 years ago.
These are the predicted means (and uncertainties, if needed) of your surrogate model; they're used for checkpointing and retrospective analysis. The array is parallel to the molecules in your library: entry i corresponds to the i-th molecule in the library file.
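Because the array is parallel to the library, pairing each prediction with its molecule is a simple zip. The sketch below assumes a CSV library with a header row and SMILES in the first column, and a 1-D `Y_pred.npy` holding only the means; the helper names (`load_library_smiles`, `pair_predictions`, `top_k`) are hypothetical and not part of molpal. `top_k` also assumes the predictions are on a higher-is-better scale, which you should verify against how your objective is configured.

```python
import csv

import numpy as np


def load_library_smiles(library_path, smiles_col=0, has_header=True):
    """Read SMILES from a CSV library in file order (assumed layout)."""
    with open(library_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        # blank lines are skipped; they would otherwise break the alignment
        return [row[smiles_col] for row in reader if row]


def pair_predictions(smiles, y_pred):
    """Zip each molecule with its predicted mean; inputs must be index-aligned."""
    y_pred = np.asarray(y_pred, dtype=float)
    if y_pred.ndim != 1:
        raise ValueError(f"expected a 1-D array of means, got shape {y_pred.shape}")
    if len(smiles) != len(y_pred):
        raise ValueError(
            f"length mismatch: {len(smiles)} SMILES vs {len(y_pred)} predictions"
        )
    return list(zip(smiles, y_pred.tolist()))


def top_k(pairs, k):
    """Return the k molecules with the highest predicted means (higher = better assumed)."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
```

Usage, with hypothetical paths: `pairs = pair_predictions(load_library_smiles("library.csv"), np.load("Y_pred.npy"))`. The length check is there because a silent off-by-one (for example, an unskipped header) would misalign every prediction after it.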
Can I say that the higher the number, the more likely the corresponding compound is to have a better (more negative) docking score? Or do these 'predicted means' not necessarily correlate with the docking scores?
I tried plotting these numbers (from the final iteration) against the true docking scores and did not see much correlation.
The question of surrogate model accuracy is related to, but not the same as, optimization performance. You can make whatever claims or analyses you want; we're just giving you the information. In a greedy optimization, the most important model-based metric is rank correlation, because new points are prioritized solely based on the predicted mean.
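Since greedy selection depends only on the ordering of the predicted means, a Spearman (rank) correlation is a more relevant check than a raw scatter plot or a Pearson correlation. The sketch below computes it with NumPy only; `scipy.stats.spearmanr` gives the same value if SciPy is available. The helper `prediction_rank_correlation` and its `minimize` parameter are hypothetical: it assumes `Y_pred` is on a higher-is-better scale, so when more negative docking scores are better, the true scores are negated before comparing.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    ra = average_ranks(a)
    rb = average_ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt((ra ** 2).sum() * (rb ** 2).sum())
    return float("nan") if denom == 0 else float((ra * rb).sum() / denom)


def prediction_rank_correlation(y_pred, true_scores, minimize=True):
    """Rank correlation between predicted means and known docking scores.

    Assumes y_pred is higher-is-better; with minimize=True the true scores
    are negated so that a perfectly ranked model gives +1. Molecules without
    a finite prediction or true score are skipped.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(true_scores, dtype=float)
    if minimize:
        y_true = -y_true
    mask = np.isfinite(y_pred) & np.isfinite(y_true)
    return spearman(y_pred[mask], y_true[mask])
```

A value near +1 means the model orders compounds well even if its absolute predictions are off; a value near -1 usually points to a sign-convention mismatch rather than a bad model.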
Hi, I was using molpal for a retrospective docking study, with the objective configured to look up already-known docking scores. I am trying to understand the output files. I found that the Y_pred.npy file is a NumPy array of floating-point numbers whose size is the same as my molecular library. Are these numbers values reflecting how 'good' the corresponding compounds are, which molpal uses to select compounds for the next iteration? Or are they simply docking scores predicted by the RF regression model? And does the order of these numbers follow the order of the compounds in the library file?
Below is my config file:
Many thanks