YerevaNN / BARTSmiles

BARTSmiles, generative masked language model for molecular representations
MIT License

Regression results comparison #9

Open csnbritt opened 1 year ago

csnbritt commented 1 year ago

Hi - in your article, I believe the comparisons of the ESOL, Lipo, and FreeSolv results to other models may be misleading. Based on Figure 6, it appears that the performance metrics may have been calculated on normalized target values. For example, the compound "CCCCC(C)O" has a value of -0.89 in the raw data, whereas the value reported in Figure 6 is 1.03, the same value I get when I normalize the entire dataset using scikit-learn's StandardScaler. The same is true of the other compounds in that figure. Most, if not all, of the models in Table 4a use the unscaled values to calculate performance metrics, so comparing metrics calculated on scaled values against metrics calculated on unscaled values would not be appropriate.
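
To illustrate why the two kinds of metrics are not comparable, here is a minimal sketch using synthetic numbers (not values from ESOL, Lipo, or FreeSolv, and not the repository's evaluation code). It assumes the targets were standardized with the dataset mean and population standard deviation, which is what StandardScaler does. Under that assumption, RMSE on the scaled values equals RMSE on the raw values divided by the standard deviation, so a scaled RMSE can look much smaller than a raw one for the same predictions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic targets and predictions, for illustration only.
rng = np.random.default_rng(0)
y_raw = rng.normal(loc=-3.0, scale=2.0, size=200)
pred_raw = y_raw + rng.normal(scale=0.6, size=200)

# Standardize with the mean and population std (ddof=0) of the targets,
# the same statistics StandardScaler uses.
mu = y_raw.mean()
sigma = y_raw.std()
y_scaled = (y_raw - mu) / sigma
pred_scaled = (pred_raw - mu) / sigma

rmse_raw = rmse(y_raw, pred_raw)
rmse_scaled = rmse(y_scaled, pred_scaled)

# Multiplying the scaled RMSE by sigma recovers the raw-unit RMSE.
rmse_rescaled = rmse_scaled * sigma

print(f"RMSE on raw targets:      {rmse_raw:.4f}")
print(f"RMSE on scaled targets:   {rmse_scaled:.4f}")
print(f"Scaled RMSE times sigma:  {rmse_rescaled:.4f}")
```

If the reported metrics were computed this way, multiplying them by the standard deviation of each dataset's targets should give numbers in the original units that can be compared against Table 4a.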