Reproduction of results for ESOL and FreeSolv

IBM / molformer

Repository for MolFormer

Apache License 2.0


Could the RMSEs in the paper have been computed on the standardized values rather than on the original ones? I suspect the same issue affected another LLM paper (BARTSmiles), which reported order-of-magnitude improvements on regression tasks.
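A minimal sketch of why this matters, assuming the targets were z-score standardized (x - mean) / std and the predictions were mapped to the same scale. Because the mean cancels in each residual, the RMSE on standardized values equals the original-scale RMSE divided by the standard deviation of the targets. The numbers below are synthetic and are not taken from the paper or from MoleculeNet.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def standardize(values, mean, std):
    """z-score transform: (x - mean) / std."""
    return [(v - mean) / std for v in values]

# Synthetic targets and predictions on the original scale (illustrative only).
y_true = [-3.2, -1.0, 0.5, 2.4, -6.8, -4.1]
y_pred = [-2.7, -1.6, 0.9, 1.8, -5.9, -4.5]

# In a real pipeline mean/std would come from the training split; here we
# simply use the synthetic targets themselves (population std).
mean = sum(y_true) / len(y_true)
std = math.sqrt(sum((v - mean) ** 2 for v in y_true) / len(y_true))

rmse_original = rmse(y_true, y_pred)
rmse_standardized = rmse(standardize(y_true, mean, std),
                         standardize(y_pred, mean, std))

print(f"std of targets:       {std:.3f}")
print(f"RMSE, original scale: {rmse_original:.3f}")
print(f"RMSE, standardized:   {rmse_standardized:.3f}")
# Each standardized residual is (t - p) / std, so the RMSE scales by 1 / std.
print(math.isclose(rmse_standardized, rmse_original / std))
```

For any dataset whose targets have a standard deviation above 1, reporting the standardized RMSE therefore makes the error look smaller by exactly that factor.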

For example, in the MolFormer repository's data the lipophilicity values appear to be standardized (centered around 0, all with ~10 decimal places), whereas the MoleculeNet datasets are in the 0-7 range with fewer decimal places. Clarification of how the regression datasets were treated would be much appreciated!
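A rough way to check a target column for this, sketched under the assumption that the values were z-scored: standardized data has a mean near 0 and a standard deviation near 1. The helper names `looks_standardized` and `destandardize` are hypothetical, the tolerances are arbitrary, and the sample values are synthetic rather than actual dataset rows.

```python
import statistics

def looks_standardized(values, mean_tol=0.1, std_tol=0.1):
    """Heuristic check: z-scored data has mean close to 0 and std close to 1.

    The tolerances are arbitrary illustrative thresholds, not a formal test.
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return abs(mean) < mean_tol and abs(std - 1.0) < std_tol

def destandardize(values, mean, std):
    """Invert a z-score transform: x = z * std + mean."""
    return [z * std + mean for z in values]

# Synthetic values on an "original" scale (illustrative, not real dataset rows).
raw = [3.54, -1.18, 3.69, 3.37, 3.10, 3.14, -0.72, 0.34]
mu, sigma = statistics.fmean(raw), statistics.stdev(raw)
z = [(v - mu) / sigma for v in raw]

print(looks_standardized(raw))  # False: mean is far from 0
print(looks_standardized(z))    # True
# Errors measured on z must be multiplied by sigma to be comparable
# with RMSEs reported on the original scale.
restored = destandardize(z, mu, sigma)
print(all(abs(a - b) < 1e-9 for a, b in zip(raw, restored)))
```

This only flags that a column was probably standardized; recovering the original-scale RMSE still requires the exact mean and standard deviation used for the transform.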

Reproduction of results for ESOL and FreeSolv #9