Open subhalingamd opened 1 year ago
Could the RMSEs in the paper have been computed on the standardized values rather than the original ones? I think that was also the issue in another LLM paper (BARTSmiles) that showed order-of-magnitude improvements on regression tasks.
For example, in the MolFormer repository's data the lipophilicity values seem to be standardized (centered around 0, all with ~10 decimal places), whereas the MoleculeNet datasets are in the 0-7 range with fewer decimal places. Clarification of how the regression datasets were treated would be much appreciated!
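To illustrate the scale effect in question, here is a minimal sketch on synthetic data (not the MolFormer or MoleculeNet data). The 0-7 target range, the noise level, and the helper names `rmse` and `standardize` are assumptions made up for the example. It only shows that an RMSE computed on z-scored targets equals the original-scale RMSE divided by the standard deviation of the targets, so the two numbers are not directly comparable.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def standardize(y, mean, std):
    """Z-score values using a given mean and standard deviation."""
    return (np.asarray(y, dtype=float) - mean) / std


# Synthetic targets in an assumed 0-7 range, plus noisy "predictions".
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 7.0, size=1000)
y_pred = y_true + rng.normal(0.0, 0.7, size=1000)

# Statistics of the targets, as would be used to standardize a dataset.
mean, std = y_true.mean(), y_true.std()

rmse_original = rmse(y_true, y_pred)
rmse_standardized = rmse(
    standardize(y_true, mean, std),
    standardize(y_pred, mean, std),
)

print(f"RMSE on original scale:     {rmse_original:.3f}")
print(f"RMSE on standardized scale: {rmse_standardized:.3f}")
# Multiplying by std maps the standardized RMSE back to the original units.
print(f"standardized RMSE * std:    {rmse_standardized * std:.3f}")
```

If the reported numbers were on the standardized scale, multiplying them by the standard deviation used for standardization would recover values comparable to RMSEs reported in the original units.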
Hi, thanks for releasing the pre-trained model and the code. Could you share the scripts used for fine-tuning on ESOL and FreeSolv data?
I am mainly interested in the hyperparameters. I wrote scripts modeled on the Lipophilicity script but got a much higher RMSE (e.g., more than 1 for FreeSolv).
Thanks.