IBM / molformer

Repository for MolFormer
Apache License 2.0

Reproduction of results for ESOL and FreeSolv #9

Open subhalingamd opened 1 year ago

subhalingamd commented 1 year ago

Hi, thanks for releasing the pre-trained model and the code. Could you share the scripts used for fine-tuning on ESOL and FreeSolv data?

I am mainly interested in the hyperparameters. I wrote scripts modeled on the Lipophilicity script but got a much higher RMSE (e.g., more than 1 for FreeSolv).

Thanks.

GintasKam commented 1 year ago

Could the RMSEs in the paper have been computed on the standardized values rather than the original ones? I think that was also the issue in another LLM paper (BARTSmiles) that showed order-of-magnitude improvements on regression tasks.
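To illustrate why this matters, here is a minimal sketch of how RMSE changes under z-score standardization. The labels and predictions are synthetic (not ESOL or FreeSolv data), and the `rmse` helper is a hypothetical name introduced only for this example. If targets are standardized as (y - mean) / std, the RMSE in original units equals the standardized RMSE multiplied by that std.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic labels and noisy "predictions", for illustration only.
rng = np.random.default_rng(0)
y = rng.normal(loc=-3.8, scale=3.8, size=500)
pred = y + rng.normal(scale=1.0, size=500)

# Standardize labels and predictions with the label statistics.
mu, sigma = y.mean(), y.std()
y_std = (y - mu) / sigma
pred_std = (pred - mu) / sigma

rmse_orig = rmse(y, pred)         # error in original units
rmse_std = rmse(y_std, pred_std)  # error in standardized units

# The two differ exactly by a factor of sigma.
print(rmse_orig, rmse_std, sigma * rmse_std)
```

So an RMSE reported on standardized targets would look smaller than one in original units whenever the label standard deviation is greater than 1.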

For example, in the MolFormer repository's data the lipophilicity values seem to be standardized (centered around 0, each with ~10 decimal places), whereas the MoleculeNet values are in the 0-7 range with fewer decimal places. Clarification on how the regression datasets were treated would be much appreciated!
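A rough way to check this for any label column is sketched below. It assumes the standardization in question is a plain z-score (zero mean, unit variance), which the repository does not confirm; `looks_standardized` and its tolerance are hypothetical, and the sample values are made up.

```python
import numpy as np

def looks_standardized(values, tol=0.05):
    """Heuristic: True if the values have mean close to 0 and std close to 1."""
    values = np.asarray(values, dtype=float)
    return bool(abs(values.mean()) < tol and abs(values.std() - 1.0) < tol)

# Made-up values in a 0-7 range, standing in for raw labels.
raw = np.array([0.5, 1.2, 2.3, 3.1, 4.4, 5.0, 6.7])

# The same values after z-score standardization.
z = (raw - raw.mean()) / raw.std()

print(looks_standardized(raw))
print(looks_standardized(z))
```

Running the same check on the label column of each CSV file (repository copy versus MoleculeNet copy) would show whether the two differ only by such a transform.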