clinicalml / TabLLM

Cannot reproduce performance on blood dataset with text template serialisations #15

Closed: stefanhgm closed this issue 11 months ago

stefanhgm commented 11 months ago

I am using the same code as yours with seed = 42, 32 shots, a batch size of 4, and the blood dataset with text template serialisations. I got 37.2% accuracy, compared to the 67% reported in the paper. Could you suggest parameters that I can change to get the desired results?

Originally posted by @YasHGoyaL27 in https://github.com/clinicalml/TabLLM/issues/13#issuecomment-1744061868

stefanhgm commented 11 months ago

Hello @YasHGoyaL27,

thank you for using our code and please excuse the late reply!

I am sorry that you are unable to reproduce the performance. Could you please provide some additional details regarding your experiment to help me debug this problem:

  1. Could you provide your running configuration few-shot-pretrained-100k.sh?

  2. Can you please attach the output you get during your run?

  3. Can you check the serialized dataset, i.e. the text given to the LLM? Does it match the serialization schema?

  4. Did you check whether the "simpler" baselines like logistic regression achieve the performance stated in our paper? If they do not, there may be something wrong with the dataset.

Thank you!
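
For point 3, a quick check of the serialized text could look like the sketch below. It assumes the text template has the form "The <column name> is <value>." and uses hypothetical column names for the blood dataset; the helper names `serialize_row`, `parse_serialization` and `matches_schema` are made up for illustration and are not part of the TabLLM code.

```python
import re

# Assumed template: one clause per column, "The <column> is <value>."
# A clause ends at a period that is followed by the next clause or the end of
# the text, so decimal values such as 12.5 are kept whole.
CLAUSE = re.compile(r"The (?P<column>.+?) is (?P<value>.+?)\.(?= The |$)")


def serialize_row(row: dict) -> str:
    """Serialize one table row with the assumed text template."""
    return " ".join(f"The {column} is {value}." for column, value in row.items())


def parse_serialization(text: str) -> dict:
    """Recover column/value pairs (as strings) from a serialized row."""
    return {m.group("column"): m.group("value") for m in CLAUSE.finditer(text)}


def matches_schema(text: str, expected_columns) -> bool:
    """True if the text contains exactly the expected columns, in order."""
    return list(parse_serialization(text)) == list(expected_columns)


# Hypothetical example row; the real column names may differ.
example = serialize_row({"Recency": 2, "Frequency": 50, "Monetary": 12.5})
print(example)
print(matches_schema(example, ["Recency", "Frequency", "Monetary"]))
```

Running `matches_schema` over a handful of serialized examples is usually enough to spot a missing column or a template that differs from the expected one.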

stefanhgm commented 11 months ago

Hello @YasHGoyaL27,

just as a heads-up: based on the feedback in the issues, we have now updated the README with all the steps needed to reproduce a performance entry from the paper. Maybe that is also helpful for you!

YasHGoyaL27 commented 11 months ago

Hello @stefanhgm, thank you for the reply. I was able to reproduce the results reported in the paper. The issue was that I was using different versions of some libraries, which meant that the implementation of the optimizer used in the code was different. I changed the code to sort that out.

Thank you for the great work!!
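
For anyone hitting a similar mismatch, a minimal sketch of a version check follows. The thread does not say which library or version was involved, so the package names and version strings in `PINNED` are placeholders, and `installed_version` and `find_mismatches` are hypothetical helper names. Only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs from the pin
    to a (wanted, found) tuple; found is None for missing packages."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


# Placeholder pins: replace them with the versions from the repository's
# setup instructions. These values are NOT taken from the thread.
PINNED = {"torch": "0.0.0", "transformers": "0.0.0"}

for package, (wanted, found) in find_mismatches(PINNED).items():
    print(f"{package}: expected {wanted}, found {found}")
```

An exact string comparison is deliberately strict; it flags any difference, which suits a reproduction run where even a minor release may change optimizer behaviour.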

stefanhgm commented 11 months ago

Hello @YasHGoyaL27

Thank you very much for the positive feedback! Yes, installing the right library versions is often still problematic.