Totally Can NOT Re-Produce!!!

A11en0 commented 1 year ago

Your code is incomplete, so I supplemented it myself.
Your paper doesn't mention the seed or hyperparameters for any dataset, making it impossible to reproduce the results as reported in the paper.
I attempted to reproduce the results by manually adjusting the parameters, but I was unable to reach the results reported in the paper.

Overall, I have significant doubts about the validity of this paper. Could you please give me an answer?

stefanhgm commented 1 year ago

Hello @A11en0,

Thanks for reaching out and using our project.

I also answered this request in your other issue. As stated in the readme, we use the t-few project and provide all necessary files to run it for our setup in t-few. If anything else is missing, please let us know the exact file. We have added files to the project at users' request in the past.
The five different seeds are only provided in the code: 42, 1024, 0, 1, and 32 (line 66 in few-shot-pretrained-100k.sh for the LLM and line 73 in evaluate_external_dataset.py). We did no parameter tuning for the LLM, as stated in the paper. You can find the parameters we used for T0 in few-shot-pretrained-100k.sh and the config files in configs. All hyperparameters of the baselines are in the appendix of the paper in section "3 PARAMETER TUNING FOR BASELINES". You can also find them in the code in evaluate_external_dataset.py. For the baselines, we did an exhaustive search over all of these parameters.
It is probably very hard to find the right parameters by hand. Please redo the experiments with the seeds and parameters pointed out above (they are the default in our code) and check if you get the correct results. We reproduced the results from scratch using the code in this repository on a different machine, so we are very confident that it is possible.
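
To make the multi-seed comparison concrete, here is a minimal Python sketch of running one experiment per seed and summarizing the scores. Only the five seed values come from this thread; `evaluate_over_seeds` and `dummy_run` are hypothetical names that are not part of the repository, and `dummy_run` returns a made-up number, so you would need to replace it with a real training and evaluation run. Whether the paper aggregates in exactly this way is an assumption.

```python
import random
import statistics
from typing import Callable, Sequence

# Seed values quoted in this thread.
SEEDS = (42, 1024, 0, 1, 32)


def evaluate_over_seeds(
    run_once: Callable[[int], float],
    seeds: Sequence[int] = SEEDS,
) -> dict:
    """Call run_once(seed) for every seed and summarize the returned scores."""
    scores = [run_once(seed) for seed in seeds]
    return {
        "scores": dict(zip(seeds, scores)),
        "mean": statistics.mean(scores),
        # The sample standard deviation needs at least two runs.
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }


def dummy_run(seed: int) -> float:
    """Hypothetical stand-in for one training/evaluation run (fake score)."""
    rng = random.Random(seed)
    return 0.6 + 0.1 * rng.random()


if __name__ == "__main__":
    summary = evaluate_over_seeds(dummy_run)
    for seed, score in summary["scores"].items():
        print(f"seed={seed}: {score:.3f}")
    print(f"mean={summary['mean']:.3f} std={summary['std']:.3f}")
```

A result from a single seed can differ noticeably from the mean over several seeds, so it is the mean over all five runs that should be compared with a reported number.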

If you have any further questions, please let us know!

YasHGoyaL27 commented 1 year ago

I am using the same code as yours with seed = 42, 32 shots, a batch size of 4, and the blood dataset with text template serialization. I got 37.2% accuracy compared to the 67% reported in the paper. Could you suggest parameters that I can change to get the desired results?

stefanhgm commented 11 months ago

Hello @A11en0,

I hope I answered all of your questions. Please open a new issue if you have further problems.

@YasHGoyaL27: I copied your question into a new issue, since it is a more specific problem.

clinicalml / TabLLM

Totally Can NOT Re-Produce!!! #13