automl / TabPFN

Official implementation of the TabPFN paper (https://arxiv.org/abs/2207.01848) and the tabpfn package.
http://priorlabs.ai
Apache License 2.0

Some doubts about TabPFN training in Section E.3 of the paper #84

Closed · liuquangao closed this issue 7 months ago

liuquangao commented 8 months ago

Hello, I have recently been going through both the code and the paper you published, focusing on the details in Section E.3 about training the final model.

In Section E.3, it is mentioned that the model was trained for 18,000 steps with a batch size of 512 datasets, totaling 9,216,000 synthetically generated datasets.

However, while reviewing PriorFittingCustomPrior.ipynb, I noticed a different configuration, which suggests a total of 26,214,400 synthetically generated datasets based on the provided batch size and number of steps (51200 * 512):

config['aggregate_k_gradients'] = 8
config['batch_size'] = 8 * config['aggregate_k_gradients']     # = 64
config['num_steps'] = 1024 // config['aggregate_k_gradients']  # = 128
config['epochs'] = 400

num_steps = 128; total steps: 128 * 400 = 51,200; batch size: 64 * 8 (GPUs) = 512; synthetically generated datasets: 51,200 * 512 = 26,214,400
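To spell out the arithmetic behind the two totals, here is a minimal sketch in Python. The 8-GPU setup and the per-GPU batch size of 64 are taken from the numbers quoted in this issue and are assumptions, not confirmed final training settings.

# Sketch: the two dataset totals discussed in this issue (values taken from the
# notebook config and the paper as quoted above; an 8-GPU run is assumed).
aggregate_k_gradients = 8
batch_size_per_gpu = 8 * aggregate_k_gradients        # 64 datasets per GPU per step
n_gpus = 8                                            # assumed 8-GPU training
num_steps_per_epoch = 1024 // aggregate_k_gradients   # 128
epochs = 400

total_steps = num_steps_per_epoch * epochs            # 51,200
datasets_per_step = batch_size_per_gpu * n_gpus       # 512
notebook_total = total_steps * datasets_per_step      # 26,214,400

paper_total = 18_000 * 512                            # 9,216,000 (Section E.3)
print(notebook_total, paper_total)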

Would it be possible for you to share the precise training settings used for your final model? Understanding the exact parameters would greatly help me align my experiments for a fair and meaningful comparison.

Best regards, liuquangao

liuquangao commented 8 months ago

If I have misunderstood anything about your training setup or methods, I would really appreciate it if you could set me straight.

SamuelGabriel commented 7 months ago

I think the confusion might be that: i) we use 8-GPU training and the batch_size parameter is per GPU, and ii) aggregate_k_gradients aggregates separate batches in our repo logic, which form a single batch in the optimizer logic, which is what we describe in the paper.

Does that help?
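Reading the two points above together, the repo-level parameters map onto optimizer-level quantities roughly as in the following sketch. The concrete values assume the notebook config quoted earlier and an 8-GPU run; the exact settings of the final model are not stated in this thread.

# Sketch: repo-level vs. optimizer-level accounting, following the two points above
# (batch_size is per GPU; aggregate_k_gradients groups k repo batches into one
# optimizer update). Values assume the notebook config and 8 GPUs.
batch_size_per_gpu = 64
n_gpus = 8
aggregate_k_gradients = 8

datasets_per_repo_step = batch_size_per_gpu * n_gpus                           # 512
datasets_per_optimizer_step = datasets_per_repo_step * aggregate_k_gradients   # 4,096
repo_steps_per_epoch = 128
optimizer_steps_per_epoch = repo_steps_per_epoch // aggregate_k_gradients      # 16
print(datasets_per_optimizer_step, optimizer_steps_per_epoch)

Note that the total number of synthetically generated datasets is the same under either accounting, since repo steps times repo batch equals optimizer steps times optimizer batch; only what counts as a "step" and a "batch" changes.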

liuquangao commented 7 months ago

Thank you very much. Now I get it.