Closed: TheMatrixMaster closed this issue 5 months ago.
Hi @TheMatrixMaster,
Thanks for your interest in this work. Training from scratch is discouraged in favor of finetuning with gt4sd-trainer, as described in the GT4SD README: https://github.com/GT4SD/gt4sd-core/tree/main/examples/regression_transformer. Please use GT4SD because it uses an improved version of the RT, which in this repo is exposed only on the gt4sd branch, not on main.
Since you are working on protein sequences, please start your finetuning from the stability model, as explained in the GT4SD example linked above. That model was pretrained on a few million protein sequences with a synthetic property (the Boman index) and then finetuned on the protein-stability dataset used in the TAPE paper. See the RT paper for details.
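If it helps for preparing your finetuning data, here is a minimal sketch of formatting (sequence, property) pairs into the property-tagged line format the RT examples use. The `<stab>` token name and the three-decimal precision are assumptions for illustration; check the GT4SD example data for the authoritative format.

```python
def to_rt_line(sequence: str, value: float, prop_token: str = "<stab>") -> str:
    """Prefix a protein sequence with a property token and numeric value.

    Assumed layout: "<prop>value|sequence". Verify against the GT4SD
    regression_transformer example files before training.
    """
    return f"{prop_token}{value:.3f}|{sequence}"


# Toy (sequence, stability) pairs -- placeholder values, not real data.
pairs = [
    ("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", 0.42),
    ("GSQEVNSNASPEEAEIARKAGATTWTEKGNKWEIRI", 0.39),
]

lines = [to_rt_line(seq, val) for seq, val in pairs]
print(lines[0])  # <stab>0.420|MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
```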
About hardware: single-GPU usage should be fine; the code is likely not compatible with multi-GPU setups out of the box. Please be aware that training is not particularly fast due to the XLNet backbone.
Parameters: which ones would you like an intuition for? In general, please read the docstrings in GT4SD: https://github.com/GT4SD/gt4sd-core/blob/daae05b8846563501c4a10245ec3bfa7c1982e47/src/gt4sd/training_pipelines/regression_transformer/core.py#L40
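For orientation, an invocation might look roughly like the sketch below. The pipeline name and flag names here are assumptions based on the GT4SD regression_transformer example; verify every flag against the docstrings linked above before running.

```shell
# Illustrative only -- flag names are assumptions, check the GT4SD docstrings.
gt4sd-trainer --training_pipeline_name regression-transformer-trainer \
  --model_path stability \
  --train_data_path train.csv \
  --test_data_path valid.csv
```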
Closing as completed, but feel free to reopen/comment in case of more questions.
Hi, thanks for sharing this great work! I'm trying to train from scratch on a dataset of ~900k protein sequences, and I'm having trouble developing an intuition for which hyperparameters are reasonable. Is the provided example config a good place to start? I couldn't find detailed information about which hyperparameters you used to train the models in the paper. I'm currently using the configs below:
and training with the following flags:
Could you also provide some recommendations on what hardware to use? Thanks!