aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

Unable to achieve the expected training accuracy #280

Open llwx593 opened 1 year ago

llwx593 commented 1 year ago

Hi, thank you very much for your work. I want to train OpenFold, so I downloaded the corresponding dataset (the RODA dataset) and launched training with the following script:

```bash
python3 train_openfold.py /oss/pdb_mmcif/mmcif_files/ /oss/flatten_data/alignments/ /oss/pdb_mmcif/mmcif_files/ 23_2_25_output_dir/ 2021-10-10 \
    --template_release_dates_cache_path mmcif_cache.json \
    --precision bf16 \
    --gpus 8 \
    --replace_sampler_ddp=True \
    --seed 42 \
    --deepspeed_config_path deepspeed_config.json \
    --checkpoint_every_epoch \
    --resume_model_weights_only True \
    --train_chain_data_cache_path chain_data_cache.json \
    --obsolete_pdbs_file_path /oss/pdb_mmcif/obsolete.dat
```

I did not modify the OpenFold code. I used the dataset you provided on RODA, but not the self-distillation dataset (according to the paper, that step does not seem to have a large impact on accuracy). I trained on 8 GPUs, but I find that "lddt_ca" does not improve. What could be the reason?

[image: training curve showing lddt_ca failing to improve]
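For context on the metric being tracked: lddt_ca is the lDDT score restricted to Cα atoms, i.e. the fraction of reference Cα–Cα distances preserved in the prediction within several tolerance bands. Below is a minimal NumPy sketch of the standard lDDT-Cα computation (15 Å inclusion radius, 0.5/1/2/4 Å thresholds); it is illustrative only and is not OpenFold's own `lddt` implementation.

```python
# Illustrative lDDT-Ca sketch; not OpenFold's implementation.
import numpy as np

def lddt_ca(pred_ca, true_ca, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """pred_ca, true_ca: [N, 3] arrays of Ca coordinates for the same residues."""
    # Pairwise Ca-Ca distance matrices for reference and prediction.
    d_true = np.linalg.norm(true_ca[:, None] - true_ca[None, :], axis=-1)
    d_pred = np.linalg.norm(pred_ca[:, None] - pred_ca[None, :], axis=-1)
    n = len(true_ca)
    # Score only pairs within the inclusion radius in the reference,
    # excluding self-pairs.
    mask = (d_true < cutoff) & ~np.eye(n, dtype=bool)
    diff = np.abs(d_true - d_pred)[mask]
    # Fraction of preserved distances, averaged over the tolerance bands.
    return float(np.mean([(diff < t).mean() for t in thresholds]))
```

A well-progressing run should show this value climbing well above the ~0.4-0.5 range early in training; a curve that stays flat suggests the model is not learning structure.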

jonathanking commented 1 year ago

I happened to notice your issue, and it looks similar to #196. Maybe this training run has found the lower of the two training performance modes?
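If this run has indeed settled into the weaker of the two modes, one low-cost experiment (suggested as a guess, not a confirmed fix) is to restart training from scratch with a different random seed and compare the lddt_ca curves, e.g. rerunning the command above with `--seed 43` instead of `--seed 42`.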