This repository contains the code for ActFound (now published on Nature Machine Intelligence): A bioactivity foundation model using pairwise meta-learning.
⭐ We have now added a ready-to-use online programmatic tool on Colab.
😊 You can easily use Colab tools to fine-tune Actfound with few measured compounds and then use it to give prediction on your unmeasured compounds.
❤️ We also provide a metric in this colab to help you predict if Actfound can works well on your data with only few measured compounds for fine-tuning. (See Figure 3.g.h of our paper for more detailes)
Colab for Actfound
Colab for Actfound with KNN-MAML and fusion method
Colab for finetuning full parameters of Actfound (can be more suitable when you have many measured compounds for finetuning, e.g. more than 100)
Compound bioactivity plays an important role in drug development and discovery. Existing machine learning approaches have poor generalization ability in compound bioactivity prediction due to the small number of compounds in each assay and incompatible measurements among assays. In this paper, we propose ActFound, a bioactivity foundation model trained on 1.6 million experimentally measured bioactivities and 35,644 assays from ChEMBL. The key idea of ActFound is to employ pairwise learning to learn the relative value differences between two compounds within the same assay to circumvent the incompatibility among assays. ActFound further exploits meta-learning to jointly optimize the model from all assays. On six real-world bioactivity datasets, ActFound demonstrates accurate in-domain prediction and strong generalization across datasets, assay types, and molecular scaffolds. We also demonstrated that ActFound can be used as an accurate alternative to the leading physics-based computational tool FEP+(OPLS4) by achieving comparable performance when only using a few data points for fine-tuning. The promising results of ActFound indicate that ActFound can be an effective bioactivity foundation model for a wide range of tasks in compound bioactivity prediction, paving the path for machine learning-based drug development and discovery.
All data, model checkpoints, and test results are available on the Google Drive.
Experiments on ChEMBL and BindingDB datasets:
FIXED_PARAM="--test_sup_num 16 --test_repeat_num 10 --train 0 --test_epoch -1"
CHEMBL_KNN_MAML="--knn_maml --train_assay_feat_all ./train_assay_feat/chembl/feat.npy --train_assay_idxes ./train_assay_feat/chembl/index.json" CHEMBL_DIR="./checkpoints_all/checkpoints_chembl" CHEMBL_RES="./test_results/result_indomain/chembl" python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES} ${FIXED_PARAM} ${CHEMBL_KNN_MAML} &
BDB_KNN_MAML="--knn_maml --train_assay_feat_all ./train_assay_feat/bdb/feat.npy --train_assay_idxes ./train_assay_feat/bdb/index.json" BDB_DIR="./checkpoints_all/checkpoints_bdb" BDB_RES="./test_results/result_indomain/bdb" python main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES} ${FIXED_PARAM} ${BDB_KNN_MAML} &
+ **'model_name'**: Specify the method to run, choose between actfound, actfound_transfer, maml, protonet and transfer_qsar
+ **'test_sup_num'**: Specify the size of support set
Experiments on pQSAR-ChEMBL:
```bash
FIXED_PARAM_PQSAR="--test_sup_num 0.75 --test_repeat_num 1 --train 0 --test_epoch -1"
PQSAR_KNN_MAML="--knn_maml --train_assay_feat_all ./train_assay_feat/pqsar/feat.npy --train_assay_idxes ./train_assay_feat/pqsar/index.json"
PQSAR_DIR="./checkpoints_all/checkpoints_pqsar"
PQSAR_RES="./test_results/result_indomain/pqsar"
python main_reg.py --datasource=pqsar --logdir ${PQSAR_DIR}/checkpoint_pqsar_actfound --model_name actfound --test_write_file ${PQSAR_RES} ${FIXED_PARAM_PQSAR} ${PQSAR_KNN_MAML} &
Experiments on FS-MOL:
FIXED_PARAM_FSMOL="--test_sup_num 16 --test_repeat_num 10 --train 0 --test_epoch -1"
FSMOL_KNN_MAML="--knn_maml --train_assay_feat_all ./train_assay_feat/fsmol/feat.npy --train_assay_idxes ./train_assay_feat/fsmol/index.json"
FSMOL_DIR="./checkpoints_all/checkpoints_fsmol"
FSMOL_RES="./test_results/result_indomain/fsmol"
python main_reg.py --datasource=kiba --logdir ${FSMOL_DIR}/checkpoint_fsmol_actfound --model_name actfound --test_write_file ${FSMOL_RES} ${FIXED_PARAM_FSMOL} ${FSMOL_KNN_MAML} &
Experiments on ChEMBL-Activity:
FIXED_PARAM="--test_sup_num 16 --test_repeat_num 10 --train 0 --test_epoch -1 --expert_test ood"
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl"
CHEMBL_RES="./test_results/result_ood"
CHEMBL_RES_TMP="./test_results/result_ood_tmp"
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file "${CHEMBL_RES_TMP}/normal" ${FIXED_PARAM}
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file "${CHEMBL_RES_TMP}/inverse" ${FIXED_PARAM} --inverse_ylabel
python ./combine_inverse_prediction.py
ChEMBL to BindingDB:
FIXED_PARAM="--test_sup_num 16 --test_repeat_num 10 --train 0 --test_epoch -1 --cross_test"
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl"
CHEMBL_RES="./test_results/result_cross/chembl2bdb"
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES} ${FIXED_PARAM} &
BindingDB to ChEMBL:
FIXED_PARAM="--test_sup_num 16 --test_repeat_num 10 --train 0 --test_epoch -1 --cross_test"
BDB_DIR="./checkpoints_all/checkpoints_bdb"
BDB_RES="./test_results/result_cross/bdb2chembl"
python main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES} ${FIXED_PARAM} &
ChEMBL to DAVIS and BindingDB to DAVIS:
FIXED_PARAM_DAVIS="--test_sup_num 16 --test_repeat_num 10 --train 0 --test_epoch -1 --expert_test davis"
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl"
CHEMBL_RES_DAVIS="./test_results/result_cross/chembl2davis"
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES_DAVIS} ${FIXED_PARAM_DAVIS} &
BDB_DIR="./checkpoints_all/checkpoints_bdb"
BDB_RES_DAVIS="./test_results/result_cross/bdb2davis"
python main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES_DAVIS} ${FIXED_PARAM_DAVIS} &
ChEMBL to DAVIS and BindingDB to DAVIS:
FIXED_PARAM_KIBA="--test_sup_num [16,32,64,128] --test_repeat_num 10 --train 0 --test_epoch -1 --expert_test kiba"
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl"
CHEMBL_RES_KIBA="./test_results/result_cross/chembl2kiba"
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES_KIBA} ${FIXED_PARAM_KIBA} &
BDB_DIR="./checkpoints_all/checkpoints_bdb"
BDB_RES_KIBA="./test_results/result_cross/bdb2kiba"
python main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES_KIBA} ${FIXED_PARAM_KIBA} &
Experiment bioactivity as the fine-tuning set:
FIXED_PARAM="--test_sup_num [0.2,0.4,0.6,0.8] --test_repeat_num 40 --train 0 --test_epoch -1 --expert_test fep"
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl"
CHEMBL_RES="./test_results/result_fep_new/fep/chembl"
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES} ${FIXED_PARAM} &
BDB_DIR="./checkpoints_all/checkpoints_bdb"
BDB_RES="./test_results/result_fep_new/fep/bdb"
python main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES} ${FIXED_PARAM} ${BDB_KNN_MAML} &
FEP+(OPLS4) calculation results as the fine-tuning set:
FIXED_PARAM="--test_sup_num [0.2,0.4,0.6,0.8] --test_repeat_num 40 --train 0 --test_epoch -1 --expert_test fep_opls4"
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl"
CHEMBL_RES="./test_results/result_fep_new/fep_opls4/chembl"
python main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES} ${FIXED_PARAM} &
BDB_DIR="./checkpoints_all/checkpoints_bdb"
BDB_RES="./test_results/result_fep_new/fep_opls4/bdb"
python main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES} ${FIXED_PARAM} ${BDB_KNN_MAML} &
FIXED_PARAM="--model_name actfound --test_write_file test_results/gdsc --test_repeat_num 1 --metatrain_iterations 80 --input_celline"
python main_reg.py --datasource=gdsc ${FIXED_PARAM} --logdir ./checkpoints_all/checkpoints_chembl/checkpoint_chembl_actfound --resume 1 --test_epoch -1 --gdsc_pretrain_data chembl
python main_reg.py --datasource=gdsc ${FIXED_PARAM} --logdir ./checkpoints_all/checkpoints_bdb/checkpoint_bdb_actfound --resume 1 --test_epoch -1 --gdsc_pretrain_data bdb
python main_reg.py --datasource=gdsc ${FIXED_PARAM}
We provide all test results on the file test_results_all.tar.gz, which can be used to plot all figures shown in our paper. we also provide all scripts we used to run the full test in "./script". You can refer to the correct running script contained in "./script"
For ActFound training, please first make sure that the training data is correctly downloaded, and then simply run "main_reg.py". Training of ActFound on ChEMBL takes roughly 70 hours, and training on BindingDB takes roughly 30 hours.
FIXED_PARAM="--test_sup_num 16 --test_repeat_num 2 --begin_lrloss_epoch 50 --metatrain_iterations 80 --no_fep_lig "
CHEMBL_DIR="./checkpoints_all/checkpoints_chembl_nofep"
CHEMBL_RES="./test_results/result_indomain_nofep/chembl"
python -u main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound --model_name actfound --test_write_file ${CHEMBL_RES} ${FIXED_PARAM} > ./runlog_nofep/chembl_actfound.log &
python -u main_reg.py --datasource=chembl --logdir ${CHEMBL_DIR}/checkpoint_chembl_actfound_transfer --model_name actfound_transfer --test_write_file ${CHEMBL_RES} ${FIXED_PARAM} > ./runlog_nofep_new/chembl_actfound_transfer.log &
BDB_DIR="./checkpoints_all/checkpoints_bdb_nofep"
BDB_RES="./test_results/result_indomain_nofep/bdb"
python -u main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound --model_name actfound --test_write_file ${BDB_RES} ${FIXED_PARAM} > ./runlog_nofep/bdb_actfound.log &
python -u main_reg.py --datasource=bdb --logdir ${BDB_DIR}/checkpoint_bdb_actfound_transfer --model_name actfound_transfer --test_write_file ${BDB_RES} ${FIXED_PARAM} > ./runlog_nofep_new/bdb_actfound_transfer.log &
Please feel free to contact me by email if there is any problem with the code or paper: fengbin14@pku.edu.cn.
If you use ActFound in your work, a citation to our paper is appreciated: