MJ10 / BioSeq-GFN-AL

Code for "Biological Sequence Design with GFlowNets", 2022
MIT License

How to reproduce the GFP results? #4

Open dlnp2 opened 2 years ago

dlnp2 commented 2 years ago

Hi @MJ10, I am very interested in your work and am trying to reproduce the results, specifically for GFP (Table 3). Running run_gfp.py with the default parameters produced some errors, so I changed several parameters to match those described in B.2 of the paper. The results were quite poor, however: the average iter_1/collected_seqs_scores was about 1.42, far worse than the reported performance. Could you tell me the exact parameters used for the experiments reported in the paper? The parameters I tried are listed below. Thank you.

acq_fn none
enable_tensorboard False
filter False
gen_L2 0
gen_Z_learning_rate 0.001
gen_balanced_loss 1
gen_clip 10
gen_data_sample_per_step 32
gen_do_explicit_Z 0
gen_do_pg 0
gen_episodes_per_step 16
gen_leaf_coef 25
gen_learning_rate 0.0005
gen_loss_eps 1e-05
gen_max_len 237
gen_model_type mlp
gen_num_hidden 2048
gen_num_iterations 20000
gen_output_coef 10
gen_partition_init 50
gen_pg_entropy_coef 0.01
gen_random_action_prob 0.05
gen_reward_exp 2
gen_reward_exp_ramping 3
gen_reward_min 0
gen_reward_norm 1
gen_sampling_temperature 2.0
kappa 0.1
load_proxy_weights None
load_scores_path .
max_len 237
max_percentile 80
name test_mlp
num_rounds 1
num_sampled_per_round 128
oracle_features AlBert
oracle_split D2_target
oracle_type MLP
proxy_L2 0.0001
proxy_arch mlp
proxy_data_split D1
proxy_dropout 0.1
proxy_early_stop_to_best_params 0
proxy_early_stop_tol 5
proxy_learning_rate 0.0001
proxy_num_dropout_samples 25
proxy_num_hid 2048
proxy_num_iterations 10000
proxy_num_layers 2
proxy_num_per_minibatch 256
proxy_type regression
proxy_uncertainty dropout
run -1
save_path ./outputs/run_gfp/test_mlp.pkl.gz
save_proxy_weights False
save_scores False
save_scores_path .
seed 0
task gfp
tb_log_dir .//outputs/run_gfp/test_mlp
use_uncertainty False
vocab_size 20
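
For anyone comparing their own run against this one, here is a minimal sketch for diffing two parameter dumps in the "name value" per-line format shown above. `parse_params` and `diff_params` are hypothetical helpers, not part of the repository, and the values in `other` are placeholders for illustration, not the settings used in the paper.

```python
def parse_params(text):
    """Parse 'name value' lines into a dict mapping names to string values."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params


def diff_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for every entry that differs.

    A parameter missing from one side is reported with None on that side.
    """
    diff = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diff[name] = (a.get(name), b.get(name))
    return diff


# A few lines copied from the dump above.
mine = parse_params("""
gen_learning_rate 0.0005
gen_reward_exp 2
kappa 0.1
""")

# Placeholder values for a second run (NOT the paper's configuration).
other = parse_params("""
gen_learning_rate 0.0005
gen_reward_exp 3
""")

changed = diff_params(mine, other)
for name, (mine_value, other_value) in changed.items():
    print(f"{name}: {mine_value} -> {other_value}")
```

Values are compared as strings, so `1e-05` and `0.00001` would be reported as different; converting to numbers before comparing is left out to keep the sketch short.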