aryandeshwal / BODi


Handling different number of categorical values (tau_i) per categorical dimension #3

Open muddassar18 opened 9 months ago

muddassar18 commented 9 months ago

Hi Aryan,

I am working on a research project on categorical optimization. I came across your paper and found it very interesting. I want to run BODi on some problems in which the number of possible categorical values differs per dimension. However, I noticed that in your code on the pest control branch, every categorical variable takes exactly ncatgs values. I am trying to generalize this so that categorical variables can come from sets with different cardinalities (tau_i != tau_j).

Looking deeper into your code, I realized that I need to change the function generate_random_basis_vector(), which has only one parameter, ncatgs, unlike Algorithm 5 in your paper, which considers different {tau_1, …, tau_N}. Moreover, the Python code for the function deviates quite a lot from the pseudocode of Algorithm 5, and I don't see any obvious way to change it to enable the desired functionality. If you could help me understand how the function works, I can make the change myself. Could you please help me figure this out?

Thanks,
Muddassar

aryandeshwal commented 8 months ago

Hi Muddassar,

I just committed new changes to the pestcontrol branch of the code (https://github.com/aryandeshwal/BODi/tree/pestcontrol) to handle your problem. For your problem instance, one way is to convert the categorical problem into a binary problem via one-hot encoding and run BODi with binary variables. I added both the one-hot encoding in the modeling part and the corresponding acquisition function optimization, which searches only over valid one-hot encodings. Please let me know if it helps. You can try the following snippet of code with different cardinalities to check if it works:

from bodi.run_experiment import run_experiment

category_sizes = [5] * 5 + [4] * 10 + [3] * 10

X, Y, metadata = run_experiment(
    n_replications=1,
    evalfn="PestControl",
    max_evals=100,
    n_initial_points=20,
    batch_size=1,
    n_binary=0,
    n_categorical=25,
    n_continuous=0,
    category_sizes=category_sizes,
    init_with_k_spaced_binary_sobol=False,
    n_prototype_vectors=64,
    verbose=True,
)

Thanks,
Aryan
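For readers following the thread, the one-hot approach described in the reply can be illustrated with a small standalone sketch. This is not the code in the pestcontrol branch. The helper names (one_hot_encode, one_hot_decode, is_valid_one_hot, random_valid_neighbor) are hypothetical and chosen only for illustration; only category_sizes comes from the snippet in the reply. The sketch shows how dimensions with different cardinalities map to one concatenated binary vector, and how a local-search move can be restricted to valid one-hot encodings by changing one categorical dimension at a time.

```python
import numpy as np


def one_hot_encode(x, category_sizes):
    """Concatenate one one-hot block per categorical dimension.

    x[i] must be an integer in [0, category_sizes[i]).
    """
    if len(x) != len(category_sizes):
        raise ValueError("x and category_sizes must have the same length")
    # Start index of each dimension's block in the binary vector.
    offsets = np.concatenate(([0], np.cumsum(category_sizes)[:-1]))
    encoded = np.zeros(int(np.sum(category_sizes)), dtype=int)
    for value, size, offset in zip(x, category_sizes, offsets):
        if not 0 <= value < size:
            raise ValueError(f"value {value} is out of range for size {size}")
        encoded[offset + value] = 1
    return encoded


def one_hot_decode(encoded, category_sizes):
    """Recover the categorical vector from a valid one-hot encoding."""
    blocks = np.split(np.asarray(encoded), np.cumsum(category_sizes)[:-1])
    return np.array([int(np.argmax(block)) for block in blocks])


def is_valid_one_hot(encoded, category_sizes):
    """Check that every block is binary and contains exactly one 1."""
    encoded = np.asarray(encoded)
    if encoded.shape != (int(np.sum(category_sizes)),):
        return False
    blocks = np.split(encoded, np.cumsum(category_sizes)[:-1])
    return all(np.isin(b, (0, 1)).all() and b.sum() == 1 for b in blocks)


def random_valid_neighbor(encoded, category_sizes, rng):
    """Change one categorical dimension to a different value.

    The move is made in categorical space and re-encoded, so the
    result is always a valid one-hot encoding.
    """
    x = one_hot_decode(encoded, category_sizes)
    # Only dimensions with more than one category can change.
    candidates = [i for i, size in enumerate(category_sizes) if size > 1]
    dim = rng.choice(candidates)
    # Sample uniformly among the other values of that dimension.
    new_value = rng.integers(category_sizes[dim] - 1)
    if new_value >= x[dim]:
        new_value += 1
    x[dim] = new_value
    return one_hot_encode(x, category_sizes)


if __name__ == "__main__":
    category_sizes = [5] * 5 + [4] * 10 + [3] * 10
    rng = np.random.default_rng(0)
    x = np.array([rng.integers(size) for size in category_sizes])
    z = one_hot_encode(x, category_sizes)
    print(len(z))  # 95 binary variables for 25 categorical dimensions
    z_next = random_valid_neighbor(z, category_sizes, rng)
    print(is_valid_one_hot(z_next, category_sizes))
```

Making the move in categorical space and re-encoding it avoids ever producing a binary vector with zero or several active bits in a block, which a plain single-bit flip on the encoded vector would do.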