Question regarding MCTS

elgeneee commented 3 years ago

Hi NeverOnTimeSdnBhd, There are 2 questions I am unsure of and require your explanation,

Is it okay to use level = 3, iterations = 100 in each sample test case?
What does the 3D array of policy used to store? Is it used to store the global_policy in each level? Or is it used to store integer/double?

Appreciate your help, thank you!

NeverOnTimeSdnBhd commented 3 years ago

Hi

yes, but if you feel that MCTS simulation is taking a long time for small N and you are lazy to wait you definitely can reduce the iterations or level for small N samples
it is used to store every level global_policy

elgeneee commented 3 years ago

Hi @NeverOnTimeSdnBhd,

Line 96: probability[i] = Math.exp(globalPolicy[currentStop][possible_successors[i]]) May I know is "possibe_successors" supposed to contain Customer 1 to the last Customer? Or include the depot in front?

Thanks again!

NeverOnTimeSdnBhd commented 3 years ago

Hi

It should include depot as well

elgeneee commented 3 years ago

Hi, May I know the 3D array for policy you have explained in the NRPA is an Integer data type or Double data type of 3D array?

elgeneee commented 3 years ago

Hi @NeverOnTimeSdnBhd, At Line 25 when you call the return rollout() , does it not provide you any error? Because rollout() returns nothing, but your search() returns a tour, the return type is not matched. Sorry if I miss out anything.

Thank you!

NeverOnTimeSdnBhd commented 3 years ago

Hi, May I know the 3D array for policy you have explained in the NRPA is an Integer data type or Double data type of 3D array?

double, as you can guess from line 57

NeverOnTimeSdnBhd commented 3 years ago

Hi @NeverOnTimeSdnBhd, At Line 25 when you call the return rollout() , does it not provide you any error? Because rollout() returns nothing, but your search() returns a tour, the return type is not matched. Sorry if I miss out anything.

Thank you!

I did a mistake there, sorry and thank you for pointing out, I am updating the pseudocode now

elgeneee commented 3 years ago

Hi @NeverOnTimeSdnBhd,

Starting at Line 52, as the pseudocode says

for every possible move that can be made by stop
   if the move is not visited yet
      ...

My question is how do you get the number of possible move? Is it from other function or anything?

Thank you for reading this🙏🏻

NeverOnTimeSdnBhd commented 3 years ago

Hi

Please refer to https://github.com/NeverOnTimeSdnBhd/Delivery-Instances/issues/2, I think I answered something related in 17th comment (current last 3).

You can get the possible moves by creating a new function or just directly in the function requires getting the possible move, it is up to you.

DragonKnightMax commented 3 years ago

Good Day @NeverOnTimeSdnBhd

For the adapt() in pseudocode.txt, since we are iterating through each stop in each route, do we need to mark the depot as visited only once for the first route? because later we need to visit the depot again for the next route. Will marking the depot as visited have an effect on the outcome of the best tour?

function adapt (a_tour, level) {
for every route in a_tour
    for every stop in route
        policy[level][stop][next_stop] += ALPHA
        z = 0.0
        for every possible move that can be made by stop
            if the move is not visited yet
                z += Math.exp(globalPolicy[stop][move]);
        for every possible move that can be made by stop
            if the move is not visited yet
                policy[level][stop][move] -= ALPHA * (Math.exp(globalPolicy[stop][move]) / z)
        set stop as visited  <- this one
}

I manage to get an output for the best tour but the searching time is less than 5 seconds even for the largest test cases but it is not the optimum one. As I saw in the sample output, you take more than 60 seconds which is exceeding the time limit in order to search for the best tour. Does it have something to do with the configuration parameter level, iterations and ALPH? If so, then how we should configure and play with the parameters to maximize the search performance?
Lastly, can we make use of modules written in other programming languages such as Python for the graph visualization?

Thanks in advance!

NeverOnTimeSdnBhd commented 3 years ago

Hi

Sorry for missing out your questions, luckily you remind me in the announcement (sweat

depot is always visited first when we doing the calculation so you don't have to reset the visited state for depot
yes, those hyperparameters does affect the result you get. Higher level and iterations will result in longer searching but better result (tour with lower cost). Just imagine it like a human is searching a route, if you give him more time, he could have search for a better route because he can try more possibilities.
Haha, yes you can do that if you import it into your Java program (import a library written in Python into Java program). If you mean write another Python program to visualize the graph then unfortunately I think most of the lecturers won't accept it.

DragonKnightMax commented 3 years ago

@NeverOnTimeSdnBhd

Sorry for missing out your questions, luckily you remind me in the announcement (sweat

It's ok, thanks for your reply xD

depot is always visited first when we doing the calculation so you don't have to reset the visited state for depot

Ohh, is it something like this if I'm not misunderstood?

Adapting...

Route 1:
0 (set as visited) -> 3 (set as visited) -> 1 (set as visited) -> 0 (already visited)

Route 2:
0 (already visited) -> 4 (set as visited) -> 0 (already visited)

Again, thanks in advance

NeverOnTimeSdnBhd commented 3 years ago

Yes, exactly like the sample you give!

NeverOnTimeSdnBhd / Delivery-Instances

Question regarding MCTS #6