ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
https://rl4.co
MIT License

[BUG] PDP solutions pass through depot when using `multistart` decoding #231

Closed abcdhhhh closed 1 week ago

abcdhhhh commented 1 week ago

Describe the bug

When using PDPEnv with the multistart decoding strategy, I find that solutions pass through the depot and start/end at other locations.

This produces invalid solutions, which are not caught even when check_solution=True is set.

To Reproduce

import torch
from rl4co.envs import PDPEnv
from rl4co.models.zoo.ham import HeterogeneousAttentionModelPolicy

if __name__ == '__main__':
    batch_size = 1
    num_loc = 4
    torch.manual_seed(2024)

    env = PDPEnv(generator_params={"num_loc": num_loc}, check_solution=True)
    policy = HeterogeneousAttentionModelPolicy(
        test_decode_type="sampling",
    )
    out = policy(
        td=env.reset(batch_size=[batch_size]),
        env=env,
        phase="test",
        multistart=True,  # one rollout per candidate start node
    )

    print(out['actions'])

output:

tensor([[1, 0, 2, 4, 3],
        [2, 4, 1, 0, 3]])

Both solutions visit the depot (0) between pickup (1) and delivery (3).
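
For context, here is a minimal standalone check (a hypothetical helper written for this report, not rl4co's internal check_solution) that flags such tours:

import torch

def has_depot_violation(actions: torch.Tensor) -> torch.Tensor:
    # A tour is invalid if it does not start at the depot (node 0)
    # or if the depot reappears after the first position.
    starts_elsewhere = actions[:, 0] != 0
    depot_mid_tour = (actions[:, 1:] == 0).any(dim=-1)
    return starts_elsewhere | depot_mid_tour

actions = torch.tensor([[1, 0, 2, 4, 3],
                        [2, 4, 1, 0, 3]])
print(has_depot_violation(actions))  # tensor([True, True])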

Additional context

Solutions correctly start from the depot when multistart is not used.

Reason and Possible fixes

Currently, action_mask forces the depot as the first action, while select_start_nodes chooses pickup nodes as the forced starts, so the two disagree about where a tour begins.

Letting solutions start from pickup nodes and end at the depot may solve this problem (see the sketch below).
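
As a simplified sketch of the conflict (variable names here are illustrative, not rl4co's actual implementation):

import torch

# Toy sizes mirroring the repro above: depot 0, pickups 1-2, deliveries 3-4.
num_nodes = 5

# Step 0: the environment's mask only allows the depot as the first action...
first_step_mask = torch.zeros(num_nodes, dtype=torch.bool)
first_step_mask[0] = True

# ...but multistart overrides step 0 with a pickup node per rollout.
forced_starts = torch.tensor([1, 2])

# The depot is never actually visited at step 0, so it stays available
# and can be selected mid-tour, producing the invalid tours above.
available = torch.ones(num_nodes, dtype=torch.bool)
available[forced_starts[0]] = False  # only the forced start is marked visited
print(available[0].item())  # True: the depot is still selectable later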


fedebotu commented 1 week ago

@abcdhhhh thanks for spotting this!

I reproduced your issue and pushed a fix. By default, the first action of a tour had to be the depot, but this did not work with multistart, which forces other first actions; the depot could then still be visited later on, since available remained True. Now, by default, the depot is only considered during the reward calculation and does not appear in the actions (see below). This can be modified by setting force_start_at_depot=True.


import torch
from rl4co.envs import PDPEnv
from rl4co.models.zoo.ham import HeterogeneousAttentionModelPolicy

if __name__ == '__main__':
    batch_size = 1
    num_loc = 4
    torch.manual_seed(2024)

    env = PDPEnv(generator_params={"num_loc": num_loc}, check_solution=True, force_start_at_depot=False)
    policy = HeterogeneousAttentionModelPolicy()
    out = policy(
        td=env.reset(batch_size=[batch_size]),
        env=env,
        decode_type="sampling",  # note: need to set decode type
        multistart=True,
    )

    print(out['actions'])

Out:

tensor([[1, 2, 4, 3],
        [2, 4, 1, 3]])
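
To illustrate the new default: the depot is accounted for only when the tour length is computed, so it no longer appears in the actions. A minimal sketch of that idea (a simplified stand-in, not rl4co's actual reward code):

import torch

def tour_length_via_depot(locs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    # locs: [batch, num_nodes, 2] with the depot at index 0;
    # actions: [batch, tour_len] without the depot.
    batch = actions.size(0)
    depot = torch.zeros(batch, 1, dtype=actions.dtype)
    full = torch.cat([depot, actions, depot], dim=1)  # depot -> tour -> depot
    coords = locs.gather(1, full.unsqueeze(-1).expand(-1, -1, 2))
    return (coords[:, 1:] - coords[:, :-1]).norm(dim=-1).sum(-1)

locs = torch.rand(2, 5, 2)
actions = torch.tensor([[1, 2, 4, 3], [2, 4, 1, 3]])
print(tour_length_via_depot(locs, actions))  # the reward would be its negation
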
abcdhhhh commented 1 week ago

Thanks for the update. It looks like the bug has been fixed.