Closed abcdhhhh closed 1 week ago
@abcdhhhh thanks for spotting this!
I reproduced your issue and pushed a fix. By default, the first action of a tour had to be the depot. But this did not work with `multistart`, since it was forcing other actions, and the depot could be visited later on since `available` was True. Now, by default, the depot is only considered during the reward calculation and not in the actions (see below). This can be modified by setting `force_start_at_depot=True`.
```python
%load_ext autoreload
%autoreload 2

import torch

from rl4co.envs import PDPEnv
from rl4co.models.zoo.ham import HeterogeneousAttentionModelPolicy

if __name__ == '__main__':
    batch_size = 1
    num_loc = 4
    torch.manual_seed(2024)

    env = PDPEnv(generator_params={"num_loc": num_loc}, check_solution=True, force_start_at_depot=False)
    policy = HeterogeneousAttentionModelPolicy()
    out = policy(
        td=env.reset(batch_size=[batch_size]),
        env=env,
        decode_type="sampling",  # note: need to set decode type
        multistart=True,
    )
    print(out['actions'])
```
Out:
```
tensor([[1, 2, 4, 3],
        [2, 4, 1, 3]])
```
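To illustrate what "the depot is only considered during the reward calculation" can mean, here is a minimal sketch in plain Python. It is not the RL4CO implementation: the helper `tour_length`, the coordinates, and the assumption that the reward is the negative tour length are all hypothetical and chosen only for illustration. The depot (index 0) never appears in the actions; it is added implicitly at both ends when the cost is computed.

```python
import math


def tour_length(coords, actions):
    """Length of depot -> actions -> depot, with the depot (index 0) added implicitly."""
    route = [0] + list(actions) + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


# Hypothetical coordinates: index 0 is the depot, 1-2 are pickups, 3-4 are deliveries.
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 0.0), (1.0, 2.0)]

# The actions contain no depot; it only enters through the cost.
length = tour_length(coords, [1, 2, 4, 3])
reward = -length  # assumption: reward is the negative tour length
print(length, reward)
```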
Thanks for the update. It looks like the bug has been fixed.
Describe the bug
When using `PDPEnv`, I find that solutions pass through the depot and start/end at other locations when using the `multistart` decoding strategy. This produces invalid solutions, which are not caught even when setting `check_solution=True`.
To Reproduce
output:
Both solutions visit the depot (0) between pickup (1) and delivery (3).
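A small standalone check makes this kind of violation easy to spot. The sketch below is not RL4CO's `check_solution` logic; `check_pdp_tour` is a hypothetical helper, and it assumes that pickup `i` pairs with delivery `i + num_loc // 2`, which matches the pickup (1) and delivery (3) pairing above for `num_loc = 4`. The invalid tour at the end is a made-up example of the pattern described, not the actual output of the reproduction.

```python
def check_pdp_tour(actions, num_loc):
    """Return a list of problems found in a tour; an empty list means valid."""
    half = num_loc // 2
    problems = []
    if 0 in actions:
        problems.append("depot (0) appears as an action inside the tour")
    if sorted(a for a in actions if a != 0) != list(range(1, num_loc + 1)):
        problems.append("not every location is visited exactly once")
    position = {node: i for i, node in enumerate(actions)}
    for pickup in range(1, half + 1):
        delivery = pickup + half  # assumed pairing: pickup i <-> delivery i + half
        if pickup in position and delivery in position:
            if position[delivery] < position[pickup]:
                problems.append(f"delivery {delivery} precedes pickup {pickup}")
    return problems


print(check_pdp_tour([1, 2, 4, 3], num_loc=4))     # tour reported after the fix
print(check_pdp_tour([1, 0, 3, 2, 4], num_loc=4))  # hypothetical tour through the depot
```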
Additional context
Solutions normally start from the depot when not using `multistart`.
Reason and Possible fixes
Currently, `action_mask` chooses the depot as the first action, while `select_start_nodes` chooses pickups. Letting solutions start from pickups and end at the depot may solve this problem.
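The idea of starting from pickups and keeping the depot out of the actions could be sketched as a feasibility mask like the one below. This is only an illustration under stated assumptions, not the mask used by `PDPEnv`: `feasible_actions` is a hypothetical function, and the pairing of pickup `i` with delivery `i + num_loc // 2` is assumed.

```python
def feasible_actions(visited, num_loc):
    """Mask over nodes 0..num_loc: True if the node may be chosen next."""
    half = num_loc // 2
    mask = [False] * (num_loc + 1)  # depot (index 0) is never offered as an action
    for node in range(1, num_loc + 1):
        if node in visited:
            continue  # each location is visited at most once
        if node <= half:
            mask[node] = True  # unvisited pickups are always allowed
        else:
            mask[node] = (node - half) in visited  # delivery needs its pickup first
    return mask


print(feasible_actions(set(), num_loc=4))      # only pickups can start a tour
print(feasible_actions({1}, num_loc=4))        # delivery 3 becomes available
print(feasible_actions({1, 2, 4}, num_loc=4))  # only delivery 3 remains
```

With such a mask every multistart rollout begins at a pickup, and the return to the depot is accounted for in the cost rather than chosen as an action.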