ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
https://rl4.co
MIT License

Error when I test in vrplib #137

Closed linCC111 closed 7 months ago

linCC111 commented 7 months ago

Describe the bug

When I use a checkpoint to test on a Solomon dataset, I get an error (index out of bounds). It seems to happen in the env's step function: td["distances"] has shape (1, 21), but the action index is larger than 20.

To Reproduce

import torch
import vrplib
from rl4co.models import AttentionModel
from einops import repeat
from rl4co.data.transforms import StateAugmentation as SymmetricStateAugmentation
import os

# device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device = torch.device('cpu')
checkpoint_path = "./lightning_logs/version_0/checkpoints/epoch=29-step=5880.ckpt"
data_path = "./data/C101.txt"

test_model = AttentionModel.load_from_checkpoint(checkpoint_path, load_baseline= False)

policy, env = test_model.policy, test_model.env
policy.to(device)

instance = vrplib.read_instance(data_path, instance_format= "solomon")

coords = torch.tensor(instance["node_coord"]).float()
demands = torch.tensor(instance["demand"][1:]).float()
time_window = torch.tensor(instance["time_window"]).float()
serve_time = torch.tensor(instance["service_time"]).float()
capacity = torch.tensor(instance["capacity"])
x, y = coords[:, 0], coords[:, 1]
b_time, e_time = time_window[:, 0], time_window[:, 1]
x_min, y_min = x.min(), y.min()
x_max, y_max = x.max(), y.max()
s_time_min, s_time_max = serve_time.min(), serve_time.max()
x_scaled = (x - x_min) / (x_max - x_min)
y_scaled = (y - y_min) / (y_max - y_min)
b_scaled = (b_time - b_time.min()) / (b_time.max() - b_time.min())
e_scaled = (e_time - e_time.min()) / (e_time.max() - e_time.min())
new_coords = torch.stack([x_scaled, y_scaled], dim= 1)
new_time_window = torch.stack([b_scaled, e_scaled], dim= 1)
serve_time = (serve_time - s_time_min) / (s_time_max - s_time_min)
n = new_coords.shape[0]

batch_size = 1
td = env.reset(batch_size= (batch_size, )).to(device)
td["locs"] = repeat(new_coords, 'n d -> b n d', b= batch_size, d= 2)
td["time_windows"] = repeat(new_time_window, 'n d -> b n d', b= batch_size, d= 2)
td["durations"] = repeat(serve_time, 'n -> b n', b= batch_size)
td["demand"] = repeat(demands, 'n -> b n', b= batch_size) / capacity
td["visited"] = torch.zeros((batch_size, 1, n), dtype= torch.uint8)
action_mask = torch.ones(batch_size, n, dtype= torch.bool)
action_mask[:, 0] = False
td["action_mask"] = action_mask

# num_augment = 100

# augmentation = SymmetricStateAugmentation(env.name, num_augment= num_augment)
# td = augmentation(td)

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

with torch.no_grad():
    out = policy(td.clone(), decode_type= 'greedy', num_starts= 0, return_actions= True)

print(out)
Traceback (most recent call last):
  File "/home/lin/reinforce/test.py", line 58, in <module>
    out = policy(td.clone(), decode_type= 'greedy', num_starts= 0, return_actions= True)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/rl4co/models/zoo/common/autoregressive/policy.py", line 153, in forward
    log_p, actions, td_out = self.decoder(td, embeddings, env, **decoder_kwargs)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/rl4co/models/zoo/common/autoregressive/decoder.py", line 194, in forward
    td = env.step(td)["next"]
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/rl4co/envs/common/base.py", line 102, in step
    td = self._step(td)
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/rl4co/envs/routing/mocvrptw.py", line 199, in _step
    distance = gather_by_index(td["distances"], td["action"]).reshape([batch_size, 1])
  File "/root/anaconda3/envs/reinforce/lib/python3.10/site-packages/rl4co/utils/ops.py", line 75, in gather_by_index
    return src.gather(dim, idx).squeeze() if squeeze else src.gather(dim, idx)
RuntimeError: index 98 is out of bounds for dimension 1 with size 21

Reason

I read decoder.py and mocvrptw.py (my own env based on cvrptw.py) and found that td["distances"] is updated in get_action_mask() in cvrptw.py. But step() in cvrptw.py goes to step() in cvrp.py, which runs get_action_mask() from cvrp.py. I think that is what causes the error.
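For reference, a minimal sketch (outside rl4co, reusing the shapes from the traceback above) of this failure mode: gathering with an action index larger than the stale distances tensor raises exactly this error.

import torch

# td["distances"] still has the size it got at reset time (21 nodes in this run),
# while the action indexes into the 101-node Solomon instance
distances = torch.rand(1, 21)
action = torch.tensor([[98]])
distances.gather(1, action)  # RuntimeError: index 98 is out of bounds for dimension 1 with size 21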

fedebotu commented 7 months ago

Hi, here are some possible fixes:

  1. We recently implemented loading from Solomon instances here, could you have a look at it first?

  2. The error

    RuntimeError: index 98 is out of bounds for dimension 1 with size 21

    means that some tensor has the wrong size, most probably a variable in the td

  3. td["distances"] is updated in get_action_mask() in cvrptw.py, but step() in cvrptw.py goes to step() in cvrp.py, which runs get_action_mask() from cvrp.py

Yes, by design the action mask in CVRPTW is a composition of the time-window and the base CVRP masking procedures (a rough sketch follows below) - if you want to remove some constraints, you may need to refactor the code

  4. What modifications did you make for the mocvrptw?
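For illustration, here is a rough sketch (not the actual rl4co implementation) of the composition mentioned above: a node stays feasible only if both the base CVRP mask and the time-window mask allow it.

import torch

# Hypothetical masks for a 4-node instance (illustrative values only)
cvrp_mask = torch.tensor([[True, True, True, False]])  # capacity / already-visited constraints
tw_mask = torch.tensor([[True, False, True, True]])    # node reachable within its time window
action_mask = cvrp_mask & tw_mask
print(action_mask)  # tensor([[ True, False,  True, False]])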
fedebotu commented 7 months ago

PS: I just tried running the code (for the CVRPTW we have); updating the distances as below worked:

from rl4co.utils.ops import gather_by_index, get_distance   

current_loc = gather_by_index(td["locs"], td["current_node"]).reshape(
    [batch_size, 2]
)
dist = get_distance(current_loc, td["locs"].transpose(0, 1)).transpose(0, 1)
td["distances"] = dist

The policy then runs, but I obtain:

AssertionError: Logits contain NaNs

which usually means that the action_mask is all False for some instance (i.e. no node can be visited)
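As a quick diagnostic (illustrative only, not part of rl4co), such instances can be spotted before decoding:

import torch

# An instance whose action_mask is all False gets every logit set to -inf,
# which turns into NaNs after the softmax
action_mask = torch.tensor([[False, False, False]])
print(action_mask.any(dim=-1))  # tensor([False]) -> this instance has no feasible action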

linCC111 commented 7 months ago

1. How can I use extract_from_solomon()? I tried:

batch_size = 1
instance = vrplib.read_instance(data_path, instance_format= "solomon")
policy, env = test_model.policy, test_model.env
env.extract_from_solomon(instance, batch_size)

but it failed.

2. I changed the reward in cvrptw.py to make it a multi-objective problem; the action_mask constraints do not need to change.

fedebotu commented 7 months ago

  1. Running this
    batch_size = 1
    instance = vrplib.read_instance(data_path, instance_format= "solomon")
    policy, env = test_model.policy, test_model.env
    td = env.extract_from_solomon(instance, batch_size)
    print(td)

works fine for me:

TensorDict(
    fields={
        action_mask: Tensor(shape=torch.Size([1, 101]), device=cpu, dtype=torch.bool, is_shared=False),
        current_loc: Tensor(shape=torch.Size([1, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        current_node: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        current_time: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        demand: Tensor(shape=torch.Size([1, 100]), device=cpu, dtype=torch.float32, is_shared=False),
        depot: Tensor(shape=torch.Size([1, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        distances: Tensor(shape=torch.Size([1, 101]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        durations: Tensor(shape=torch.Size([1, 101]), device=cpu, dtype=torch.int64, is_shared=False),
        locs: Tensor(shape=torch.Size([1, 101, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        time_windows: Tensor(shape=torch.Size([1, 101, 2]), device=cpu, dtype=torch.int64, is_shared=False),
        used_capacity: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        vehicle_capacity: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        visited: Tensor(shape=torch.Size([1, 1, 101]), device=cpu, dtype=torch.uint8, is_shared=False)},
    batch_size=torch.Size([1]),
    device=None,
    is_shared=False)

However, running the policy with this minimalistic code:

%load_ext autoreload
%autoreload 2

import torch
import vrplib
from rl4co.models import AttentionModelPolicy
from rl4co.envs import CVRPTWEnv

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
data_path = "./data/C101.txt"

env = CVRPTWEnv()
policy = AttentionModelPolicy(env).to(device)

# Use Solomon instance
vrplib.download_instance("C101", "data")
instance = vrplib.read_instance(data_path, instance_format= "solomon")
td = env.extract_from_solomon(instance, 1).to(device)

## Use randomly generated instance
# td = env.reset(batch_size=(2,)).to(device)

# Inference on policy
with torch.inference_mode():
    out = policy(td.clone(), decode_type= 'greedy', num_starts= 0, return_actions= True)

gets stuck in an infinite loop. If you uncomment and use the randomly generated instance instead, it works fine. I guess it is a small masking problem? We will have a look at this. In the meantime, we recommend using a structure similar to this notebook!

2. I see!

linCC111 commented 7 months ago

I ran the code without the randomly generated instance, and when it reaches the decoder, log_p becomes tensor([[0., -inf, -inf, ..., -inf]], device='cuda:0') (101 entries, all -inf except the first). It seems the model cannot generate a valid result with the input data extracted from Solomon.

fedebotu commented 7 months ago

Yes, that happens because there is some error in the masking procedure - I think it's due to some node being mistakenly marked as unreachable; we will have a look at it :)

linCC111 commented 7 months ago

1. I have solved the problem by modifying my initial code according to the notebook you provided. At the first step I now initialize td["distances"] with the distances between the depot and the other nodes, and the index-out-of-bounds error no longer occurs:

td["distances"] = get_distance(td["locs"][:, 0, :], td["locs"][:, 1:, :].transpose(0, 1)).transpose(0, 1)

The complete data that needs to be initialized is as follows:

from rl4co.utils.ops import get_distance

batch_size = 1
td = env.reset(batch_size= (batch_size, )).to(device)
td["locs"] = repeat(coords, 'n d -> b n d', b= batch_size, d= 2)
td["distances"] = get_distance(td["locs"][:, 0, :], td["locs"][:, 1:, :].transpose(0, 1)).transpose(0, 1)
td["time_windows"] = repeat(time_window, 'n d -> b n d', b= batch_size, d= 2)
td["durations"] = repeat(serve_time, 'n -> b n', b= batch_size)
td["demand"] = repeat(demands, 'n -> b n', b= batch_size) / capacity
td["visited"] = torch.zeros((batch_size, 1, n), dtype= torch.uint8)
action_mask = torch.ones(batch_size, n, dtype= torch.bool)
action_mask[:, 0] = False
td["action_mask"] = action_mask
print(td)

2. extract_from_solomon() still fails.

ngastzepeda commented 7 months ago

Regarding 2: are you using the latest version of the main branch?

It seems the model cannot generate a valid result with the input data extracted from Solomon

This is indeed an issue we have encountered: when verifying the optimal solutions provided by cvrplib using the check_solution_validity() function, our validity checks return invalid routes for many of the instances. This is due to how floating-point calculations are handled, which can lead to small deviations in the calculated distances. These small deviations can cause our validity checks to assert that a node cannot be reached in time (by a small margin, of course), even though it is part of the best route. If this is an issue for the optimal routes, it would surely be an issue when training on the same instances.

We do not have a solution implemented yet to handle these small discrepancies, but we have a few ideas - still working on it, though.
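One possible direction, purely as an illustration of the idea and not something rl4co implements, would be to compare arrival times against the time windows with a small tolerance:

import torch

EPS = 1e-3  # hypothetical tolerance to absorb floating-point rounding
arrival_time = torch.tensor([10.0001])
latest_start = torch.tensor([10.0])
print(arrival_time <= latest_start)        # tensor([False]) -> node flagged as unreachable in time
print(arrival_time <= latest_start + EPS)  # tensor([True])  -> accepted within tolerance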

linCC111 commented 7 months ago

I used 0.3.1 at first and just now tested 0.4.0 (that ought to be the latest version?).

Regarding 2: are you using the latest version of the main branch?

The same problem occurs: the running procedure still gets stuck in an infinite loop.

cbhua commented 7 months ago

Hi @huanxulin1, we found the bug and fixed it in 5e90b77. The reason is that in the previous CVRP environment's get_action_mask() function, the vehicle_capacity was passed separately instead of using the variable inside the TensorDict, which caused all nodes to be wrongly masked by the capacity constraint.
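Schematically, the effect of the bug looked like this (illustrative values only, not the actual rl4co code):

import torch

demand = torch.tensor([[10., 20., 30.]])  # demands as read from a Solomon instance
used_capacity = torch.tensor([[0.]])
stale_capacity = 1.0   # hypothetical default passed separately to the masking function
td_capacity = 200.0    # value carried in td["vehicle_capacity"]
print(demand + used_capacity <= stale_capacity)  # tensor([[False, False, False]]) -> every node masked
print(demand + used_capacity <= td_capacity)     # tensor([[True, True, True]])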

You can now use the following minimal code to check that it runs:

import torch
import vrplib
from rl4co.envs import CVRPTWEnv
from rl4co.models import AttentionModelPolicy

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
data_path = "./data/C101.txt"

env = CVRPTWEnv()
policy = AttentionModelPolicy(env).to(device)

# Use Solomon instance
vrplib.download_instance("C101", "data")
instance = vrplib.read_instance(data_path, instance_format= "solomon")
td = env.extract_from_solomon(instance, 1).to(device)

# Inference on policy
with torch.inference_mode():
    out = policy(td.clone(), decode_type= 'greedy', num_starts= 0, return_actions= True)
    print(out)

Here is the output from my side with an untrained AM just for your reference:

{'reward': tensor([-4923.0938], device='cuda:2'), 'log_likelihood': tensor([-226.6587], device='cuda:2'), 'actions': tensor([[ 75,   0,  47,   0,   1,   0,  ..., 0,  87]], device='cuda:2')}

And regarding your other concern:

I ran the code without the randomly generated instance, and when it reaches the decoder, log_p becomes tensor([[0., -inf, -inf, ..., -inf]], device='cuda:0'). It seems the model cannot generate a valid result with the input data extracted from Solomon

Don't worry, this is correct: by design, log_p is -inf for any node that is masked at the current step, so the log_p you saw means that only the depot is available.
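A tiny illustration of that convention (not the rl4co source):

import torch

logits = torch.zeros(1, 5)                                         # raw decoder scores
action_mask = torch.tensor([[True, False, False, False, False]])   # only the depot is feasible
log_p = torch.log_softmax(logits.masked_fill(~action_mask, float("-inf")), dim=-1)
print(log_p)  # tensor([[0., -inf, -inf, -inf, -inf]])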