Issue with Generating and Using the Model After Training with LPAC Network

wahaha-1 commented 3 months ago

I ran into a problem generating and using a model after training the LPAC network. Training produces a model instance, but the reference model provided is a dictionary of model parameters (a state dictionary). I modified the code to save the same kind of file, but loading still fails.

Below are the modifications I made and the resulting error.

Error Message:

Traceback (most recent call last):
  File "/home/wahaha/Desktop/CoverageControl/python/scripts/evaluators/eval.py", line 143, in <module>
    evaluator.evaluate()
  File "/home/wahaha/Desktop/CoverageControl/python/scripts/evaluators/eval.py", line 84, in evaluate
    controller = Controller(
  File "/home/wahaha/.local/lib/python3.10/site-packages/coverage_control/algorithms/controllers.py", line 122, in __init__
    self.model.load_model(IOUtils.sanitize_path(self.config["ModelStateDict"]))
  File "/home/wahaha/.local/lib/python3.10/site-packages/coverage_control/nn/models/lpac.py", line 87, in load_model
    self.load_state_dict(torch.load(model_state_dict_path), strict=False)
  File "/home/wahaha/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2140, in load_state_dict
    raise TypeError(f"Expected state_dict to be dict-like, got {type(state_dict)}.")
TypeError: Expected state_dict to be dict-like, got <class 'coverage_control.nn.models.lpac.LPAC'>

Modified Training Script (train_lpac.py):


    model_file,
    optimizer_file,
)

trainer.train()

# Save the model's state dictionary
torch.save(model.state_dict(), model_file)

# Save the optimizer's state dictionary
torch.save(optimizer.state_dict(), optimizer_file)

# test_dataset = CNNGNNDataset(data_dir, "test", use_comm_map, world_size)
# test_loader = torch_geometric.loader.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=24)
# test_loss = trainer.Test(test_loader)
# print(f"Test loss: {test_loss}")

Modified Loading Script (eval.py):

import sys  # needed for sys.argv below

import torch
from coverage_control.nn import LPAC
from coverage_control import IOUtils

# Load configuration file
config_file = sys.argv[1]
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

config = IOUtils.load_toml(config_file)
model = LPAC(config).to(device)

# Load the model's state dictionary
model_state_dict_path = IOUtils.sanitize_path(config["LPACModel"]["Model"])
state_dict = torch.load(model_state_dict_path, map_location=torch.device('cpu'))
model.load_state_dict(state_dict, strict=False)
model.eval()

# Continue with evaluation logic...

But it still doesn't work.

AgarwalSaurav commented 3 months ago

Check config["LPACModel"]["Model"] and make sure your config file has the path to the state dictionary. It seems you are loading a model and not a state dictionary.
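One way to make the loading step tolerant of both file kinds is to normalize whatever `torch.load` returns before handing it to `load_state_dict`. The sketch below is only an illustration: `as_state_dict` is a hypothetical helper, not part of coverage_control, and it assumes a full model object exposes the usual `state_dict()` method.

```python
from collections.abc import Mapping


def as_state_dict(checkpoint):
    """Return a state dict from either a state dict or a full model object.

    Hypothetical helper for illustration; not part of coverage_control.
    """
    # A saved state dict is dict-like (PyTorch uses an OrderedDict).
    if isinstance(checkpoint, Mapping):
        return checkpoint
    # A saved full model is a module object that can produce its state dict.
    state_dict_fn = getattr(checkpoint, "state_dict", None)
    if callable(state_dict_fn):
        return state_dict_fn()
    raise TypeError(
        f"Expected a state dict or a model, got {type(checkpoint)}."
    )
```

With PyTorch available, usage would look roughly like `model.load_state_dict(as_state_dict(torch.load(path, map_location="cpu")), strict=False)`. On recent PyTorch releases, where `torch.load` defaults to `weights_only=True`, loading a pickled full model may additionally require `weights_only=False`.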

wahaha-1 commented 3 months ago

I am using the default configuration and only changed the model files in the configuration file. The problem I found is that when the model is loaded, a model instance is passed instead of a state dictionary: torch.load should return a state dictionary containing the model parameters, but the file it loads contains an instance of the LPAC class.

learning_params.toml:

[LPACModel]
Dir = "${CoverageControl_ws}/lpac/models/"
Model = "modelK1.pt"
Optimizer = "optimizerK1.pt"

[CNNModel]
Dir = "${CoverageControl_ws}/lpac/models/" # Absolute location
Model = "modelK1.pt"
Optimizer = "optimizerK1.pt"

[ModelConfig]
UseCommMaps = true

eval.toml:

[[Controllers]]
Name = "lpac"
Type = "Learning"
# ModelFile: "~/CoverageControl_ws/datsets/lpac/models/model_k3_1024.pt"
#ModelStateDict = "${CoverageControl_ws}/lpac/models/model_k3_1024_state_dict.pt"
ModelStateDict = "${CoverageControl_ws}/lpac/models/modelK1_epoch10.pt"
LearningParams = "${CoverageControl_ws}/lpac/params/learning_params.toml"
UseCommMap = true
UseCNN = true
CNNMapSize = 32
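Comparing the two files above, the training config names `modelK1.pt` while the evaluation config points at `modelK1_epoch10.pt`. That mismatch may explain the error if the second file was written as a full model rather than by the modified `torch.save(model.state_dict(), model_file)` call, though the thread does not confirm this. A minimal sketch of the comparison, with the values copied into plain dictionaries and a hypothetical helper `same_checkpoint_name`:

```python
import posixpath


def same_checkpoint_name(learning_params, controller):
    """Compare the trained model file name with the one used for evaluation.

    Hypothetical helper for illustration; not part of coverage_control.
    """
    trained = learning_params["LPACModel"]["Model"]
    evaluated = posixpath.basename(controller["ModelStateDict"])
    return trained == evaluated, trained, evaluated


# Values copied from learning_params.toml and eval.toml in this thread.
learning_params = {
    "LPACModel": {
        "Dir": "${CoverageControl_ws}/lpac/models/",
        "Model": "modelK1.pt",
    }
}
controller = {
    "ModelStateDict": "${CoverageControl_ws}/lpac/models/modelK1_epoch10.pt"
}

match, trained, evaluated = same_checkpoint_name(learning_params, controller)
print(match, trained, evaluated)  # prints: False modelK1.pt modelK1_epoch10.pt
```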

wahaha-1 commented 3 months ago

I just modified it and it worked. But I don't quite understand why your model file name ends in _state_dict.pt.

eval.toml:

Name = "lpac"
Type = "Learning"
# ModelFile: "~/CoverageControl_ws/datsets/lpac/models/model_k3_1024.pt"
ModelFile ="~/CoverageControl_ws/lpac/models/modelK1_epoch10.pt"
#ModelStateDict = "${CoverageControl_ws}/lpac/models/model_k3_1024_state_dict.pt"
#ModelStateDict = "${CoverageControl_ws}/lpac/models/modelK1_epoch10.pt"
LearningParams = "${CoverageControl_ws}/lpac/params/learning_params.toml"
UseCommMap = true

KumarRobotics / CoverageControl, issue #6