GeminiLight / hrl-acra

[TSC'23 - HRL-ACRA] Implementation of our paper "Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning", accepted by IEEE Transactions on Services Computing (TSC).

AttributeError: 'NoneType' object has no attribute 'weight' #2

Closed lpj12121 closed 4 months ago

lpj12121 commented 6 months ago

Hello, I am very interested in your paper and code and would like to run it, but I have run into a problem. My operating system is Windows 11, and my hardware is an i7-14700HX CPU and an RTX 4060 GPU. I installed everything according to your environment requirements. When I try to train the model from the command line, for example by running python main.py --solver_name="hrl_ra" --eval_interval=10 --num_train_epochs=100 --summary_file_name="exp-wx_100-hrl_ra-training.csv" --seed=0, the following error occurs:

Traceback (most recent call last):
  File "D:\lunwenfuxian\hrl-acra-main\main.py", line 29, in <module>
    run(config)
  File "D:\lunwenfuxian\hrl-acra-main\main.py", line 15, in run
    scenario = BasicScenario.from_config(Env, Solver, config)
  File "D:\lunwenfuxian\hrl-acra-main\base\scenario.py", line 50, in from_config
    solver = Solver(controller, recorder, counter, **vars(config))
  File "D:\lunwenfuxian\hrl-acra-main\solver\learning\hrl_ra\hrl_ra_solver.py", line 20, in __init__
    self.policy = ActorCritic(p_net_num_nodes=num_p_net_nodes, p_net_feature_dim=4+4+3, p_net_edge_dim=1+1, v_net_feature_dim=3+3+POSITIONAL_EMBEDDING_DIM, v_net_edge_dim=1,
  File "D:\lunwenfuxian\hrl-acra-main\solver\learning\hrl_ra\net.py", line 69, in __init__
    self.actor = Actor(p_net_num_nodes, p_net_feature_dim, p_net_edge_dim, v_net_feature_dim, v_net_edge_dim, embedding_dim, dropout_prob=dropout_prob, batch_norm=batch_norm)
  File "D:\lunwenfuxian\hrl-acra-main\solver\learning\hrl_ra\net.py", line 83, in __init__
    self.v_net_encoder = VNetEncoder(p_net_num_nodes, p_net_feature_dim, p_net_edge_dim, v_net_feature_dim, v_net_edge_dim, embedding_dim=embedding_dim, dropout_prob=dropout_prob, batch_norm=batch_norm)
  File "D:\lunwenfuxian\hrl-acra-main\solver\learning\hrl_ra\net.py", line 16, in __init__
    self.v_net_gnn = self.GNNConvNet(v_net_feature_dim, embedding_dim, num_layers=3, embedding_dim=embedding_dim, edge_dim=v_net_edge_dim, dropout_prob=dropout_prob, batch_norm=batch_norm)
  File "D:\lunwenfuxian\hrl-acra-main\solver\learning\net.py", line 292, in __init__
    self._init_parameters()
  File "D:\lunwenfuxian\hrl-acra-main\solver\learning\net.py", line 296, in _init_parameters
    nn.init.orthogonal_(getattr(self, f'conv_{layer_id}').lin_src.weight)
AttributeError: 'NoneType' object has no attribute 'weight'

The IDE finally located line 296 in the net.py file. I don't know how to solve this problem. Can you provide me with some suggestions?

Last, I also wish you good luck in your research.

lpj12121 commented 6 months ago

Supplement: The problem is in this line of code, nn.init.orthogonal_(getattr(self, f'conv_{layer_id}').lin_src.weight), which is inside the _init_parameters(self) function in the net.py file.
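The failing line dereferences lin_src without checking whether it is None. A minimal sketch of a guarded variant follows, assuming the layer may expose its projection under a different attribute depending on the library version. The helper name init_existing_weights and the extra attribute names lin_dst and lin are assumptions for illustration, not taken from the repository, and the stand-in objects below replace real PyG layers so the sketch runs without PyTorch.

```python
from types import SimpleNamespace


def init_existing_weights(layer, init_fn, names=("lin_src", "lin_dst", "lin")):
    """Apply init_fn to the weight of each named sub-module that exists.

    Attribute names other than lin_src are assumptions; adjust them to the
    layer actually in use. Returns the names that were initialized.
    """
    initialized = []
    for name in names:
        sub = getattr(layer, name, None)        # None if missing or unset
        weight = getattr(sub, "weight", None)   # None if sub is None
        if weight is None:
            continue                            # skip instead of raising AttributeError
        init_fn(weight)
        initialized.append(name)
    return initialized


def fill_ones(weight):
    """Stand-in for an in-place initializer such as nn.init.orthogonal_."""
    for i in range(len(weight)):
        weight[i] = 1.0


# Hypothetical stand-in layer: lin_src is None, as in the traceback above.
fake_conv = SimpleNamespace(
    lin_src=None,
    lin_dst=None,
    lin=SimpleNamespace(weight=[0.0, 0.0]),
)

touched = init_existing_weights(fake_conv, fill_ones)
print(touched)
```

With PyTorch installed, the equivalent call inside _init_parameters would be along the lines of init_existing_weights(getattr(self, f'conv_{layer_id}'), nn.init.orthogonal_). Whether this preserves the intended initialization depends on the layer's actual attributes in the installed version.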

GeminiLight commented 6 months ago

This issue may be caused by a version update of PyG or PyTorch. A crude workaround is to skip (comment out) the call to self._init_parameters(), which may slightly affect performance.

Please try this workaround first; I look forward to your feedback.
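Since a PyG or PyTorch version change is the suspected cause, a quick way to see which versions are actually installed, and to compare them against the ones pinned in install.sh, might look like the sketch below. The distribution names "torch" and "torch-geometric" are assumptions about how the packages were installed; the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torch-geometric")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


versions = installed_versions()
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Run it inside the same environment used for training, so that the reported versions are the ones main.py actually imports.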

lpj12121 commented 6 months ago

Thanks for your reply. Yes, the version dependencies between the various packages have been troublesome; I could only install them by executing the commands in install.sh line by line. After I commented out the call to self._init_parameters() on line 292 of solver\learning\net.py, training appears to start normally:

-------------------- Pretrain --------------------

Training Epoch: 0 temp save record in save\hrl_ra\LAPTOP-FD1TIOLL-20240402T162755\records\temp-0.csv

***Generate virtual networks with seed 0
Update time: 000010 | +0.0280 & -0.0001 & +0.1423 & +4.3284 & -4.3283 & -4.3285 & +0.8626
Update time: 000021 | +0.0502 & +0.0007 & +0.1784 & +4.0059 & -4.0061 & -4.0057 & +0.8482
Update time: 000021 | +0.0502 & +0.0007 & +0.1784 & +4.0059 & -4.0061 & -4.0057 & +0.8482
......

.....

I also found a new problem: at line 383 of solver\learning\rl_solver.py and line 90 of solver\learning\searcher.py, the statement candicate_action_dist = Categorical(probs=candicate_action_dist) is flagged with an error on candicate_action_dist. Will this affect the running of the program? Should it be changed to candicate_action_dist = Categorical(probs=candicate_action_probs)?

By the way, how long did it take you to train the model? My own computer is really slow to train.

GeminiLight commented 6 months ago

Training the low-level agent for the RA task may take about six hours on a 3090 GPU.

lpj12121 commented 6 months ago

OK, thank you very much.

Finally, I would like to ask how to deal with the error I mentioned above. Line 383 of solver\learning\rl_solver.py and line 90 of solver\learning\searcher.py both contain:

candicate_action_dist = Categorical(probs=candicate_action_dist)

The argument probs=candicate_action_dist is reported as an error. If this does not affect training, I will ignore it for now.
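To illustrate what the flagged line would do if it were executed, here is a small sketch. It assumes candicate_action_dist is not assigned anywhere earlier in the enclosing function, which the thread does not show. StandInCategorical is a hypothetical pure-Python stand-in for torch.distributions.Categorical, used only so the example runs without PyTorch; the two builder functions are likewise illustrative.

```python
class StandInCategorical:
    """Hypothetical stand-in for torch.distributions.Categorical."""

    def __init__(self, probs):
        total = sum(probs)
        # Normalize so the probabilities sum to 1.
        self.probs = [p / total for p in probs]


def build_dist_as_written(candicate_action_probs):
    # Mirrors the flagged line: the name read on the right-hand side is the
    # same local name being assigned, so it is still unbound at call time.
    candicate_action_dist = StandInCategorical(probs=candicate_action_dist)
    return candicate_action_dist


def build_dist_proposed(candicate_action_probs):
    # The change proposed in the thread: pass the probabilities instead.
    candicate_action_dist = StandInCategorical(probs=candicate_action_probs)
    return candicate_action_dist


try:
    build_dist_as_written([0.2, 0.8])
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"

fixed = build_dist_proposed([1.0, 3.0])
print(outcome, fixed.probs)
```

Under that assumption, the line fails only when its code path is actually reached, so training can appear to run normally as long as that branch is never taken.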