Closed TedSIWEILIU closed 3 years ago
train_lcfn.py fails with the error below when run on a GPU machine. Please fix it @TedSIWEILIU
2020-12-27 15:41:52 [ERROR]-Traceback (most recent call last):
2020-12-27 15:41:52 [ERROR]- File "train_lcfn.py", line 128, in
2020-12-27 15:41:52 [ERROR]- train_engine.train()
2020-12-27 15:41:52 [ERROR]- File "train_lcfn.py", line 102, in train
2020-12-27 15:41:52 [ERROR]- self._train(self.engine, train_loader, self.model_save_dir)
2020-12-27 15:41:52 [ERROR]- File "../beta_rec/core/train_engine.py", line 231, in _train
2020-12-27 15:41:52 [ERROR]- engine.train_an_epoch(train_loader, epoch_id=epoch)
2020-12-27 15:41:52 [ERROR]- File "../beta_rec/models/lcfn.py", line 160, in train_an_epoch
2020-12-27 15:41:52 [ERROR]- loss = self.train_single_batch(batch_data)
2020-12-27 15:41:52 [ERROR]- File "../beta_rec/models/lcfn.py", line 119, in train_single_batch
2020-12-27 15:41:52 [ERROR]- self.model.forward()
2020-12-27 15:41:52 [ERROR]- File "../beta_rec/models/lcfn.py", line 68, in forward
2020-12-27 15:41:52 [ERROR]- self.user_all_embeddings = torch.cat(self.user_all_embeddings, 1)
2020-12-27 15:41:52 [ERROR]-RuntimeError: All input tensors must be on the same device. Received cuda:0 and cpu
The CUDA issue is fixed now.
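For anyone hitting the same `RuntimeError`: the thread does not show the actual patch, so the following is only a minimal sketch of one common way to avoid this kind of device mismatch, not the fix that was applied to lcfn.py. The helper `cat_on_same_device` and the tensors `layer0` and `layer1` are hypothetical names made up for illustration. The idea is to move every tensor in the list to one device before calling `torch.cat`.

```python
import torch


def cat_on_same_device(tensors, dim=1, device=None):
    """Concatenate tensors after moving them all to one device.

    Hypothetical helper for illustration; it is not part of beta_rec.
    """
    if device is None:
        device = tensors[0].device  # default: follow the first tensor
    # .to(device) returns the tensor unchanged if it is already there.
    return torch.cat([t.to(device) for t in tensors], dim=dim)


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for the per-layer embeddings collected in forward().
layer0 = torch.randn(4, 8, device=device)  # created on the target device
layer1 = torch.randn(4, 8)                 # no device given, so it lives on the CPU

user_all_embeddings = cat_on_same_device([layer0, layer1], dim=1, device=device)
print(user_all_embeddings.shape, user_all_embeddings.device)
```

On a GPU machine, calling `torch.cat([layer0, layer1], 1)` directly would raise the error from the traceback, because `layer1` is still on the CPU. Creating tensors on the right device in the first place is usually cleaner than moving them afterwards.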
Add lcfn_default.json, lcfn.py, and train_lcfn.py. Update deprecated_data_base.py.
lcfn_default.json includes all parameters for the LCFN model. lcfn.py is the implementation of the model. train_lcfn.py is an example of how to train the LCFN model. I updated deprecated_data_base.py to include a function that generates the graph_embeddings. #367