Closed limhao closed 1 year ago
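@wolaiye1010 Could you please take a look at this problem with the PLE model you uploaded?
@easezyc, fixed. I've opened a new PR.
@limhao, fixed.
The cause: a Python list of submodules has to be wrapped in torch.nn.ModuleList. Only then are the submodules inside the model moved automatically when the model itself is moved with .to(device).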
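For anyone who lands here later, a minimal sketch of the difference. The class names PlainListModel and ModuleListModel and the layer sizes are made up for illustration and are not taken from the repository. It only checks parameter registration, which is what .to(device) relies on when it walks the model's submodules:

```python
import torch
from torch import nn


class PlainListModel(nn.Module):
    """Hypothetical model that keeps its experts in a plain Python list."""

    def __init__(self, num_experts: int = 3, dim: int = 4):
        super().__init__()
        # A plain list is invisible to nn.Module: these layers are not
        # registered as submodules, so .to(device) will not move them.
        self.experts = [nn.Linear(dim, dim) for _ in range(num_experts)]


class ModuleListModel(nn.Module):
    """Same hypothetical model, with the list wrapped in nn.ModuleList."""

    def __init__(self, num_experts: int = 3, dim: int = 4):
        super().__init__()
        experts = [nn.Linear(dim, dim) for _ in range(num_experts)]
        # Wrapping registers every layer as a submodule of this model.
        self.experts = nn.ModuleList(experts)


plain = PlainListModel()
wrapped = ModuleListModel()

# Count the parameter tensors each model knows about.
plain_param_count = len(list(plain.parameters()))
wrapped_param_count = len(list(wrapped.parameters()))
print(plain_param_count, wrapped_param_count)
```

In the plain-list version the model reports no parameters at all, so the optimizer would not see those layers either, not just .to(device).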
@wolaiye1010 Thanks, merged.
Traceback (most recent call last):
File "D:\my3090\Multitask-Recommendation-Library\main.py", line 220, in
Process finished with exit code 1
self.task_experts = torch.nn.ModuleList(self.task_experts)
self.task_gates = torch.nn.ModuleList(self.task_gates)
self.share_experts = torch.nn.ModuleList(self.share_experts)
self.share_gates = torch.nn.ModuleList(self.share_gates)
The problem is probably in this block.
@limhao @easezyc, sorry, I missed one line of code that needed changing yesterday. It is fixed now and works in my own testing. I've opened a PR, please take another look.
Merged.
The layers import in the ple file has had a problem all along. It is a simple fix, just adding a . is enough. I hope the author takes note.
Model: PLE
0%| | 0/102 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "D:\my3090\Multitask-Recommendation-Library\main.py", line 220, in <module>
    main(args.dataset_name,
  File "D:\my3090\Multitask-Recommendation-Library\main.py", line 178, in main
    train(model, optimizer, train_data_loader, criterion, device)
  File "D:\my3090\Multitask-Recommendation-Library\main.py", line 80, in train
    y = model(categorical_fields, numerical_fields).to(device)
  File "D:\ana\envs\rs\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\my3090\Multitask-Recommendation-Library\models\ple.py", line 50, in forward
    categorical_emb = self.embedding(categorical_x)
  File "D:\ana\envs\rs\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\my3090\Multitask-Recommendation-Library\models\layers.py", line 17, in forward
    return self.embedding(x)
  File "D:\ana\envs\rs\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ana\envs\rs\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    return F.embedding(
  File "D:\ana\envs\rs\lib\site-packages\torch\nn\functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
Process finished with exit code 1
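Line 50 of ple.py reports that the tensors are not on the same device.
This is probably because your input was not moved to CUDA with .to(device). My code should be fine as it stands, since it is running in our production environment.
@wolaiye1010 I just took another look. I had made a mess of my modified main.py. After replacing it, everything works.
I tried to fix it myself but could not get it right.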
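A minimal sketch of what moving the input to the device looks like. It uses a standalone nn.Embedding rather than the actual PLE model, the ids and sizes are made up, and it falls back to the CPU when CUDA is not available:

```python
import torch
from torch import nn

# Pick the GPU when there is one; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the model's embedding layer, moved to the target device.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)

# Hypothetical batch of categorical ids, created on the CPU the way a
# DataLoader would normally hand them over.
categorical_fields = torch.tensor([[1, 2], [3, 4]])

# Move the input to the same device as the weights before the forward pass.
# Skipping this step on a CUDA machine raises the "Expected all tensors to be
# on the same device" error from the traceback.
categorical_fields = categorical_fields.to(device)

out = embedding(categorical_fields)
print(out.shape, out.device)
```

In a training loop the same .to(device) call would be applied to every tensor in the batch, including the labels, before calling the model.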