Thanks for sharing the source code of JODIE! It is an impressive method with great potential for recommendation.
I am currently trying to apply the code to an online shopping dataset that contains 10k users and 20k items, and I have run into two problems.
First, the code crashes once the first epoch finishes, when jodie.py calls the save_model function, with the following error message:
Traceback (most recent call last):
File "jodie.py", line 219, in <module>
save_model(model, optimizer, args, ep, user_embeddings_dystat, item_embeddings_dystat, train_end_idx, user_embeddings_timeseries, item_embeddings_timeseries)
File "/home/jianfeng/dl/jodie_dip/library_models.py", line 163, in save_model
torch.save(state, filename)
File "/home/jianfeng/.conda/envs/jodie/lib/python2.7/site-packages/torch/serialization.py", line 260, in save
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/jianfeng/.conda/envs/jodie/lib/python2.7/site-packages/torch/serialization.py", line 185, in _with_file_like
return body(f)
File "/home/jianfeng/.conda/envs/jodie/lib/python2.7/site-packages/torch/serialization.py", line 260, in <lambda>
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/jianfeng/.conda/envs/jodie/lib/python2.7/site-packages/torch/serialization.py", line 332, in _save
pickler.dump(obj)
OverflowError: cannot serialize a string larger than 2 GiB
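For reference, a minimal sketch of one possible workaround for this kind of error: write a large object as several smaller pickle files so that no single serialized piece approaches the 2 GiB limit. The helper names `save_in_chunks` and `load_chunks`, the file naming, and the toy data are hypothetical and are not part of the JODIE code. Whether this applies depends on what save_model actually stores, which is not shown here.

```python
import os
import pickle
import tempfile


def save_in_chunks(rows, directory, name, rows_per_chunk):
    """Pickle `rows` (any sliceable sequence) as several smaller files."""
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    paths = []
    for chunk_id, start in enumerate(range(0, len(rows), rows_per_chunk)):
        path = os.path.join(directory, "%s.part%05d.pkl" % (name, chunk_id))
        with open(path, "wb") as f:
            # Protocol 2 is the highest protocol available on Python 2.7.
            pickle.dump(rows[start:start + rows_per_chunk], f, protocol=2)
        paths.append(path)
    return paths


def load_chunks(paths):
    """Load the chunks back in order; the caller joins them as needed."""
    chunks = []
    for path in paths:
        with open(path, "rb") as f:
            chunks.append(pickle.load(f))
    return chunks


if __name__ == "__main__":
    # Toy stand-in for a large embedding matrix (hypothetical data).
    toy_rows = [[float(i), float(i) + 0.5] for i in range(10)]
    tmp_dir = tempfile.mkdtemp()
    part_paths = save_in_chunks(toy_rows, tmp_dir, "user_embeddings", rows_per_chunk=4)
    restored = [row for chunk in load_chunks(part_paths) for row in chunk]
    print(len(part_paths), restored == toy_rows)
```

The chunk size would have to be chosen so that each slice stays well below 2 GiB once serialized. The same slicing works on array-like objects that support `len()` and slice indexing.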
Second, on another dataset of similar size, the code crashes with the following error message:
Initializing the JODIE model
Initializing user and item embeddings
Initializing user and item RNNs
Initializing linear layers
JODIE initialization complete
Training the JODIE model for 1 epochs
Epoch 0 of 1: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):%|█▊ | 9382/50209 [00:49<01:25, 478.87it/s]
File "jodie.py", line 193, in ████| 4/4 [00:00<00:00, 18.88it/s]
loss.backward()
File "/home/jianfeng/.conda/envs/jodie/lib/python2.7/site-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/jianfeng/.conda/envs/jodie/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.81 GiB (GPU 0; 15.90 GiB total capacity; 12.60 GiB already allocated; 819.88 MiB free; 1.80 GiB cached)
It seems that CUDA ran out of memory. I think this may be caused by the t_batch being too large, but in the source code the only related setting is the tbatch_timespan variable. How can I fix this so that I can apply the model to this dataset?
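To make the question concrete, here is a rough sketch of how a smaller time span could be chosen from the interaction timestamps alone. It assumes that interactions are accumulated until one arrives more than the time span after the window start, and that the backward pass then runs over everything accumulated, so fewer interactions per window should mean a smaller graph in memory. That windowing rule, the function names, and the cap `max_per_window` are assumptions for illustration and are not taken from the JODIE code.

```python
def max_interactions_per_window(timestamps, timespan):
    """Largest number of interactions accumulated before a window closes.

    Assumed rule: a window closes at the first interaction whose timestamp
    exceeds the window start by more than `timespan`; that interaction is
    counted in the closing window and its timestamp starts the next one.
    """
    largest = 0
    count = 0
    window_start = None
    for t in timestamps:
        if window_start is None:
            window_start = t
        count += 1
        if t - window_start > timespan:
            largest = max(largest, count)
            count = 0
            window_start = t
    # The last window may still be open when the data ends.
    return max(largest, count)


def pick_timespan(timestamps, max_per_window, divisor=500, max_doublings=40):
    """Double the divisor until no window exceeds `max_per_window`."""
    total = timestamps[-1] - timestamps[0]
    for _ in range(max_doublings):
        timespan = total / float(divisor)
        if max_interactions_per_window(timestamps, timespan) <= max_per_window:
            return timespan, divisor
        divisor *= 2
    raise ValueError("no time span satisfies the cap; try a larger max_per_window")


if __name__ == "__main__":
    # Hypothetical data: 100 interactions, one per time unit.
    toy_timestamps = list(range(100))
    print(pick_timespan(toy_timestamps, max_per_window=10, divisor=1))
```

A cap that cannot be met (for example, many interactions sharing one timestamp) raises an error instead of looping forever. Smaller windows would also mean more frequent backward passes, so training time may go up.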
Thanks again for your attention to this issue.