zhangyxrepo closed this issue 3 weeks ago
Hi, which version of PyG are you using?
Hi @migalkin, my PyG version is 2.5.0 and my torch version is 2.1.0.
Yeah, that's a known bug with pyg 2.5.0 - I'd recommend either downgrading pyg to 2.4.0 or upgrading to anything >= 2.5.2
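For reference, here is a quick way to check which versions your environment actually picks up (a minimal sketch; it only assumes the standard torch and torch_geometric packages are installed):
import torch
import torch_geometric

# Print the installed versions; requirements.txt pins compatible ones.
print("torch:", torch.__version__)
print("torch_geometric (PyG):", torch_geometric.__version__)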
Hi @migalkin, thanks for your timely reply, but this error still exists after I upgraded PyG to 2.5.2. In fact, the only difference between our environments lies in the CUDA driver version. Mine is 11.7, but it seems that you recommend running under 11.8. Could that be a possible reason?
No, this is a pyg version issue, try 2.4.0 as in requirements.txt, or the latest 2.6.1?
@migalkin After I downgraded pyg to 2.4.0, this problem no longer seems to occur, but a new problem has appeared. Specifically, when I run:
python script/run.py -c /home/ULTRA/config/transductive/inference.yaml --dataset CoDExSmall --epochs 0 --bpe null --gpus [0] --ckpt /home/ULTRA/ckpts/ultra_4g.pth
it encounters a compile error:
...
16:25:30 CoDExSmall dataset
16:25:30 #train: 32888, #valid: 1827, #test: 1828
16:25:30 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
16:25:30 Evaluate on valid
Load rspmm extension. This may take a while...
Traceback (most recent call last):
File "/home/ULTRA/script/run.py", line 297, in <module>
test(cfg, model, valid_data, filtered_data=val_filtered_data, device=device, logger=logger)
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ULTRA/script/run.py", line 136, in test
t_pred = model(test_data, t_batch)
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ULTRA/ultra/models.py", line 23, in forward
relation_representations = self.relation_model(data.relation_graph, query=query_rels)
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ULTRA/ultra/models.py", line 100, in forward
output = self.bellmanford(rel_graph, h_index=query)["node_feature"] # (batch_size, num_nodes, hidden_dim)
File "/home/ULTRA/ultra/models.py", line 76, in bellmanford
hidden = layer(layer_input, query, boundary, data.edge_index, data.edge_type, size, edge_weight)
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ULTRA/ultra/layers.py", line 86, in forward
output = self.propagate(input=input, relation=relation, boundary=boundary, edge_index=edge_index,
File "/hom/ULTRA/ultra/layers.py", line 118, in propagate
out = self.message_and_aggregate(edge_index, **msg_aggr_kwargs)
File "/home/ULTRA/ultra/layers.py", line 187, in message_and_aggregate
from .rspmm import generalized_rspmm
File "/home/ULTRA/ultra/rspmm/__init__.py", line 1, in <module>
from .rspmm import generalized_rspmm
File "/home/ULTRA/ultra/rspmm/rspmm.py", line 207, in <module>
rspmm = load_extension("rspmm", [os.path.join(path, "rspmm.cpp"), os.path.join(path, "rspmm.cu")])
File "/home/ULTRA/ultra/rspmm/rspmm.py", line 202, in load_extension
return cpp_extension.load(name, sources, extra_cflags, extra_cuda_cflags, **kwargs)
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/miniconda3/envs/pt201pg240/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
ImportError: /home/.cache/torch_extensions/py39_cu117/rspmm/rspmm.so: cannot open shared object file: No such file or directory
I looked through the README of this repo, but I still don't understand why. Could you help me with this?
You need to clean the cache of previously compiled kernels (and other jit-compiled code from previous pyg versions): delete the /home/.cache/torch_extensions/ folder, and the kernels will be re-built upon the next launch.
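If it helps, here is a minimal Python sketch that does the same thing (it assumes the default cache location used by torch's JIT extension builder; the TORCH_EXTENSIONS_DIR environment variable overrides it):
import os
import shutil

# Default cache for JIT-compiled extensions such as the rspmm kernel.
cache_dir = os.environ.get("TORCH_EXTENSIONS_DIR",
                           os.path.expanduser("~/.cache/torch_extensions"))
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)  # kernels are re-built on the next launch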
I did clean the cache with:
rm -rf /home/.cache/*
but, strangely, it didn't work as expected and the error still exists.
Looking at the error trace, it appears that the previously compiled Python code (.pyc files) is asking for the older kernels. The best approach would be to remove the repo entirely, make a clean clone, and start again (a more tedious approach is to remove all *.pyc files and __pycache__ folders everywhere in the repo).
Besides, I see your env name is pt201pg240 - does it mean you are on PyTorch 2.0.1? The minimal required torch version for Ultra is 2.1.0, please make sure you install the correct package versions in your env according to requirements.txt.
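For reference, a sketch of the more tedious option (it assumes it is run from the root of the ULTRA checkout):
import pathlib
import shutil
import torch

repo_root = pathlib.Path(".")  # assumed to be the ULTRA repo root

# Drop cached bytecode so stale .pyc files cannot reference old kernels.
for cache_dir in list(repo_root.rglob("__pycache__")):
    shutil.rmtree(cache_dir, ignore_errors=True)
for pyc in list(repo_root.rglob("*.pyc")):
    pyc.unlink(missing_ok=True)

# Ultra requires torch >= 2.1.0.
print("torch:", torch.__version__)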
Hi @migalkin, thank you for noticing. I double-checked the environment and found the torch version is indeed 2.0.1. I apologize for wasting your time. However, I found another environment with CUDA 12.1 to evaluate with, and now a new issue has arisen when I run:
python script/run.py -c config/transductive/inference.yaml --dataset CoDExSmall --epochs 0 --bpe null --gpus [0] --ckpt ckpts/ultra_4g.pth
and the error:
10:00:05 Random seed: 1024
10:00:05 Config file: config/transductive/inference.yaml
10:00:05 {'checkpoint': 'ckpts/ultra_4g.pth',
'dataset': {'class': 'CoDExSmall', 'root': '~/git/ULTRA/kg-datasets/'},
'model': {'class': 'Ultra',
'entity_model': {'aggregate_func': 'sum',
'class': 'EntityNBFNet',
'hidden_dims': [64, 64, 64, 64, 64, 64],
'input_dim': 64,
'layer_norm': True,
'message_func': 'distmult',
'short_cut': True},
'relation_model': {'aggregate_func': 'sum',
'class': 'RelNBFNet',
'hidden_dims': [64, 64, 64, 64, 64, 64],
'input_dim': 64,
'layer_norm': True,
'message_func': 'distmult',
'short_cut': True}},
'optimizer': {'class': 'AdamW', 'lr': 0.0005},
'output_dir': '~/git/ULTRA/output',
'task': {'adversarial_temperature': 1,
'metric': ['mr', 'mrr', 'hits@1', 'hits@3', 'hits@10'],
'name': 'TransductiveInference',
'num_negative': 256,
'strict_negative': True},
'train': {'batch_per_epoch': None,
'batch_size': 8,
'gpus': [0],
'log_interval': 100,
'num_epoch': 0}}
Processing...
Traceback (most recent call last):
File "/home/ULTRA/script/run.py", line 243, in <module>
dataset = util.build_dataset(cfg)
File "/home/ULTRA/ultra/util.py", line 149, in build_dataset
dataset = ds_cls(**data_config)
File "/home/ULTRA/ultra/datasets.py", line 383, in __init__
super(CoDExSmall, self).__init__(root=root, size='s')
File "/home/ULTRA/ultra/datasets.py", line 246, in __init__
super().__init__(root, transform, pre_transform)
File "/home/miniconda3/envs/pt210py39/lib/python3.9/site-packages/torch_geometric/data/in_memory_dataset.py", line 76, in __init__
super().__init__(root, transform, pre_transform, pre_filter, log)
File "/home/miniconda3/envs/pt210py39/lib/python3.9/site-packages/torch_geometric/data/dataset.py", line 102, in __init__
self._process()
File "/home/miniconda3/envs/pt210py39/lib/python3.9/site-packages/torch_geometric/data/dataset.py", line 235, in _process
self.process()
File "/home/ULTRA/ultra/datasets.py", line 292, in process
train_results = self.load_file(train_files[0], inv_entity_vocab={}, inv_rel_vocab={})
File "/home/ULTRA/ultra/datasets.py", line 265, in load_file
u, r, v = l.split() if self.delimiter is None else l.strip().split(self.delimiter)
Could you please help me on this one? Thank you again.
Sure, but what is the error? Your trace ends on
File "/home/ULTRA/ultra/datasets.py", line 265, in load_file
u, r, v = l.split() if self.delimiter is None else l.strip().split(self.delimiter)
and I don't see the exact error. Which Python version is in your new env?
It is a ValueError:
File "/home/ULTRA/ultra/datasets.py", line 265, in load_file
u, r, v = l.split() if self.delimiter is None else l.strip().split(self.delimiter)
and the Python version is 3.9.17.
I still don't see what the exact error name is and where it throws the error - could you please copy the entire error trace, including the line with ValueError?
I am sorry, there must have been something wrong with the clipboard and the preview:
File "/home/ULTRA/ultra/datasets.py", line 265, in load_file
u, r, v = l.split() if self.delimiter is None else l.strip().split(self.delimiter)
ValueError: not enough values to unpack (expected 3, got 1)
Specifically, it says: ValueError: not enough values to unpack (expected 3, got 1)
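For context, a toy reproduction of that unpack error (illustrative only, not the repo's actual data) - the line in load_file fails whenever a row of the triple file does not split into exactly three fields:
# Illustrative only: load_file expects "head relation tail" per row.
line = "not-a-triple"
try:
    u, r, v = line.split()
except ValueError as err:
    print(err)  # not enough values to unpack (expected 3, got 1)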
Are you sure you downloaded the datasets into the correct folder that is resolvable in the code? The default path in the config files is ~/git/ULTRA/kg-datasets/, but looking at your previous messages your path might look like /home/ULTRA/ - I would suggest hardcoding your exact path in the config yaml files to reflect where your installation is, for example:
output_dir: /home/ULTRA/output
dataset:
class: {{ dataset }}
root: /home/ULTRA/kg-datasets/
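One possible explanation for why a literal path behaves differently from ~/git/ULTRA/kg-datasets/ (an assumption, not something confirmed from the code): plain filesystem calls do not expand ~, so if the YAML value is used without os.path.expanduser, the lookup happens under a directory literally named ~. A quick check:
import os

root = "~/git/ULTRA/kg-datasets/"        # hypothetical path taken from the config
print(os.path.exists(root))                      # False if "~" is taken literally
print(os.path.exists(os.path.expanduser(root)))  # True if the directory really exists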
Yes, I checked the .yaml file and found:
output_dir: ~/git/ULTRA/output
dataset:
class: {{ dataset }}
root: ~/git/ULTRA/kg-datasets/
version: {{ version }}
There exists a folder named git, and it looks like this:
It seems that the dataset is correctly downloaded and stored there.
It's weird, but it works after hardcoding the exact path in the .yaml file. Thank you again for your patience all along!
Hi @migalkin, thanks for open-sourcing this great work, but I encountered an issue when trying to run this demo on GPU by:
Specifically, the output and error message is:
This should be an obvious problem, but no one seems to have encountered it before according to the issues of this repo, which puzzles me. Thanks for your reply in advance!