TimDettmers / ConvE

Convolutional 2D Knowledge Graph Embeddings resources
MIT License

Sth seems to be wrong when I change to use a new large dataset #34

Closed RichardHGL closed 5 years ago

RichardHGL commented 6 years ago

I am trying to train ConvE on a new dataset with 128148 entities and 22 relations. My training set contains 684729 triples and my test set contains 171171 triples. Is that too large? I suspect the error comes from the line train_batcher = StreamBatcher(Config.dataset, 'train', Config.batch_size, randomize=True, keys=input_keys). The error output is as follows:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/anaconda2/envs/pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/tools/ConvE/src/spodernet/spodernet/preprocessing/batching.py", line 175, in run
    shard_idx = self.rdm.choice(len(list(self.shard2batchidx.keys())), 1, p=self.shard_fractions)[0]
  File "mtrand.pyx", line 1142, in mtrand.RandomState.choice
ValueError: a and p must have same size

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/anaconda2/envs/pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/tools/ConvE/src/spodernet/spodernet/preprocessing/batching.py", line 175, in run
    shard_idx = self.rdm.choice(len(list(self.shard2batchidx.keys())), 1, p=self.shard_fractions)[0]
  File "mtrand.pyx", line 1142, in mtrand.RandomState.choice
ValueError: a and p must have same size
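
For readers hitting the same message: the ValueError is raised by NumPy's RandomState.choice when the probability vector p does not have one entry per candidate. The sketch below reproduces that condition in isolation. The helper name sample_shard and the example numbers are hypothetical and are not taken from spodernet; the sketch only illustrates that a shard count and a fractions list of different lengths trigger the error.

```python
import numpy as np

def sample_shard(num_shards, shard_fractions, seed=0):
    """Pick one shard index with the given probabilities (hypothetical helper)."""
    rdm = np.random.RandomState(seed)
    # choice(a, size, p=...) requires len(p) == a when a is an integer.
    return int(rdm.choice(num_shards, 1, p=shard_fractions)[0])

# Matching sizes: three shards, three probabilities.
print(sample_shard(3, [0.5, 0.25, 0.25]))

# Mismatched sizes: three shards but only two probabilities.
try:
    sample_shard(3, [0.5, 0.5])
except ValueError as err:
    print("ValueError:", err)
```

In the traceback above, this would correspond to shard2batchidx and shard_fractions disagreeing in length, which is consistent with stale or incomplete preprocessing output.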

TimDettmers commented 5 years ago

Have you tried preprocessing the dataset anew? This means both (1) running python wrangle_KG.py DATASET_NAME and (2) running the preprocess method in the main.py script with delta_data=False (this can be invoked with the process argument or python main.py process True). Let me know if you are still running into issues.

RichardHGL commented 5 years ago

I set delta_data=False in main.py and ran the following command:

python main.py model ConvE input_drop 0.2 hidden_drop 0.3 \
               feat_drop 0.2 lr 0.003 lr_decay 0.995 \
               dataset Mydataset process True

But it produces the same error:

Traceback (most recent call last):
  File "/home/anaconda2/envs/pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/tools/ConvE/src/spodernet/spodernet/preprocessing/batching.py", line 175, in run
    shard_idx = self.rdm.choice(len(list(self.shard2batchidx.keys())), 1, p=self.shard_fractions)[0]
  File "mtrand.pyx", line 1142, in mtrand.RandomState.choice
ValueError: a and p must have same size

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/anaconda2/envs/pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/tools/ConvE/src/spodernet/spodernet/preprocessing/batching.py", line 175, in run
    shard_idx = self.rdm.choice(len(list(self.shard2batchidx.keys())), 1, p=self.shard_fractions)[0]
  File "mtrand.pyx", line 1142, in mtrand.RandomState.choice
ValueError: a and p must have same size

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/anaconda2/envs/pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/tools/ConvE/src/spodernet/spodernet/preprocessing/batching.py", line 175, in run
    shard_idx = self.rdm.choice(len(list(self.shard2batchidx.keys())), 1, p=self.shard_fractions)[0]
  File "mtrand.pyx", line 1142, in mtrand.RandomState.choice
ValueError: a and p must have same size

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/anaconda2/envs/pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/tools/ConvE/src/spodernet/spodernet/preprocessing/batching.py", line 175, in run
    shard_idx = self.rdm.choice(len(list(self.shard2batchidx.keys())), 1, p=self.shard_fractions)[0]
  File "mtrand.pyx", line 1142, in mtrand.RandomState.choice
ValueError: a and p must have same size

RichardHGL commented 5 years ago

I have found the problem. I used the wrong separator in my dataset files, so the preprocessing could not work properly. Thanks for your attention and consideration.
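
A quick way to catch this kind of problem before preprocessing is to check that every line of a triple file splits into exactly three fields. The sketch below assumes one subject, relation, object triple per line separated by a tab; that separator is an assumption, so check wrangle_KG.py for the one it actually splits on. The function name find_malformed_triples and the sample data are hypothetical.

```python
def find_malformed_triples(lines, sep="\t"):
    """Return (line_number, text) pairs for lines that are not 3 non-empty fields."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        fields = text.split(sep)
        if len(fields) != 3 or any(not f.strip() for f in fields):
            bad.append((number, text))
    return bad

# Hypothetical sample: only the first line uses the assumed tab separator.
sample = [
    "e1\tr1\te2\n",
    "e3 r1 e4\n",   # space-separated
    "e5,r2,e6\n",   # comma-separated
]
for number, text in find_malformed_triples(sample):
    print("line %d is not a tab-separated triple: %r" % (number, text))
```

To check a real file, pass an open file object instead of the list, for example find_malformed_triples(open(path)) with path pointing at your own training file.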