JD-AI-Research-Silicon-Valley / SACN

End-to-end Structure-Aware Convolutional Networks for Knowledge Base Completion
MIT License
113 stars 30 forks

Why is the adjacency matrix constructed from the training set different for each run? #15

Closed maqy1995 closed 4 years ago

maqy1995 commented 4 years ago

In main.py, lines 134-148:

    for i, str2var in enumerate(train_batcher):
        print("batch number:", i)
        for j in range(str2var['e1'].shape[0]):
            # e2_multi1 is zero-padded: collect tail entities until the first 0
            for k in range(str2var['e2_multi1'][j].shape[0]):
                if str2var['e2_multi1'][j][k] != 0:
                    # one (e1, rel, e2) triple -> one COO entry
                    data.append(str2var['rel'][j].cpu())
                    rows.append(str2var['e1'][j].cpu().tolist()[0])
                    columns.append(str2var['e2_multi1'][j][k].cpu())
                else:
                    break

    # add a self-loop for every entity, tagged with relation id num_relations
    rows = rows + [i for i in range(num_entities)]
    columns = columns + [i for i in range(num_entities)]
    data = data + [num_relations for i in range(num_entities)]

The shapes of rows and columns change when we run the process twice. Does that mean the training set is mutable?
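For reference, the loop above effectively builds COO lists like the toy sketch below (entity counts and triples are made up for illustration). If the batcher delivered the full training set on every run, these list lengths would be deterministic:

```python
# Toy sketch of what the loop above produces: one COO entry per
# (e1, rel, e2) training triple, plus one self-loop per entity tagged
# with a dedicated relation id of num_relations. Sizes are made up.
num_entities, num_relations = 4, 2
triples = [(0, 1, 1), (1, 1, 2), (2, 1, 3)]  # (e1, rel, e2)

rows = [h for h, _, _ in triples] + list(range(num_entities))
columns = [t for _, _, t in triples] + list(range(num_entities))
data = [r for _, r, _ in triples] + [num_relations] * num_entities

# With the full training set, these lengths are fixed across runs,
# so a changing shape means some triples never reached the lists.
assert len(rows) == len(columns) == len(data) == len(triples) + num_entities
print(len(rows))  # 7
```

So a run-to-run change in len(rows) is a sign that some training triples were silently skipped.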

maqy1995 commented 4 years ago

When I set randomize to False, the shape of rows no longer changes for a given batch_size. But the shape changes again when I set Config.batch_size to another value. The code is below (main.py, line 114):

train_batcher = StreamBatcher(Config.dataset, 'train', Config.batch_size, randomize=True, keys=input_keys)

I guess the reason is that StreamBatcher drops the last batch.
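That guess is consistent with simple arithmetic. A toy sketch (this is not the actual StreamBatcher code): if the batch count comes from floor division, the trailing partial batch is dropped, so the number of triples that ever reach the adjacency lists depends on batch_size:

```python
def triples_seen_with_drop_last(n_triples, batch_size):
    # floor division silently drops the final partial batch
    num_batches = n_triples // batch_size
    return num_batches * batch_size

# e.g. with 272,115 training triples (the FB15k-237 training-set size):
print(triples_seen_with_drop_last(272115, 128))   # 272000
print(triples_seen_with_drop_last(272115, 4096))  # 270336
```

Different batch sizes drop a different-sized tail, which would explain why the COO list shapes vary with batch_size even when shuffling is off.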

chaoshangcs commented 4 years ago

Thanks for your question. Yes, I agree with you: StreamBatcher might drop the last batch. If that is the cause, we can manually add the missing rows to the sparse graph.

maqy1995 commented 4 years ago

Any idea how to manually add the lost rows to the sparse graph? Or how to fix StreamBatcher so it yields the full training set? The code in StreamBatcher is a bit hard for me to read... T T. I tried modifying the line below: https://github.com/JD-AI-Research-Silicon-Valley/SACN/blob/6f9831fdd02dec6b116e27a661aee483255c5f59/src/spodernet/spodernet/preprocessing/batching.py#L221 to:

self.num_batches = int(math.ceil(np.sum(config['counts']) / batch_size))

This works with randomize=False across different batch sizes, but with randomize=True it fails again.

chaoshangcs commented 4 years ago

Thanks for your reply. You may check the remainder of "np.sum(config['counts']) / batch_size". If the remainder is 0, it is fine; if not, add 1 to num_batches. This package comes from https://github.com/TimDettmers/spodernet, for your reference.
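For what it's worth, the remainder check described above is equivalent to the math.ceil change; a minimal sketch:

```python
import math

def num_batches(total, batch_size):
    # add one batch when total doesn't divide evenly, so the final
    # partial batch is kept instead of being dropped
    n = total // batch_size
    if total % batch_size != 0:
        n += 1
    return n

# equivalent to math.ceil(total / batch_size):
assert num_batches(272115, 128) == math.ceil(272115 / 128) == 2126
assert num_batches(272000, 128) == 272000 // 128 == 2125  # no remainder
```

Either form gives every training triple a chance to appear in exactly one batch.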

maqy1995 commented 4 years ago

It seems ConvE has the same problem; see https://github.com/TimDettmers/ConvE/issues/2 and https://github.com/TimDettmers/ConvE/issues/25, as well as the "Quirks" section of the ConvE README: https://github.com/TimDettmers/ConvE. It looks like it's not easy to fix...

maqy1995 commented 4 years ago

Another question: why do we need to do this? https://github.com/JD-AI-Research-Silicon-Valley/SACN/blob/6f9831fdd02dec6b116e27a661aee483255c5f59/models.py#L167 It seems to convert the adjacency matrix A into a symmetric matrix, but in my understanding the training set already uses reverse relations, so the adjacency matrix should already be symmetric when we first build it in main.py: https://github.com/JD-AI-Research-Silicon-Valley/SACN/blob/6f9831fdd02dec6b116e27a661aee483255c5f59/main.py#L152 Please correct me if I have misunderstood.

chaoshangcs commented 4 years ago

Hi. Thanks for finding the links, which I hadn't noticed before; they help answer this question. For the new question, please check lines 134 to 144 in main.py. If I remember correctly, the graph created there is nonsymmetric, so we need this step.
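For intuition, here is a toy scipy sketch of that symmetrization step (the real code operates on the matrix built in main.py; the entity ids and relation ids below are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Directed toy graph: edges only go head -> tail
rows = np.array([0, 1, 2])   # head entities
cols = np.array([1, 2, 0])   # tail entities
data = np.array([1, 1, 1])   # relation ids (toy values)
A = csr_matrix((data, (rows, cols)), shape=(3, 3))

assert (A != A.T).nnz > 0            # nonsymmetric as built
A_sym = A + A.T                      # add the reversed edges
assert (A_sym != A_sym.T).nnz == 0   # now symmetric
```

If the graph were already symmetric (because reverse triples were inserted when it was built), A + A.T would only double the edge weights rather than add new edges.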

maqy1995 commented 4 years ago

Excuse me again... When I use FB15k-237 I get results similar to the paper, but on WN18RR the Hits@1 is much lower than the reported value.
I used the following hyperparameters:

for WN18RR dataset, set dropout to 0.2, number of kernels to 300, learning rate to 0.003, and embedding size to 200 for SACN.

I ran 1000 epochs and got Hits@1 = 0.34 (the paper reports 0.43), Hits@3 = 0.46, Hits@10 = 0.52, and MRR = 0.41. What could be the reason?

chaoshangcs commented 4 years ago

Hi. Thanks for your question. You should be able to reproduce similar results. You can tune the hyperparameters starting from the recommended values in https://github.com/JD-AI-Research-Silicon-Valley/SACN/blob/master/src/spodernet/spodernet/utils/global_config.py. If you get better results, we look forward to you sharing the hyperparameters.

chenmeiqi777 commented 3 years ago

Excuse me, could you please tell me the hyperparameters you used for FB15k-237? I just ran the code the authors released, but got at most Hits@10 = 0.518 and MRR = 0.340 over 1000 epochs. In other words, I can't reproduce the reported results (0.54 and 0.35).

maqy1995 commented 3 years ago

Sorry, I did not save the hyperparameters I used. From memory, I used the default hyperparameters (I'm not sure...), but changed the batch size to 4096. I got the best MRR on the test set around epoch 2296; in that epoch the full result was: Hits@1: 0.2598, Hits@3: 0.383964, Hits@10: 0.5357, MRR: 0.35063.

chaoshangcs commented 3 years ago

Hi. Thanks for sharing. It is great to get your feedback. : )

chenmeiqi777 commented 3 years ago

Thanks for your reply! I will give it a try.

maqy1995 commented 3 years ago

Recently, I tried removing spodernet and implementing WGCN with DGL, but I can't reproduce the paper's results either (T T). My implementation and results are here: https://github.com/maqy1995/sacn_dgl

chenmeiqi777 commented 3 years ago

I just got results similar to yours... at most 0.526, never higher than 0.53.

chaoshangcs commented 3 years ago

Hi, thanks for sharing. DGL is a great and convenient tool. I'd be glad to recommend your implementation to other researchers.