[Closed] iceshzc closed this issue 5 years ago
Hi, thank you for your concern. I will try to clarify my idea, but if you still have questions, feel free to comment.
The idea of func and ifunc is to classify different relations by their statistical properties. For example, a one-to-one relation has func equal to 1.0, while a one-to-two relation has func equal to 0.5. To avoid func becoming too small, we apply a max function to it, which is the original operation in the paper.
As for the square root operation, I just think it has the same effect as max... And I have now updated the code, since many researchers were puzzled by this operation.
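As a rough illustration of the statistics described above (the function names and the floor value below are my own choices for the sketch, not taken from the repo), func and ifunc can be computed from the triples like this:

```python
from collections import defaultdict

def relation_stats(triples):
    """Per-relation func/ifunc from (head, relation, tail) triples.

    func(r)  = #distinct heads of r / #triples of r
    ifunc(r) = #distinct tails of r / #triples of r
    A one-to-one relation gets func = 1.0; a one-to-two relation gets 0.5.
    """
    heads, tails, count = defaultdict(set), defaultdict(set), defaultdict(int)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        count[r] += 1
    func = {r: len(heads[r]) / count[r] for r in count}
    ifunc = {r: len(tails[r]) / count[r] for r in count}
    return func, ifunc

# The max trick: clip small values to a floor so the weight never vanishes.
# (0.3 is an arbitrary example floor, not the paper's value.)
def clip_func(value, floor=0.3):
    return max(value, floor)
```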
Yeah, it does not need to be symmetric.
It is (with high probability) not the same as JAPE's. But I think if the ICWS17 work had chosen its seeds randomly, the results would be statistically the same.
OK, thank you for your response. For Q.1 and Q.2, I think it's not a big matter. For Q.3, in fact, I also reviewed the source code of JAPE. The concrete datasets for the different training-size ratios are given in each dataset, e.g., ../dbp15k/zh-en/0_3 means a 30% portion of training data in zh-en. Accordingly, I moved the 'sup_ent_ids' and 'ref_ent_ids' in 0_3 to replace the train_data and test_data in your code.
Unfortunately, when I set dim_se = 1000, dim_ae = 100 for zh-en/0_3, your method cannot reproduce the promising results, whereas JAPE reproduces results similar to those reported in its paper. So I want to know how to tune the parameters for GCN-Align.
Thank you, and I look forward to your response.
Yes, that's a good question. We split the training set into two parts: training and evaluation. After the parameters are tuned, we use them to retrain on the full training set (since the training seeds have a great influence in this scenario).
More recently, we also found that the loss value is a good criterion for alignment, which does not need evaluation seeds. This idea is similar to MUSE. I am doing more research on it, which may be demonstrated in my bachelor's thesis later.
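The split-tune-retrain procedure described above can be sketched as follows; `train_fn`, `eval_fn`, and the parameter grid are placeholders for illustration, not the repo's actual API:

```python
import random

def tune_then_retrain(seeds, train_fn, eval_fn, param_grid,
                      dev_ratio=0.3, rng_seed=0):
    """Hold out part of the training seeds for tuning, then retrain on all.

    train_fn(seeds, params) -> model
    eval_fn(model, dev_seeds) -> score (higher is better)
    """
    shuffled = list(seeds)
    random.Random(rng_seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - dev_ratio))
    tune_seeds, dev_seeds = shuffled[:cut], shuffled[cut:]
    best = max(param_grid,
               key=lambda p: eval_fn(train_fn(tune_seeds, p), dev_seeds))
    # Retrain with the full seed set, since the seeds matter a lot here.
    return train_fn(list(seeds), best)
```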
Hi, I tried my best to reproduce your results. Unfortunately, I cannot reproduce the promising ones. In fact, I just found that "zh_en/ref_ent_ids" in your paper is not equal to "ref_ent_ids" + "sup_ent_ids" in "zh_en/0_3" in JAPE. I wonder how you constructed the whole labeled dataset, where the first 10500 lines are the same as "zh_en/0_3/ref_ent_ids" but the last 4500 lines differ from "zh_en/0_3/sup_ent_ids". On the other hand, I also used your dataset and changed the random_seed, and it seems to reproduce approximate results. One more thing: I don't know why you set the learning rate larger than 1 (e.g., lr=20 in your code). Thank you for your attention, and I look forward to your response.
Actually, you just need to run `python train.py` to reproduce the results. Over time, we found some tricks to get similar results with fewer parameters (e.g., a lower dimension for SE), and the code has been modified a little. If you insist on getting the promising results with the parameters in the paper, the structure of the model should be modified a little as well. For example, the weight matrices were initialized from a normal distribution in each layer, which caused some dead neurons; that is why SE needed 1000 dimensions to get promising results.
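A quick NumPy illustration of why a plain standard-normal init misbehaves at this width (the layer sizes are made up for the sketch; this is not the repo's code): with fan-in 1000, pre-activations from N(0, 1) weights have a standard deviation near sqrt(1000) ≈ 31, while a Glorot-style scale keeps it near 1.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 1000            # hypothetical layer width
x = rng.normal(size=(64, fan_in))  # a batch of unit-variance inputs

# Plain standard-normal init: output variance grows with fan_in.
w_normal = rng.normal(0.0, 1.0, size=(fan_in, fan_out))
# Glorot/Xavier-style scale keeps the variance roughly constant.
w_glorot = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                      size=(fan_in, fan_out))

print(np.std(x @ w_normal))  # around 31: activations blow up layer by layer
print(np.std(x @ w_glorot))  # around 1: stable across layers
```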
For sup_ent_ids, the reason we reorder the ids is that the entity pairs are merged in JAPE's code. In our code, we don't need to merge them in the data files, so we just split the merged entity pairs and give a new id to the extra entity. If you have doubts about the dataset, you can construct one from the original dbp15k, or wait a few days if you don't mind: once my bachelor's thesis is finished, I will make a similar repo public with a dbp15k exactly the same as JAPE's.
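The id reordering can be sketched roughly like this (function and variable names are mine; the actual conversion script in the repo may differ):

```python
def split_merged_pairs(merged_ids, next_free_id):
    """Turn each merged id (standing for an aligned pair in JAPE) into two ids.

    Keeps the original id for one entity and assigns a fresh id to the other.
    Returns the list of (kept_id, new_id) pairs and the next unused id.
    """
    pairs = []
    for old_id in merged_ids:
        pairs.append((old_id, next_free_id))
        next_free_id += 1
    return pairs, next_free_id
```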
For the learning rate, I think it is OK for it to be any value; it is just determined by tuning. For example, if the best lr for `loss = a + b` is 1, then the best lr for `loss = (a + b)/2` will be 2, because halving the loss also halves every gradient.
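A tiny sketch of why that holds for plain SGD (the numbers are arbitrary): scaling the loss by 1/2 scales every gradient by 1/2, so doubling the learning rate gives an identical parameter update.

```python
def sgd_step(param, grad, lr):
    # One vanilla SGD update: param <- param - lr * grad
    return param - lr * grad

g = 0.8                                   # gradient of loss = a + b at some point
step_full = sgd_step(1.0, g, lr=1.0)      # lr 1 on loss = a + b
step_half = sgd_step(1.0, g / 2, lr=2.0)  # lr 2 on loss = (a + b) / 2
assert step_full == step_half             # the two updates coincide
```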