Hi, I really liked the paper and the approaches outlined. I had two questions:
I know the paper is targeted towards commonsense KGs, but are there results for this technique on standard KG benchmark datasets like WN18/FB15k-237 or their RR variants?
Are there any ablations showing the effect of BERT pretraining (fine-tuning with MLM on the corpus of node text vs. using a standard off-the-shelf pretrained BERT)?