Closed JasonCen-sweetdreams closed 2 years ago
You can first check the vocabulary (the entity2id dict) and see what is missing. I think the code can work if the vocabulary is correct. Meanwhile, there are some differences between the Meta preprocessing and the Freebase preprocessing. I think for CWQ (Freebase), it's step 7. Did you miss step 6?
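A minimal sketch of the vocabulary check suggested above, assuming the entity strings have already been collected into a plain list and that `entity2id` is an ordinary dict keyed by entity text. The helper name `missing_entities` and the sample data are hypothetical and not part of the repository.

```python
def missing_entities(entity_texts, entity2id):
    """Return entity strings absent from the vocabulary, in first-seen order."""
    seen = set()
    missing = []
    for text in entity_texts:
        if text not in entity2id and text not in seen:
            seen.add(text)
            missing.append(text)
    return missing


# Hypothetical sample data, for illustration only.
entity2id = {"royalty": 0, "m.01": 1}
print(missing_entities(["royalty", " royalty", "m.02", " royalty"], entity2id))
```

Printing the missing strings with `repr()` can make stray leading or trailing whitespace easier to spot.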
In the Python file `simplify_dataset.py`, you use `strip()` on line 49, while in the first method, `simplify_entities(entity_list, entity2id)`, you didn't use `strip()`, and that causes the error.
I think the correction is:
entity_text = entity['text'].strip()
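For context, here is a rough sketch of how that one-line correction might sit inside `simplify_entities`. Only the function signature and the `strip()` line come from this thread; the loop, the choice to skip unknown entities, and the return value are assumptions and may not match the actual code in the repository.

```python
def simplify_entities(entity_list, entity2id):
    """Map entity dicts to vocabulary ids (hypothetical reconstruction)."""
    ids = []
    for entity in entity_list:
        # The proposed fix: normalize whitespace before the lookup.
        entity_text = entity['text'].strip()
        # Assumption: entities missing from the vocabulary are skipped.
        if entity_text in entity2id:
            ids.append(entity2id[entity_text])
    return ids


# Hypothetical sample data: the key with a leading space now resolves.
print(simplify_entities([{'text': ' royalty'}, {'text': 'unknown'}], {'royalty': 0}))
```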
Yeah, that may cause this error. Thanks for your feedback; I fixed it.
When I run `simplify_dataset.py`, it turns out that there are some irregular keys in the JSON file `train.json`. The data in this file was split automatically from `CWQ_step1_01.json` using `train_test_split` from `sklearn`.
The strange keys include ' royalty', which is supposed to be 'royalty', and ' Using various tricks of light, perspective and erasure...' (both are in `entity2id[obj['text']]`).
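One way to list such irregular keys is to compare each string with its stripped form. This is a minimal sketch; the helper name `keys_with_stray_whitespace` and the sample list are hypothetical, and how the texts are gathered from `train.json` is left out because the file layout is not shown in this thread.

```python
def keys_with_stray_whitespace(texts):
    """Return the strings that have leading or trailing whitespace."""
    return [t for t in texts if t != t.strip()]


# Hypothetical sample data, for illustration only.
sample = ["royalty", " royalty", "m.01", "trailing "]
for key in keys_with_stray_whitespace(sample):
    # repr() makes the stray whitespace visible in the output.
    print(repr(key))
```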
I guess there might be some errors in `CWQ/subgraph/subgraph_hop2.txt` or `preprocess_step1.py`. If needed, I can provide the code I used to split the data from `CWQ_step1_01.json`.
The output is here: