apoorvumang / kgt5

ACL 2022: Sequence-to-Sequence Knowledge Graph Completion and Question Answering (KGT5)
Apache License 2.0

A bug when creating entity_strings.txt for wikidata5m #2

Closed: Neo-Zhangjiajie closed this issue 2 years ago

Neo-Zhangjiajie commented 2 years ago

When I create entity_strings.txt for wikidata5m, the script fails with an error:

python /home/zjj/kgt5/data/get_unique_entities.py --dataset wikidata5m
285780it [00:00, 596976.79it/s]
Traceback (most recent call last):
  File "/home/zjj/kgt5/data/get_unique_entities.py", line 26, in <module>
    unique_entities.add(split_sentence[1].strip())
IndexError: list index out of range

I printed the offending line from train.txt. It is: predict tail: creation | destruction | instance of | bonus tracks

How should I handle this? Should I just skip this line?

apoorvumang commented 2 years ago

Yeah, that's a bug: the entity name here itself contains the '|' symbol. This was fixed in a later revision of the dataset (we will upload it soon).
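To illustrate why a '|' inside an entity name breaks parsing, here is a minimal Python sketch (not code from the KGT5 repo). It assumes, hypothetically, that the line is read as `head | relation | tail` with the `predict tail: ` prefix already removed, and that only the head can contain ' | '. Under that assumption, splitting from the right with `str.rsplit` keeps 'creation | destruction' in one piece, while a plain `split` produces one field too many. The helper name `split_fields` is made up for this example.

```python
def split_fields(record, n_fields, sep=" | "):
    """Split `record` into `n_fields` parts, splitting from the right.

    Hypothetical helper: assumes only the first field may contain `sep`,
    so the last `n_fields - 1` separators are the real field boundaries.
    """
    parts = record.rsplit(sep, n_fields - 1)
    return [part.strip() for part in parts]


# The train.txt line from this issue, with the "predict tail: " prefix removed.
record = "creation | destruction | instance of | bonus tracks"

# A plain split cannot tell the '|' inside the entity from the field separator.
print(record.split(" | "))  # ['creation', 'destruction', 'instance of', 'bonus tracks']

# Splitting from the right keeps the multi-part entity name together.
print(split_fields(record, 3))  # ['creation | destruction', 'instance of', 'bonus tracks']
```

This is not a general fix: if the tail entity could also contain ' | ', the split would be ambiguous again, which is why correcting the entity names in the dataset itself is the more robust option.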

For now, I think you can skip this line. (The other option would be to find the entity 'creation | destruction' and remove the '|' symbol from it, but this is not ideal.)
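If you go with skipping, a defensive loop like the one below avoids the IndexError and reports which lines were dropped. It is a hypothetical sketch, not the actual get_unique_entities.py: the function name `collect_entities`, the tab separator, and the two-field layout are all assumptions, so adjust them to whatever the real script splits on.

```python
def collect_entities(lines, sep="\t", expected_fields=2):
    """Collect unique entity strings from the last field of each line.

    Hypothetical sketch: assumes a well-formed line has exactly
    `expected_fields` fields separated by `sep`. Lines that split into a
    different number of fields are skipped instead of raising IndexError.
    """
    entities = set()
    skipped = []  # 1-based line numbers of malformed lines
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) != expected_fields:
            skipped.append(lineno)
            continue
        entities.add(fields[-1].strip())
    return entities, skipped


# Usage with an in-memory sample; for a real file, pass an open file object:
#   with open("train.txt", encoding="utf-8") as f:
#       entities, skipped = collect_entities(f)
sample = ["q1\tParis\n", "malformed line without separator\n", "q2\tBerlin\n"]
entities, skipped = collect_entities(sample)
print(sorted(entities))  # ['Berlin', 'Paris']
print(skipped)  # [2]
```

Returning the skipped line numbers, rather than silently dropping them, makes it easy to check that only a handful of lines (such as the 'creation | destruction' one) were affected.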