cenguix / Text2KGBench

Repo ISWC-2023 Tekgen Corpus Submission
Apache License 2.0

Only one triple per sentence in wikidata_tekgen dataset #18

Open swissarthurfreeman opened 1 week ago

swissarthurfreeman commented 1 week ago

Hello,

We're trying to re-use your dataset to further study fact extraction as part of a master's thesis at the University of Geneva.

This might be me misunderstanding something, but in the wikidata_tekgen train split under data/wikidata_tekgen/train there appears to be only a single triple per sentence, and the JSON object format is not the same as in the rest of the JSONL files: there is no `triples` key; instead each object has `sub_label`, `rel_label`, and `obj_label` fields. Certain sentences repeat themselves, though. For example,

Resident Evil: Damnation, known as Biohazard: Damnation ( , Baiohazādo: Damunēshon) in Japan, is a 2012 Japanese adult animated biopunk horror action film by Capcom and Sony Pictures Entertainment Japan, directed by Makoto Kamiya and produced by Hiroyuki Kobayashi.

This sentence appears in both ont_1_movie_train_27 and ont_1_movie_train_612, and each entry carries a single triple. Why are these two separate JSON objects? Wouldn't it make sense to fold them into one?
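To make the mismatch concrete, this is roughly the shape difference we mean. The outer keys `sub_label`, `rel_label`, `obj_label`, and `triples` are the ones visible in the files; the ids, values, and the inner `sub`/`rel`/`obj` names below are our illustration, not verbatim dataset entries:

```json
{"id": "ont_1_movie_train_27", "sent": "Resident Evil: Damnation, ...", "sub_label": "Resident Evil: Damnation", "rel_label": "director", "obj_label": "Makoto Kamiya"}
```

versus the folded shape used by the other JSONL files:

```json
{"id": "ont_1_movie_test_1", "sent": "...", "triples": [{"sub": "...", "rel": "...", "obj": "..."}, {"sub": "...", "rel": "...", "obj": "..."}]}
```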

In dbpedia_webnlg, on the other hand, the train and test JSONL files share the same format, and the training data does contain multiple triples per sentence. Is this intentional, or have the wrong files perhaps been uploaded to the repository? The implication is that a model fine-tuned on wikidata_tekgen may learn to extract only a single triple per sentence, since some train sentences have multiple valid triples under the ontology that are missing from the train files.
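For what it's worth, here is a quick way to count how widespread the repetition is; the file path and the `sent` key are guesses based on the layout described above:

```python
import json
from collections import Counter

# Count how often the same sentence text occurs across the flat
# one-triple-per-line records of one train file (path hypothetical).
with open("data/wikidata_tekgen/train/ont_1_movie_train.jsonl") as f:
    counts = Counter(json.loads(line)["sent"] for line in f)

repeated = {s: n for s, n in counts.items() if n > 1}
print(f"{len(repeated)} sentences appear more than once")
```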

Best regards,

A. Freeman

nandana commented 2 days ago

@swissarthurfreeman sorry for the delay in responding. We are quite happy that you plan to use this dataset in your thesis.

This is the notebook we used for data preparation in wikidata_tekgen. I think we overlooked the folding aspect (the same sentence repeated with multiple ground truths). We can update the dataset by checking all occurrences of repeated sentences and folding their triples. Please feel free to send a pull request with any improvements as well. If you see substantial improvements to the datasets, we would be quite happy to collaborate on releasing a refined version.
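In case it is useful in the meantime, a minimal folding sketch along the lines of what we have in mind, assuming the flat `sub_label`/`rel_label`/`obj_label` records and a `sent` field as described above (exact key names and paths may differ in the actual files):

```python
import json
from collections import defaultdict

def fold_triples(in_path: str, out_path: str) -> None:
    """Group flat one-triple-per-line records by sentence and write one
    record per sentence carrying a triples list, mirroring the format of
    the other JSONL files in the benchmark."""
    grouped = defaultdict(list)  # sentence text -> list of triples
    first_id = {}                # sentence text -> id of first occurrence
    with open(in_path) as f:
        for line in f:
            rec = json.loads(line)
            grouped[rec["sent"]].append(
                {"sub": rec["sub_label"], "rel": rec["rel_label"], "obj": rec["obj_label"]}
            )
            first_id.setdefault(rec["sent"], rec["id"])
    with open(out_path, "w") as f:
        for sent, triples in grouped.items():
            f.write(json.dumps({"id": first_id[sent], "sent": sent, "triples": triples}) + "\n")

# Usage (hypothetical paths):
# fold_triples("data/wikidata_tekgen/train/ont_1_movie_train.jsonl",
#              "ont_1_movie_train_folded.jsonl")
```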

Let us know if you need any further information. Thanks again for your feedback.