Open oplatek opened 1 year ago
At the moment, we use the number of sentences in "sortedtripleset", i.e. the highest sentence ID, as a checksum.
We were able to segment ~98% of the original texts into sentences such that the number of sentences matches the number of sentences referenced in "sortedtripleset".
However, this is only a heuristic: matching counts do not guarantee that the sentence boundaries are the same.
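To make the check concrete, here is a minimal sketch of the count comparison. The names `naive_split` and `passes_checksum` are hypothetical and not part of any WebNLG tooling, the regex splitter is only a stand-in for whichever sentence segmenter is actually used, and the sentence IDs are assumed to have already been read from "sortedtripleset" into a list of integers.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace.

    Assumption: this is a placeholder for a real sentence segmenter.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def passes_checksum(text, sentence_ids):
    """Return True when the segment count equals the highest sentence ID.

    `sentence_ids` is assumed to hold the integer IDs referenced in
    "sortedtripleset" for this text.
    """
    return len(naive_split(text)) == max(sentence_ids, default=0)


# Two sentences in the text, highest referenced ID is 2: the counts agree.
ok = passes_checksum(
    "Alan Bean was born in Wheeler. He was a crew member of Apollo 12.",
    [1, 2],
)

# The abbreviation "Dr." makes the naive splitter produce two segments
# for a single sentence, so the counts disagree.
bad = passes_checksum("Dr. Smith was born in Texas.", [1])
```

A passing check only says the counts agree. Two segmentations can have the same number of sentences with different boundaries, which is why an exact replica of the original segmentation would be preferable.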
Thank you for making the WebNLG dataset with the alignment available!
We would like to align the sentences in the `original text` with the triples in `sortedtripleset`.
Is there a function or procedure that replicates the segmentation perfectly?
Here is the example from the README to ground what I mean by the `original text` and `sortedtripleset`.