Randomly shuffles the indices of X (if particular predicates are specified, the split is restricted accordingly before further processing).
Counts the occurrences of each entity and each relation.
Iterates through the shuffled indices; for each triple, it decrements the counts of its entities and relation, then moves the triple to the test set if those counts are still > 0, otherwise it keeps the triple in the train set.
Stops as soon as the test set reaches the required size.
If the test set is still smaller than required, checks whether allow duplicate is set or not.
If allow duplicate is set, it duplicates test set triples to reach the required size. If not, it throws an exception.
Returns the train and test splits if everything is successful.
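The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function name `split_no_unseen_sketch`, the parameter names `test_size`, `seed` and `allow_duplication`, the use of `ValueError`, and padding the test set by randomly repeating already selected triples are all hypothetical choices. Predicate filtering is left out for brevity.

```python
from collections import Counter

import numpy as np


def split_no_unseen_sketch(X, test_size, seed=0, allow_duplication=False):
    """Hypothetical sketch of the split described above (not the PR's code)."""
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # shuffled triple indices

    # Count how often each entity (subject or object) and relation occurs.
    ent_count = Counter(X[:, 0].tolist())
    ent_count.update(X[:, 2].tolist())
    rel_count = Counter(X[:, 1].tolist())

    test_idx = []
    for i in order:
        if len(test_idx) == test_size:
            break  # test set has the required size
        s, p, o = X[i].tolist()
        # Tentatively take the triple out of the training counts.
        ent_count[s] -= 1
        ent_count[o] -= 1
        rel_count[p] -= 1
        if ent_count[s] > 0 and ent_count[o] > 0 and rel_count[p] > 0:
            test_idx.append(int(i))  # still seen in train: safe to move
        else:
            # Moving it would leave something unseen: keep it in train.
            ent_count[s] += 1
            ent_count[o] += 1
            rel_count[p] += 1

    if len(test_idx) < test_size:
        if not allow_duplication or not test_idx:
            raise ValueError("Cannot build a test set of the requested size "
                             "without unseen entities or relations.")
        # Assumption: pad by repeating randomly chosen test triples.
        missing = test_size - len(test_idx)
        test_idx.extend(rng.choice(test_idx, size=missing).tolist())

    train_mask = np.ones(len(X), dtype=bool)
    train_mask[test_idx] = False
    return X[train_mask], X[test_idx]


if __name__ == "__main__":
    triples = np.array([["a", "r1", "b"], ["b", "r1", "c"], ["c", "r1", "a"],
                        ["a", "r2", "c"], ["b", "r2", "a"], ["c", "r2", "b"]])
    train, test = split_no_unseen_sketch(triples, test_size=2, seed=0)
    print(train.shape, test.shape)
```

Restoring the counts when a triple is rejected keeps the bookkeeping consistent with what actually remains in the train set, so later triples are judged against accurate counts.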
Any other comments?
The old API is still available for people who use it in their ML pipelines and depend on seeding for reproducible dataset splits. They can set the backward_compatible argument to true.
Related Issue(s)
242
Description of Changes
Created a separate API