Related Issue(s)

242

Description of Changes

Created a separate API for the split. It works as follows:

randomly shuffles the indices of X (if particular predicates are specified, the indices are selected accordingly for further processing)
gets the count of each entity and relation
iterates through the shuffled indices, removes each triple, and puts it in either train or test depending on whether the counts of its entities and relation are still > 0 after the removal
stops when a test set of the required size is found
if the test set is not of the required size, checks whether allow duplicate is set or not
if allow duplicate is set, it duplicates test set triples; if not, it throws an exception
returns the train and test splits if everything is successful
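
For reviewers, the steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function name `split_no_unseen_sketch`, its parameters, and the choice of `ValueError` are assumptions made for the example, and predicate filtering is left out.

```python
from collections import Counter

import numpy as np


def split_no_unseen_sketch(X, test_size, seed=0, allow_duplication=False):
    """Hypothetical sketch: X is an (n, 3) array of (subject, predicate, object)."""
    X = np.asarray(X)
    rng = np.random.default_rng(seed)

    # Count how often each entity and each relation occurs in X.
    entity_counts = Counter(X[:, 0].tolist())
    entity_counts.update(X[:, 2].tolist())
    relation_counts = Counter(X[:, 1].tolist())

    test_idx = []
    for i in rng.permutation(len(X)):
        if len(test_idx) == test_size:
            break  # test set has the required size
        s, p, o = X[i].tolist()
        # Tentatively remove the triple from the training side.
        entity_counts[s] -= 1
        entity_counts[o] -= 1
        relation_counts[p] -= 1
        if entity_counts[s] > 0 and entity_counts[o] > 0 and relation_counts[p] > 0:
            # Everything in the triple is still seen in train: move it to test.
            test_idx.append(int(i))
        else:
            # Removal would leave something unseen: keep it in train.
            entity_counts[s] += 1
            entity_counts[o] += 1
            relation_counts[p] += 1

    if len(test_idx) < test_size:
        if not allow_duplication or not test_idx:
            raise ValueError("Cannot build a test set of the requested size.")
        # Pad the test set by duplicating triples already selected for it.
        missing = test_size - len(test_idx)
        test_idx.extend(rng.choice(test_idx, size=missing, replace=True).tolist())

    test_idx = np.asarray(test_idx, dtype=int)
    train_mask = np.ones(len(X), dtype=bool)
    train_mask[test_idx] = False
    return X[train_mask], X[test_idx]
```

In this sketch a triple whose entity or relation occurs only once can never reach the test set, which is what keeps test free of unseen entities and relations.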

Any other comments?

The old API is still available for people who use it in their ML pipelines and depend on seeding for reproducibility of dataset splits. They can set the backward_compatible argument to true.

Accenture / AmpliGraph

Feature/242 #243
