ekzhu / SetSimilaritySearch

All-pair set similarity search on millions of sets in Python and on a laptop
Apache License 2.0

Allow 'sets' argument to be any iterable #7

Open jaklinger opened 4 years ago

jaklinger commented 4 years ago

Currently

https://github.com/ekzhu/SetSimilaritySearch/blob/master/SetSimilaritySearch/search.py#L27
https://github.com/ekzhu/SetSimilaritySearch/blob/master/SetSimilaritySearch/all_pairs.py#L28

state:

if not isinstance(sets, list) or len(sets) == 0:
    raise ValueError("Input parameter sets must be a non-empty list.")

I propose to change this to:

from collections.abc import Iterable  # required for the isinstance check

if not isinstance(sets, Iterable) or len(sets) == 0:
    raise ValueError("Input parameter sets must be a non-empty iterable.")

This allows tuples as input, as well as ordered key sets. It was helpful in my use case, since I no longer had to create a copy of the data in list form.
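
One caveat with the proposed check: `len(sets)` only works on iterables that also define a length, so a generator would raise `TypeError` rather than the intended `ValueError`. A minimal sketch of a stricter check follows. The helper name `validate_sets` is hypothetical and not part of the library, and the sketch covers only the input check, not whether the rest of the search code relies on list indexing.

```python
from collections.abc import Iterable, Sized


def validate_sets(sets):
    """Hypothetical helper: accept any non-empty, sized iterable of sets."""
    # Iterable alone is not enough: generators are iterable but have no len(),
    # so len(sets) would raise TypeError instead of the intended ValueError.
    if not (isinstance(sets, Iterable) and isinstance(sets, Sized)) or len(sets) == 0:
        raise ValueError("Input parameter sets must be a non-empty iterable.")
    return sets


# A tuple and a dict key view both pass without being copied into a list.
validate_sets(([1, 2, 3], [3, 4, 5]))
validate_sets({(1, 2): "a", (2, 3): "b"}.keys())
```

With this variant, tuples and key views are accepted as they are, while empty inputs and generators are both rejected with the same `ValueError`.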

I'm setting up a PR from my fork; let me know what you think.