ekzhu / SetSimilaritySearch

All-pair set similarity search on millions of sets in Python and on a laptop
Apache License 2.0

Allow multiple tokens per line #14

Closed innovate-invent closed 2 years ago

innovate-invent commented 2 years ago

I have to generate data in the form SetID Token Token Token. This PR saves me from having to do any preprocessing of the data.
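For illustration, here is a minimal sketch of how input in that shape could be read, assuming whitespace-separated fields with the set ID first. The function name `read_sets` and the sample data are hypothetical and are not taken from this PR or from the library's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_sets(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group tokens by set ID. Each line is 'SetID Token [Token ...]'.

    Hypothetical helper: it accepts one token per line as well as
    several, and merges repeated set IDs into a single set.
    """
    sets: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # skip blank lines
        set_id, tokens = fields[0], fields[1:]
        sets[set_id].update(tokens)
    return dict(sets)


# Made-up sample: set "A" is spread over two lines.
sample = ["A x y z", "B x y", "A w", ""]
parsed = read_sets(sample)
```

Because the first field is always treated as the set ID, the older one-token-per-line layout parses the same way, so both forms can share one reader under these assumptions.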