Closed JDRanpariya closed 1 year ago
It depends on what you want to do. Here is a code example:
from polyleven import levenshtein
import random
import pandas as pd
def getdata():
    # Generate 100,000 random 10-character DNA strings
    return ["".join(random.choices("AGCT", k=10)) for _ in range(100000)]
# Python List
dataset = getdata()
distances = [levenshtein(item, "ATACAAACTC") for item in dataset]
# Pandas DataFrame
df = pd.DataFrame({"a": getdata(), "b": getdata()})
df["distance"] = df.apply(lambda x: levenshtein(x.a, x.b), axis=1)
100k entries is really not much. It should take less than a second to process on a consumer CPU.
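If you want something reusable, the two patterns above can be wrapped in small helpers. This is only a sketch: `distances_to` and `pairwise_distances` are hypothetical names, not part of polyleven, and the pure-Python fallback is an assumption added so the snippet also runs where polyleven is not installed.

```python
try:
    # Fast C implementation, as used in the example above.
    from polyleven import levenshtein
except ImportError:
    # Fallback (assumption: only for environments without polyleven).
    def levenshtein(a, b):
        """Plain dynamic-programming edit distance, much slower than polyleven."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,               # deletion
                    current[j - 1] + 1,            # insertion
                    previous[j - 1] + (ca != cb),  # substitution (0 if equal)
                ))
            previous = current
        return previous[-1]


def distances_to(query, items, distance=levenshtein):
    """Distance from every string in `items` to a single `query` string."""
    return [distance(item, query) for item in items]


def pairwise_distances(left, right, distance=levenshtein):
    """Distance between strings at the same position in two equal-length iterables."""
    return [distance(a, b) for a, b in zip(left, right)]
```

Both helpers accept any iterable of strings, so with the DataFrame from above something like `df["distance"] = pairwise_distances(df["a"], df["b"])` should also work, and it avoids the per-row overhead of `df.apply(..., axis=1)`.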
Is there a way to pass a pandas DataFrame or a Python list?