009-Personal-Alexa-like-Speech-Service / 009---Personal-Alexa-like-Speech-Service

BAA Project

Implement Lemmatizer #21

Closed kimmrz closed 3 years ago

kimmrz commented 3 years ago

Implement the lemmatizer in spaCy.

kimmrz commented 3 years ago

Maybe we can start with this one:

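# Assumes `nlp` is an already loaded spaCy pipeline and `df` is a pandas
# DataFrame with a text column 'col'.
lemma = []

# n_threads is dropped: it was a no-op since spaCy 2.1 and is gone in v3.
# The old is_parsed check (and its None branch, which would break ' '.join
# later) is dropped too, so every row gets a list.
for doc in nlp.pipe(df['col'].astype(str).values, batch_size=9844):
    # Use the string lemma (lemma_, not the integer hash lemma); skip punctuation
    # and spaCy v2's "-PRON-" placeholder. `and` replaces `|`, which binds more
    # tightly than `!=` and `not` and broke the condition.
    lemma.append([n.lemma_ for n in doc if not n.is_punct and n.lemma_ != "-PRON-"])

df['lemma_col'] = lemma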
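To check the token filter without loading a spaCy model, here is a minimal sketch that uses a hypothetical `Tok` named tuple as a stand-in for spaCy's `Token` (only `lemma_` and `is_punct` are mimicked). A token is kept only if it is not punctuation and its lemma is not the `-PRON-` placeholder that spaCy v2 used for pronouns.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a spaCy Token: only the two attributes the filter reads."""
    lemma_: str
    is_punct: bool


def filter_lemmas(doc: List[Tok]) -> List[str]:
    # Keep string lemmas, dropping punctuation and spaCy v2's "-PRON-" placeholder
    return [t.lemma_ for t in doc if not t.is_punct and t.lemma_ != "-PRON-"]


# Roughly what spaCy v2 produces for "I am playing music."
doc = [Tok("-PRON-", False), Tok("be", False), Tok("play", False),
       Tok("music", False), Tok(".", True)]
print(filter_lemmas(doc))  # ['be', 'play', 'music']
```

In spaCy v3 pronouns get real lemmas ("I" stays "I"), so the "-PRON-" comparison simply never matches there and is harmless.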

from sklearn.feature_extraction.text import TfidfVectorizer

# Join each lemma list into one string per row, then fit TF-IDF on those strings
lemmas = df['lemma_col'].apply(lambda x: ' '.join(x))
vect = TfidfVectorizer()
features = vect.fit_transform(lemmas)

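# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = vect.get_feature_names_out()
dense = features.toarray()  # sparse matrix -> dense NumPy array
denselist = dense.tolist()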
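For intuition about what the feature matrix contains, here is a plain-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings (lowercasing, token pattern `(?u)\b\w\w+\b`, `smooth_idf=True`, `norm='l2'`). It is only an illustration of the formula under those defaults, not a replacement for scikit-learn; the function name `tfidf` is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# scikit-learn's default token_pattern: tokens of two or more word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Sketch of TfidfVectorizer defaults: lowercase, smooth_idf=True, norm='l2'."""
    tokenized = [TOKEN_RE.findall(d.lower()) for d in docs]
    # Vocabulary sorted alphabetically, matching the order of get_feature_names_out()
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term counts
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # L2-normalise each row so documents of different length are comparable
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


names, rows = tfidf(["play music", "play the news"])
print(names)  # ['music', 'news', 'play', 'the']
```

"play" occurs in both documents, so it gets the minimum idf of 1 and ends up with a lower weight than "music" in the first row.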

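import pandas as pd

# Build the TF-IDF table once under its own name (so df is not overwritten)
# and attach it column-wise (axis=1) to the original rows
tfidf_df = pd.DataFrame(denselist, columns=feature_names, index=df.index)
df = pd.concat([df, tfidf_df], axis=1)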

kimmrz commented 3 years ago

Since we solved this with different code in today's session, shall I close this issue? Do you agree, @Melina-MLEAN @Max123S?