Closed dafajon closed 3 years ago
I see two different questions here. Please split them into two separate issues so that we can proceed.
Check the develop branch for a drop-in replacement:
from sadedegel.dataset import load_raw_corpus
from sadedegel.extension.sklearn import TfidfVectorizer
tra = TfidfVectorizer()
X = tra.transform(load_raw_corpus())
I checked it on List and pd.DataFrame inputs. It works fine. I will also use it with a pipeline and provide feedback. I had a FunctionTransformer implementation in the sentiment work and will update that accordingly once this is in the new release.
Can you create a separate issue for parallel processing? This issue is already closed with a commit that is currently available on the develop branch.
When working with documents in a DataFrame, feature extraction requires an sklearn transformer so that the process can be part of a pipeline and serialized along with it. The issues so far:
str and List[str] for inference time.
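To illustrate the request, here is a minimal sketch of a transformer that accepts either a single str or a List[str] at inference time and survives a serialization round trip. It is not sadedegel's implementation: `as_documents` and `TokenCountTransformer` are hypothetical names, the whitespace tokenizer is a placeholder, and the class only mimics the sklearn `fit`/`transform` convention so that the example runs with the standard library alone. In a real pipeline it would presumably subclass `sklearn.base.BaseEstimator` and `TransformerMixin`.

```python
import pickle
from typing import Iterable, List, Union


def as_documents(X: Union[str, Iterable[str]]) -> List[str]:
    """Normalize a single string or an iterable of strings to a list of strings."""
    if isinstance(X, str):
        return [X]  # a lone document becomes a one-element batch
    return [str(doc) for doc in X]


class TokenCountTransformer:
    """Hypothetical stand-in for a text feature extractor (not sadedegel's API)."""

    def fit(self, X, y=None):
        # Build a vocabulary from whitespace tokens (placeholder tokenizer).
        vocab = sorted({tok for doc in as_documents(X) for tok in doc.lower().split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(vocab)}
        return self

    def transform(self, X):
        # Count known tokens per document; unknown tokens are ignored.
        rows = []
        for doc in as_documents(X):
            row = [0] * len(self.vocabulary_)
            for tok in doc.lower().split():
                idx = self.vocabulary_.get(tok)
                if idx is not None:
                    row[idx] += 1
            rows.append(row)
        return rows


train = ["merhaba dünya", "dünya güzel"]
model = TokenCountTransformer().fit(train)

# Serialize and restore the fitted transformer, as a pipeline would require.
restored = pickle.loads(pickle.dumps(model))

single = restored.transform("merhaba dünya")            # str input
batch = restored.transform(["merhaba dünya", "güzel"])  # List[str] input
```

Because `as_documents` only relies on iteration, a pandas Series of strings (one DataFrame column) would go through the same path as a list.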