Closed TomAugspurger closed 4 years ago
What is the type of raw_documents? Most collections should support futures as inputs. Bags may not be hip enough yet? The easy solution may be to wrap it in a delayed?
On Fri, Jul 24, 2020 at 2:30 PM Tom Augspurger notifications@github.com wrote:
In [1]: import dask.bag as db
In [2]: import dask_ml.feature_extraction.text
In [3]: from dask.distributed import Client
   ...: client = Client()
In [4]: vocab = {"foo": 0, "bar": 1}
In [6]: remote_vocab, = client.scatter((vocab,), broadcast=True)
In [7]: vect = dask_ml.feature_extraction.text.CountVectorizer(vocabulary=remote_vocab)
In [8]: bag = db.from_sequence(['foo bar', 'foo', 'bar'], npartitions=2)
In [9]: vect.fit_transform(bag)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 vect.fit_transform(bag)

~/sandbox/dask-ml/dask_ml/feature_extraction/text.py in fit_transform(self, raw_documents, y)
    188             vocabulary = vocabulary.compute()
    189
--> 190         n_features = len(vocabulary)
    191         result = raw_documents.map_partitions(
    192             _count_vectorizer_transform, vocabulary_for_transform, params

TypeError: object of type 'Future' has no len()

Just transform works fine.

In [10]: vect.transform(bag)
Out[10]: dask.array<from-bag-_count_vectorizer_transform, shape=(nan, 2), dtype=int64, chunksize=(nan, 2), chunktype=scipy.csr_matrix>
View it on GitHub: https://github.com/dask/dask-ml/issues/712
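For anyone skimming the traceback: the failure comes from calling len() on a future rather than on the mapping it points to. A minimal sketch of that failure and one possible workaround follows. It uses the standard library's concurrent.futures.Future as a stand-in for the distributed future returned by client.scatter, and resolve_vocabulary is a hypothetical helper written for illustration. Neither is taken from dask-ml, and this is not necessarily how the library handles it.

```python
from concurrent.futures import Future  # stdlib stand-in for a distributed Future


def resolve_vocabulary(vocabulary):
    """Hypothetical helper: unwrap a future-like vocabulary into a concrete object."""
    if hasattr(vocabulary, "result"):
        # Future-like object: block until the value is ready, then return it.
        return vocabulary.result()
    # Already a concrete mapping: pass it through unchanged.
    return vocabulary


vocab = {"foo": 0, "bar": 1}

# Simulate a scattered vocabulary with an already-completed future.
remote_vocab = Future()
remote_vocab.set_result(vocab)

# Calling len() on the future itself fails, as in the traceback above.
try:
    len(remote_vocab)
    len_failed = False
except TypeError as exc:
    len_failed = True
    print(f"len() on the future failed: {exc}")

# Resolving first yields the dict, so len() gives the number of features.
n_features = len(resolve_vocabulary(remote_vocab))
print(f"n_features after resolving: {n_features}")
```

The duck-typed hasattr check is only there to keep the sketch free of a dask dependency. In real code an explicit isinstance check against the future type would probably be safer.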