Closed js3711 closed 6 years ago
Just using df_x and df_y works, correct?
Some context: a dask dataframe doesn't know its own length, so calling df_x.values results in a dask array with unknown length. We can't concatenate an array of ones to X in that case, since we don't know how long to make the ones.
I plan to implement something like https://github.com/dask/dask/issues/3090 later today. There's the related https://github.com/dask/dask/issues/3293 issue.
Would it be possible for dask-ml to add a simple intercept term without having to fully compute things? This seems like the sort of thing that should be possible with map_blocks and a custom function. This seems common enough that forcing computation might be considered a usability bug.
On Mon, Jul 30, 2018 at 9:47 AM, Tom Augspurger wrote:
> Just using df_x and df_y works, correct?
> Some context: a dask dataframe doesn't know its own length, so calling df_x.values results in a dask array with unknown length. We can't concatenate an array of ones to X in that case, since we don't know how long to make the ones.
> I plan to implement something like https://github.com/dask/dask/issues/3090 later today. There's the related https://github.com/dask/dask/issues/3293 issue.
Yeah, using map_blocks should be sufficient here. I can take a look at that now.
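A rough sketch of the map_blocks idea discussed above, not necessarily how dask-ml ended up implementing it. The helper names `add_ones_column` and `add_intercept_lazy` are hypothetical. The sketch assumes all columns live in a single chunk, and it only touches the column chunk sizes, so the row chunk sizes are passed through unchanged whether they are known or not.

```python
import numpy as np
import dask.array as da


def add_ones_column(block):
    """Append a column of ones to a single 2-D NumPy block."""
    ones = np.ones((block.shape[0], 1), dtype=block.dtype)
    return np.concatenate([block, ones], axis=1)


def add_intercept_lazy(X):
    """Lazily add an intercept column to a 2-D dask array.

    Hypothetical helper: assumes X has exactly one chunk along axis 1.
    """
    if len(X.chunks[1]) != 1:
        raise ValueError("expected a single chunk along the column axis")
    # Row chunks are reused as-is; only the column width grows by one.
    new_chunks = (X.chunks[0], (X.chunks[1][0] + 1,))
    return X.map_blocks(add_ones_column, chunks=new_chunks, dtype=X.dtype)


# Small demonstration with known chunk sizes.
X = da.from_array(np.arange(12, dtype=float).reshape(6, 2), chunks=(4, 2))
X_i = add_intercept_lazy(X)
result = X_i.compute()
print(result.shape)
```

Because each block builds its own ones column from its own row count at compute time, nothing has to be computed up front to add the intercept.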
I can perform operations on df_x and df_y, but fitting the model fails:
lr = LinearRegression(fit_intercept=True)
lr.fit(df_x, df_y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-66-7f17fbaf27c3> in <module>()
1 lr = LinearRegression(fit_intercept=True)
2
----> 3 lr.fit(df_x, df_y)
~/anaconda3/envs/correlation_exploration/lib/python3.6/site-packages/dask_ml/linear_model/glm.py in fit(self, X, y)
151 self : objectj
152 """
--> 153 X = self._check_array(X)
154
155 solver_kwargs = self._get_solver_kwargs()
~/anaconda3/envs/correlation_exploration/lib/python3.6/site-packages/dask_ml/linear_model/glm.py in _check_array(self, X)
167 X = add_intercept(X)
168
--> 169 return check_array(X, accept_unknown_chunks=True)
170
171
~/anaconda3/envs/correlation_exploration/lib/python3.6/site-packages/dask_ml/utils.py in check_array(array, *args, **kwargs)
139 elif isinstance(array, dd.DataFrame):
140 if not accept_dask_dataframe:
--> 141 raise TypeError
142
143 # TODO: sample?
TypeError:
Fixed on master @js3711. Thanks for the report.
Hello,
I am trying to fit a linear regression model from a dask dataframe because my data will not fit into local memory.
This throws: