dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

Consider using hypothesis for testing. #17

Open dsevero opened 6 years ago

dsevero commented 6 years ago

Maybe we should consider using hypothesis for testing. I use it here at work to test ETL and data pipelines and it works like a charm.
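For readers who have not used it, here is a minimal sketch of what a property-based test with hypothesis might look like for a small data-cleaning step. The `impute_mean` helper is hypothetical and written only for this illustration; it is not part of dask-ml or any pipeline mentioned here.

```python
from hypothesis import given, strategies as st


def impute_mean(values):
    """Hypothetical helper: replace None entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    # Assumption for this sketch: fall back to 0.0 when nothing was observed.
    fill = sum(observed) / len(observed) if observed else 0.0
    return [fill if v is None else v for v in values]


# Bounded, finite floats keep the sum from overflowing and avoid NaN comparisons.
finite_floats = st.floats(
    min_value=-1e6, max_value=1e6, allow_nan=False, allow_infinity=False
)


@given(st.lists(st.one_of(st.none(), finite_floats)))
def test_impute_mean_properties(values):
    result = impute_mean(values)
    # The output has the same length as the input.
    assert len(result) == len(values)
    # No missing values remain.
    assert all(r is not None for r in result)
    # Observed values pass through unchanged.
    assert all(r == v for r, v in zip(result, values) if v is not None)


if __name__ == "__main__":
    # Calling the decorated function runs it against many generated inputs.
    test_impute_mean_properties()
```

Instead of hand-picking inputs, the test states properties that should hold for any list of values and lets hypothesis search for a counterexample.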

TomAugspurger commented 6 years ago

Cool, I've wanted to try it out on a real project for years.

dsevero commented 6 years ago

Great. I'll use it in whatever I try to do next (after the Imputer, which I don't think makes much sense).

Maybe we can start making some estimators?

TomAugspurger commented 6 years ago

> Maybe we can start making some estimators?

By all means. Which ones are you interested in working on?

dsevero commented 6 years ago

I think it would be strategic to implement ALS (alternating least squares). Spark has become very popular for recommendation systems because of it. There's an implementation of a somewhat similar method in sklearn, but I think it would be wise to go rogue on this one.

I've implemented it by hand before using Python's multiprocessing library. It should be quite easy to migrate.

What do you think?
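For context, a rough single-machine sketch of the ALS idea is below. It uses NumPy and is not the multiprocessing implementation mentioned above, nor Spark's. The function name `als`, its parameters, and the returned loss history are assumptions made for this illustration.

```python
import numpy as np


def als(R, mask, rank=2, reg=0.1, n_iters=10, seed=0):
    """Factor R ~ U @ V.T using only the entries where mask is True.

    Illustrative sketch: each row of U (then of V) is updated by solving a
    small ridge-regression problem while the other factor is held fixed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)

    def loss():
        # Squared error on observed entries plus L2 penalty on both factors.
        err = (R - U @ V.T)[mask]
        return float(err @ err + reg * ((U ** 2).sum() + (V ** 2).sum()))

    history = [loss()]
    for _ in range(n_iters):
        # Update user factors; every row is an independent least-squares solve.
        for i in range(n_users):
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[i, mask[i]])
        # Update item factors in the same way with U held fixed.
        for j in range(n_items):
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[mask[:, j], j])
        history.append(loss())
    return U, V, history
```

Because every row update within a half-step is independent of the others, the inner loops are the natural place to distribute work across processes or workers.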

TomAugspurger commented 6 years ago

ALS seems like an ideal candidate.

massich commented 6 years ago

Hi, I'd also like to play with hypothesis. How can I help? @daniel-severo can you point me to where you started playing with it?

TomAugspurger commented 6 years ago

I'm not familiar with using hypothesis, so I may be wrong, but I think it'd be more straightforward to apply it to pandas itself, rather than dask-ml. I opened https://github.com/pandas-dev/pandas/issues/17978 to discuss using it in pandas.

On Tue, Oct 24, 2017 at 12:00 PM, Joan Massich notifications@github.com wrote:

> Hi, I'd also like to play with hypothesis. How can I help?
