ddf-project / DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
http://ddf.io
Apache License 2.0

Refactor and optimize sampling by size #349

Closed: nhanitvn closed this pull request 8 years ago.

nhanitvn commented 8 years ago

Description and related tickets, documents

Reviewers: @hai-adatao @Huandao0812 @phvu

Breaking changes & backward-compatibility issues

How to test

This needs to be deployed on a cluster with a sufficiently large dataset, such as the NYC taxi data, to test for the error caused by exceeding spark.driver.maxResultSize. @thxph

PR Progress

Make sure all checkboxes below are checked before merging.