fbdesignpro / sweetviz

Visualize and compare datasets, target values and associations, with one line of code.
MIT License
2.89k stars, 273 forks

support big data use cases #75

Open yair4Data opened 3 years ago

yair4Data commented 3 years ago

Great library! It would be great if it supported big data use cases (integration with dask/vaex/spark). My dataset is too large to fit in memory and heavily imbalanced, so to keep the original target ratio I need to work with the data at its original size rather than downsample it.

fbdesignpro commented 3 years ago

Hello @yair4Data, thank you for the kind words! I hope the library can be useful to you!

Are you saying you are running out of memory when converting to a pandas data frame (e.g. `df = df.compute()` in dask)?

Or are you getting an error message when running the report, or generating HTML?
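If the conversion to pandas is the step that fails, one possible workaround (not something proposed in this thread) is to read the data in chunks and take the same fraction of every target class from each chunk, which keeps the target ratio roughly intact while shrinking the data. The sketch below uses synthetic data; the column name `target`, the chunk size, and the sampling fraction are assumptions made for illustration.

```python
import pandas as pd


def stratified_sample(chunks, target, frac, seed=0):
    """Take `frac` of the rows of each target class from every chunk."""
    parts = []
    for i, chunk in enumerate(chunks):
        # Sampling inside each class keeps the class ratio roughly intact.
        # Very rare classes in small chunks can round down to zero rows,
        # so chunks should be large relative to 1 / frac.
        part = chunk.groupby(target).sample(frac=frac, random_state=seed + i)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# Synthetic, imbalanced data: 1 positive row in every 100 (assumed for the demo).
df = pd.DataFrame({
    "feature": range(10_000),
    "target": [1 if i % 100 == 0 else 0 for i in range(10_000)],
})

# In real use the chunks could come from pd.read_csv(path, chunksize=...),
# so that the full data set never has to be held in memory at once.
chunks = [df.iloc[i:i + 2_000] for i in range(0, len(df), 2_000)]

sample = stratified_sample(chunks, target="target", frac=0.1)
print(len(sample), sample["target"].mean(), df["target"].mean())
```

The resulting `sample` is an ordinary pandas data frame, so it could then be passed to `sweetviz.analyze(sample, target_feat="target")`. Whether a sample of this kind is acceptable depends on how many minority-class rows survive the sampling.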

haiyuni commented 3 years ago

I have the same problem too. My data has about a billion rows, and it does not work! Can I use the modin package?

fbdesignpro commented 3 years ago

@haiyuni I haven't looked at modin, I will do so and get back here.

Regarding the billion row issue, I am assuming you are referring to the scale issue (#73)? Or is there a specific error I should be looking at?

Thanks again!