fbdesignpro / sweetviz

Visualize and compare datasets, target values and associations, with one line of code.
MIT License
2.91k stars 274 forks

No visualization for columns 2501+ #77

Closed (smiglidigli closed this 3 years ago)

smiglidigli commented 3 years ago

I have a dataset that contains 3.8k columns and 300 to 900k rows. When I run the report and save it to HTML, only the first 2501 columns are visualized. The rest are represented by empty tiles (see attachment).

Env: GCP, AI Platform, Python 3.7
SweetViz ver: 1.1.2 and 2.0.6
Browsers: Opera 74.0.3911.107, Edge 88.0.705.68

code:

import sweetviz as sv

# data_source is a pandas DataFrame; kwargs holds 'path' plus options for sv.analyze
features_per_batch = 2500
# number of batches, rounded up so a partial last batch is included
batches = len(data_source.columns) // features_per_batch + (len(data_source.columns) % features_per_batch > 0)
start_inclusive = 0
end_exclusive = features_per_batch
for batch in range(batches):
    print(f'preparing batch {batch + 1}')
    # pass everything except the output path through to sv.analyze
    report = sv.analyze(source=data_source.iloc[:, start_inclusive:end_exclusive],
                        **{k: v for k, v in kwargs.items() if k != 'path'})
    report.show_html(kwargs['path'] + f'_{batch + 1}.html')
    start_inclusive = end_exclusive
    end_exclusive += features_per_batch

kwargs contain "path" (output path) and "pairwise_analysis" set to "off".
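
The batching arithmetic in the loop above can be checked without sweetviz or the dataset. Here is a minimal sketch, assuming a hypothetical helper named `column_batches` (not part of sweetviz) that yields the same `(start_inclusive, end_exclusive)` pairs. Unlike the loop, it clamps the last pair to the column count, although `iloc` slicing tolerates an overshoot anyway.

```python
def column_batches(n_columns, batch_size):
    """Yield (start_inclusive, end_exclusive) index pairs covering n_columns.

    Hypothetical helper for illustration only; not part of sweetviz.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_columns, batch_size):
        # Clamp the final batch so it never runs past the last column.
        yield start, min(start + batch_size, n_columns)


# 3,800 columns split into batches of 2,500 gives two batches.
print(list(column_batches(3800, 2500)))  # [(0, 2500), (2500, 3800)]
```

Each pair could then be used as `data_source.iloc[:, start:end]` when building one report per batch.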

Unfortunately, I cannot share my dataset, as it's proprietary.

fbdesignpro commented 3 years ago

Hello @smiglidigli! Thank you again; I did not think it would be practical to have so many features, so I arbitrarily picked 2500 as an upper limit for display layers, which caused those extra layers to be hidden. I just pushed version 2.0.7, which should fix this (the new limit is 10,000, and I think that's enough at that point!). Let me know if this fixed it or if you are still encountering it after updating. Thank you!
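
For anyone checking whether their environment already has the raised limit, a small sketch of the version comparison follows. The helper names `version_tuple` and `has_raised_limit` are hypothetical, and the code assumes plain numeric `X.Y.Z` version strings; the installed version string itself can be read with, for example, `pip show sweetviz`.

```python
def version_tuple(version):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_raised_limit(installed_version, fixed_in="2.0.7"):
    """Return True if installed_version is at or past the release with the fix.

    Hypothetical helper; '2.0.7' is the release named in this thread.
    """
    return version_tuple(installed_version) >= version_tuple(fixed_in)


print(has_raised_limit("2.0.6"))  # False
print(has_raised_limit("2.0.7"))  # True
```

Comparing tuples of integers avoids the string-comparison trap where `"2.0.10"` would sort before `"2.0.7"`.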