RhoInc / web-codebook

A web-based version of the codebook, which generates a concise summary of every variable in a dataset.
https://rhoinc.github.io/web-codebook/test-page/default/
MIT License

Large data sets cause browser to crash #320

Open jwildfire opened 4 years ago

jwildfire commented 4 years ago

Summary

Very large data sets (~100 MB+) cause severe performance issues and may not render at all.

Details

Once the data is loaded, the codebook shouldn't need a huge amount of time to summarize it and render the page. After doing some basic profiling, I'm 99% sure the issues are due to inefficient data handling in various places in the code.

samussiah commented 4 years ago
jwildfire commented 4 years ago
  • makeSummary takes ~60% of load time

    • determineType takes a significant amount of that time. Avoid checking every value of a variable; instead, loop through the values only until one of them identifies the variable's type.
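
A minimal sketch of the early-exit idea, not web-codebook's actual determineType. The function name `inferTypeEarlyExit`, the type labels "continuous" and "categorical", and the rule that a column is numeric only when every non-missing value parses as a number are all assumptions made for illustration.

```javascript
// Hypothetical early-exit type check (illustrative only).
// Assumes values arrive as strings, e.g. parsed from a CSV file.
function inferTypeEarlyExit(values) {
  let sawValue = false;
  for (const value of values) {
    const text =
      value === null || value === undefined ? "" : String(value).trim();
    if (text === "") continue; // missing values carry no type information
    sawValue = true;
    if (!Number.isFinite(Number(text))) {
      // A single non-numeric value settles the type, so stop scanning here.
      return "categorical";
    }
  }
  // Every non-missing value was numeric; an empty column falls back to categorical.
  return sawValue ? "continuous" : "categorical";
}
```

Under this rule the loop stops at the first non-numeric value, so text columns are typed almost immediately. A fully numeric column still has to be scanned to the end, because any later value could turn out to be non-numeric.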

We could also recommend that the user provide a type for each column, which would avoid this check altogether for large data sets. We could just pass in the R column types in datadigest.
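
As a rough sketch of what user-supplied types could look like: the helper below uses a declared type when one is given and scans a column only when none is. The name `resolveColumnTypes`, the `declaredTypes` object shape, and the type labels are hypothetical, not part of web-codebook's or datadigest's API. How R classes would map onto these labels is left open.

```javascript
// Hypothetical helper: prefer caller-supplied column types and infer only
// the columns that were not declared. `declaredTypes` is an assumed shape,
// e.g. { SEX: "categorical" }.
function resolveColumnTypes(rows, declaredTypes = {}) {
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  const types = {};
  for (const column of columns) {
    if (declaredTypes[column] !== undefined) {
      types[column] = declaredTypes[column]; // declared: no scan needed
    } else {
      // Fallback inference: numeric unless some non-empty value is not a number.
      // Array.prototype.some stops at the first match.
      const hasNonNumeric = rows.some((row) => {
        const text = String(row[column] ?? "").trim();
        return text !== "" && !Number.isFinite(Number(text));
      });
      types[column] = hasNonNumeric ? "categorical" : "continuous";
    }
  }
  return types;
}

// Example: SEX is declared up front, AGE is inferred from the data.
const exampleRows = [
  { AGE: "34", SEX: "F" },
  { AGE: "51", SEX: "M" },
];
const exampleTypes = resolveColumnTypes(exampleRows, { SEX: "categorical" });
```

With every column declared, no values are inspected at all, which is the case that matters for very large data sets.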

samussiah commented 4 years ago
Having R determine the data types would definitely take some load off the browser.