Beirnaert / speaq

Feature Request - Extend/Rebase plotting routines on ggplot/ggplotly/dygraph #1

mjmg commented 7 years ago

Having the plotting routines return a ggplot object would allow someone, with minimal additional coding, to convert the plot into a ggplotly object, enabling interactive zooming in and out of the spectra to check alignment performance.
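
For illustration, something like this sketch (toy data standing in for real spectra, not speaq output):

```r
# Hypothetical sketch: if a plotting routine returned a ggplot object,
# making it interactive would be a single ggplotly() call.
library(ggplot2)
library(plotly)

# Toy stand-in for two slightly misaligned NMR spectra
ppm <- seq(0, 10, length.out = 1000)
spectra <- data.frame(
  ppm       = rep(ppm, 2),
  intensity = c(dnorm(ppm, mean = 5.00, sd = 0.1),
                dnorm(ppm, mean = 5.05, sd = 0.1)),
  sample    = rep(c("A", "B"), each = length(ppm))
)

p <- ggplot(spectra, aes(x = ppm, y = intensity, colour = sample)) +
  geom_line() +
  scale_x_reverse()   # NMR convention: ppm decreases left to right

ggplotly(p)  # interactive zoom/pan to inspect alignment between samples
```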

bryanhanson commented 7 years ago

I would rig up some performance tests before coding such a thing. NMR data sets can be massive, and ggplot2 is not known for speed, at least historically. I have a package that uses JavaScript to do something similar, and once the web page hits a certain size, you can expect it to grind to a halt.
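
A rough way to rig up such a test (illustrative only; the point count is a guess at NMR scale):

```r
# Quick benchmark sketch: compare base graphics and ggplot2 rendering
# time on an NMR-sized series before committing to a ggplot2 rewrite.
library(ggplot2)

n  <- 5e5   # ~500K points, a mid-sized NMR dataset
df <- data.frame(ppm = seq_len(n), intensity = rnorm(n))

system.time(
  plot(df$ppm, df$intensity, type = "l")                 # base graphics
)
system.time(
  print(ggplot(df, aes(ppm, intensity)) + geom_line())   # ggplot2
)
```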

mjmg commented 7 years ago

@bryanhanson Noted on this.

Could you also include dygraphs (https://rstudio.github.io/dygraphs/) in your evaluation when you have time? It might be more lightweight for NMR datasets.

Thank you.
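
For reference, a minimal dygraphs call might look something like this (an illustrative sketch with synthetic data, not speaq code):

```r
# Sketch of a dygraphs-based spectrum view; dygraph() accepts a data
# frame whose first column supplies the x-axis values.
library(dygraphs)

n    <- 1e5
spec <- data.frame(ppm = seq(0, 10, length.out = n),
                   intensity = abs(rnorm(n)))

dygraph(spec, xlab = "ppm", ylab = "intensity") %>%
  dyRangeSelector()   # drag the handles to zoom into a spectral region
```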

Beirnaert commented 7 years ago

@mjmg Thanks for the suggestion. However, as @bryanhanson noted, the reason I don't use ggplot for plotting the spectra (although I would like to) is that it quickly becomes terribly slow with NMR datasets. For small parts of the spectra, however, ggplot does work: see the ROIplot function, where ggplot is used to plot the spectra, and you have the option to output the individual ggplot objects instead of the combined plot.

Beirnaert commented 7 years ago

@mjmg I will have a look at dygraphs. I wasn't aware of its existence.

mjmg commented 6 years ago

For reference, plotly's issues with plotting large datasets are tracked here: https://github.com/ropensci/plotly/issues/1104

The bottleneck appears to be on the htmlwidgets/jsonlite side, not in plotly_build().
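
A hedged way to see that split for yourself (the point count is a guess; this times the two stages separately):

```r
# Time figure construction separately from htmlwidgets/jsonlite
# serialisation to see where a large plotly figure spends its time.
library(plotly)

n <- 5e5
p <- plot_ly(x = seq_len(n), y = rnorm(n),
             type = "scatter", mode = "lines")

system.time(b <- plotly_build(p))         # build step
system.time(                              # serialise + write to disk
  htmlwidgets::saveWidget(b, "big.html", selfcontained = FALSE)
)                                         # selfcontained = FALSE avoids a pandoc dependency
```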

As for ggplot with NMR-sized datasets, I have no idea.

The bigvis package from Hadley Wickham seems promising: https://github.com/hadley/bigvis. From its description: "The bigvis package provides tools for exploratory data analysis of large datasets (10-100 million obs). The aim is to have most operations take less than 5 seconds on commodity hardware, even for 100,000,000 data points."

From https://www.r-bloggers.com/visualize-large-data-sets-with-the-bigvis-package/: "The basic idea of the package is to use aggregation and smoothing techniques on big data sets before plotting, to create visualizations that give meaningful insights into the data, and that can be computed quickly and rendered efficiently using R's standard graphics engine. Despite the large data sets involved, the visualization functions in the package are fast, because the 'bin-summarise-smooth' cycle is performed directly in C++, directly on the R object stored in memory."
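
A rough sketch of that bin-summarise cycle, using the function names from the bigvis README (bin, condense, autoplot); illustrative only, not tested against speaq data:

```r
# bigvis is GitHub-only: devtools::install_github("hadley/bigvis")
library(bigvis)

x <- rnorm(1e7)                            # 10M points, roughly NMR scale
binned <- condense(bin(x, width = 0.01))   # bin, then summarise counts
autoplot(binned)                           # plot the aggregate, not raw points
```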

mjmg commented 6 years ago

I just tried loading and plotting a 270K-point dataset in dygraphs; performance seems OK when zooming in and out. Saving it to an HTML file and loading that file is another matter.
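
For anyone reproducing this, a hedged sketch of the save step (synthetic data, not the dataset I used):

```r
# Writing the widget to standalone HTML is where file size and load
# time become the problem; saveWidget() is from the htmlwidgets package.
library(dygraphs)

n <- 270e3
d <- data.frame(x = seq_len(n), y = rnorm(n))

# selfcontained = TRUE bundles everything into one file (requires pandoc)
htmlwidgets::saveWidget(dygraph(d), "spectrum.html", selfcontained = TRUE)
```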

How many points are in the NMR datasets you usually work with?

Beirnaert commented 6 years ago

Relatively limited datasets are in the range of 300K - 800K data points (20 - 50 samples, 10k - 30k measurement points per sample), depending on the number of samples and the measurement frequency. Larger datasets can easily approach 10M points, for example MTBLS1 in the MetaboLights database.