delta-rho / trelliscope

Detailed Visualization of Large Complex Data in R

Compatibility with foreach / dopar #153

Open · MPTIngley opened this issue 7 years ago

MPTIngley commented 7 years ago

Hi Ryan,

I believe I'm having compatibility issues between trelliscope and foreach/dopar.

Goal: Fit univariate regressions using each of N predictors over a number of data subsets. Make plots of the data and fits, with cognostics, using trelliscope.

Approach: Loop over the list of covariate names with foreach/%dopar%. Within the parallel loop, specify the plotting and cognostic functions and call makeDisplay.

Issue: All of the required files and folders are created within /displays/common, but displays/_displayList.Rdata is incomplete. The displayListDF data frame stored in _displayList.Rdata has fewer rows than there are folders in /displays/common. Opening the viewer with view() only gives access to the displays listed in displayListDF.

The missing rows in displayListDF seem to be random: rerunning the same code does not always leave the same folders in /displays/common missing from the list. The problem does not occur when the loop runs serially.
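The thread never pins down the cause. One plausible explanation is a lost update, although this is only an assumption. Suppose each parallel makeDisplay call reads _displayList.Rdata, appends its row to displayListDF, and writes the file back. Two workers that both read before either writes would then overwrite each other's rows, which would also explain why different displays go missing on different runs.

The Python sketch below, in Python rather than R purely to show the pattern, simulates that behavior. A hypothetical JSON file stands in for the display list, and the sketch does not use trelliscope's actual file format or functions. It first forces the bad interleaving deterministically. It then shows a collect-then-write-once alternative, in which workers only return metadata and a single coordinator writes the list.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_list(path):
    """Load the shared display list (a hypothetical JSON stand-in)."""
    with open(path) as f:
        return json.load(f)


def write_list(path, entries):
    """Overwrite the shared display list with `entries`."""
    with open(path, "w") as f:
        json.dump(entries, f)


def lost_update_demo(path):
    """Replay two workers' read-modify-write steps in a racy order."""
    write_list(path, [])
    snapshot_a = read_list(path)   # worker A reads []
    snapshot_b = read_list(path)   # worker B also reads [] before A writes
    snapshot_a.append("display_a")
    write_list(path, snapshot_a)   # file now holds ["display_a"]
    snapshot_b.append("display_b")
    write_list(path, snapshot_b)   # B's stale snapshot overwrites A's entry
    return read_list(path)


def build_display(name):
    """Hypothetical per-predictor work: returns metadata, never touches the list."""
    return {"name": name}


def collect_then_write(path, names, workers=4):
    """Run the work in parallel, then write the list once from one place."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(build_display, names))  # map keeps input order
    write_list(path, [r["name"] for r in results])
    return read_list(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "display_list.json")
        print(lost_update_demo(path))                        # ['display_b']
        print(collect_then_write(path, ["x1", "x2", "x3"]))  # ['x1', 'x2', 'x3']
```

Note that %dopar% backends such as doParallel usually run iterations in separate R processes, so an in-memory lock would not help there. Returning results from the loop and doing any shared bookkeeping in one serial step avoids the need for cross-process locking.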

Any thoughts? Thanks - MPTingley

hafen commented 7 years ago

Interesting. If you aren't using datadr for computation or data structures anyway, I'd highly recommend the new trelliscopejs package (https://github.com/hafen/trelliscopejs), which will ultimately replace the trelliscope package. There are many benefits to using this package. First, it is more natural to specify cognostics (cognostics are simply columns of the data frame you pass in). Also, the viewer is a pure JavaScript app, so the resulting display is much more portable (and you'll find the interactivity is faster too). Also, this will be the version that gets the majority of attention moving forward, so bug fixes and help dealing with issues will probably be more prompt. Take a look at the docs (https://hafen.github.io/trelliscopejs/) and see if it looks like this will fit into your workflow without too much hassle (and if it doesn't and you have ideas, please let me know). I'm actually pushing an update to the docs in maybe a half hour or so that should be more helpful than what's there now.

hafen commented 7 years ago

Actually, scratch the docs update getting done right now; maybe tomorrow. What's there is still useful, though.

MPTIngley commented 7 years ago

Ryan,

Thanks for the info. As some background, I came across datadr/trelliscope when Bill Cleveland gave a talk at the Aussie stats meetings in Dec, and immediately dove in: it's been just awesome. I do stats on natural perils and claims for a major insurance company, and the datadr framework is dramatically simplifying some of my workflow.

With regard to the trelliscope/foreach issue, the data is all in ddf/ddo (local disk connection for now, but likely scaling up to Hadoop soon), and all the compute makes extensive use of datadr. I have policies/claims for N events, and am basically doing EDA by event, testing out possible predictor variables, etc., so the data divides very naturally. Given the local compute, it is faster to loop across the potential predictor variables in parallel, using one core for each call to makeDisplay, than to run the loop serially and give makeDisplay more cores via control. This may be a pathological situation.

trelliscopejs: thanks for the tip. Once I figure out, for the umpteenth time, how to bypass all the corporate proxy settings, I'll explore it. I like what you are doing with the cognostics. In my own work I've found myself calculating them all outside of the call to cog, so that cog itself is just picking out numbers from part of a ddo. I want to use model outcomes (e.g., regression parameters) as cognostics without having to fit the models twice: once in the plotting function and once for the cognostics.
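The fit-once idea can be sketched generically. The Python below only illustrates the pattern and is not datadr or trelliscope code. `fit_ols`, `fit_by_subset`, `cognostics`, and `panel_line` are hypothetical names, and the toy data is made up. Each subset's regression is fitted exactly once and its coefficients are stored. Both the cognostic rows and the panel's fitted line then read from that stored result.

```python
def fit_ols(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def fit_by_subset(subsets):
    """Fit each subset once; `subsets` maps a key to an (x, y) pair."""
    return {key: fit_ols(x, y) for key, (x, y) in subsets.items()}


def cognostics(fits):
    """Cognostic rows built from the stored fits, with no refitting."""
    return [
        {"subset": key, "intercept": fit["intercept"], "slope": fit["slope"]}
        for key, fit in fits.items()
    ]


def panel_line(fits, key, xs):
    """Fitted values for one panel, again read from the stored fit."""
    fit = fits[key]
    return [fit["intercept"] + fit["slope"] * x for x in xs]


if __name__ == "__main__":
    subsets = {
        "event_a": ([0, 1, 2, 3], [1, 3, 5, 7]),
        "event_b": ([0, 1, 2], [2, 2, 2]),
    }
    fits = fit_by_subset(subsets)
    print(cognostics(fits))
    print(panel_line(fits, "event_a", [0, 4]))  # [1.0, 9.0]
```

The design choice is simply to separate fitting from presentation. Adding another cognostic or another panel element then never triggers a second fit.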

Are there plans to integrate the new trelliscopejs with datadr?

Thanks for your help, and let me emphasize: these packages are fantastic.

-Martin


hafen commented 7 years ago

Sorry for the delay and thanks for the kind words. I do indeed plan to integrate trelliscopejs with datadr, hopefully soon. I will keep you posted.