bokeh / rbokeh

R interface to Bokeh http://hafen.github.io/rbokeh/

Plot does not draw when both `glyph` and `lname` are specified in `ly_points` #163

Open lianos opened 8 years ago

lianos commented 8 years ago

When both glyph and lname are specified in ly_points, the plot does not draw.

These two calls work:

figure(iris) %>% 
  ly_points(Sepal.Length, Sepal.Width, glyph=Species)
figure(iris) %>% 
  ly_points(Sepal.Length, Sepal.Width, color=Species, lname='ipoints')

This does not:

figure(iris) %>% 
  ly_points(Sepal.Length, Sepal.Width, glyph=Species, lname='ipoints')

I'm trying to debug this, but having a hard time digging deep into the guts of the package. Inside the ly_points function, args$info$lname is still populated correctly after the call to grab. My guess is that something is going awry in the calls to make_glyph inside the for loop at the bottom of ly_points, but I can't get my head in the right place to debug this much deeper.

Hopefully that's enough information to help you identify the problem quickly.

lianos commented 8 years ago

I think I see what's happening. When we map glyph to a categorical value with many levels, these are actually drawn as different layers with their own internally generated unique lname values when lname isn't specified in the ly_points call.

When we specify a value for lname in the ly_points call, each glyph is drawn using the same lname, and (I guess) this causes an explosion.

Trying to hack around this issue, I think this implementation (many layers for a single ly_points call with a multi-level glyph) is also going to hammer our ability to use callbacks on brushing (for instance).

It seems to me that the data is split into different subsets per glyph level, and we can't reliably use the index returned from the tool_box_select and tool_lasso_select as they refer to different rows of the main data object that is getting sliced and diced per glyph level ... know what I mean?
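The mismatch can be reproduced without rbokeh at all. Here is a minimal base-R sketch, assuming (as guessed above) that the data is split into one subset per glyph level and that a selection tool reports a 0-based position within a single layer's subset. The selected position is made up for illustration.

```r
# Split iris by Species, roughly as each per-level layer would see it
# (assumption: one data subset per glyph level).
subsets <- split(iris, iris$Species)

# Hypothetical selection: 0-based index 4 reported for the "virginica" layer.
js_index <- 4
selected <- subsets[["virginica"]][js_index + 1, ]

# split() keeps the original row names, so they reveal which row of the
# unsplit data frame was actually selected.
original_row <- as.integer(rownames(selected))
original_row  # 105, not 5
```

So an index of 5 within the layer corresponds to row 105 of iris, and using it directly against the full data frame would pick a setosa row instead.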

lianos commented 8 years ago

A possible way to work around the "splitting-data-and-mismatched-index" problem would be to take an approach similar to plotly where the callback returns the values from a specified key column of the data object, instead of the indices into it.

I was thinking to try to take a crack at that by looking at how one might write up custom callback handlers, but ... I'm a bit out of my comfort zone.

If we could do that, though, we could match the returned keys to the appropriate rows of the original data object and process the selection that way instead of relying on row numbers.

hafen commented 8 years ago

@lianos this is an interesting issue that I've grappled with quite a bit.

You are correct that mapping glyph to a categorical variable requires multiple layers to be drawn, which is a consequence of the design of BokehJS. You're also correct that this does wreak havoc on the idea of brushing, etc.

I like the idea of an index column. Ideally you would still be able to specify a single lname and have rbokeh magically provide the correct index of the selected values. I'll look into this.

Does plotly create a key column for you or does it expect you to have a key already defined? I think we could even just add a new variable with the row numbers prior to splitting the data up such that the row numbers of the original data are still preserved, and pass that on with the data so that we can retrieve it when we return the index.
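The row-number idea could look roughly like this in base R. This is only a sketch of the approach, not rbokeh's implementation: the column name `.row_id` is hypothetical, and the per-level split and the selected positions are assumptions for illustration.

```r
# Record the original row number before any splitting.
# `.row_id` is a hypothetical column name, not something rbokeh defines.
dat <- iris
dat$.row_id <- seq_len(nrow(dat))

# Assumed per-glyph-level split, one subset per layer.
layers <- split(dat, dat$Species)

# Hypothetical selection: 1-based positions 1 and 3 within the "versicolor" layer.
picked <- c(1, 3)
orig_rows <- layers[["versicolor"]]$.row_id[picked]

# The preserved ids point back into the unsplit data.
iris[orig_rows, c("Sepal.Length", "Sepal.Width", "Species")]
```

Because the id travels with each subset, the lookup does not depend on how many layers the data was split into or in what order.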

lianos commented 8 years ago

plotly requires you to specify a key column to use that already exists in the source data(.frame).

Take a look at this example, and scroll down to the server.r code. You can define the key in the aes(..., key=<column>) call if the user goes the ggplotly route. If you go the plot_ly-native grammar route, you simply call plot_ly(..., key=<column>).

And, sorry for the tangent, but while we're discussing row indices in these callbacks (and since it's still "early days" on this feature), I wonder if it wouldn't make sense to return R-native row numbers from these callbacks?

I've found that I have to add 1 to the row indices the callbacks return, presumably because JavaScript uses 0-based indexing. It's fine either way, but it took me a minute to realize what was going on with that.
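For anyone hitting the same thing, the adjustment is a one-liner. The indices below are made-up values standing in for what a callback might report.

```r
# Hypothetical 0-based indices as a JavaScript callback might report them.
js_idx <- c(0, 9, 149)

# R is 1-based, so shift by one before subsetting.
r_idx <- js_idx + 1
iris[r_idx, "Species"]

# Without the shift, index 0 is silently dropped rather than raising an error.
nrow(iris[0, ])  # 0
```

The silent drop is what makes this easy to miss: subsetting with the raw indices returns shifted rows and one fewer of them, with no warning.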

hafen commented 8 years ago

Ah yes, of course the row numbers should be 1-based. I'm working on this and balancing it with work on other issues (mainly updating to Bokeh 0.12), but hopefully it will come out soon. I need to get some better docs out on the callback stuff to make it easier to dive in, since it seems to be a desired feature.