Working Notes / Docs - Githubissues

joinr commented 5 years ago

I started working through the implementation and use of the current library and API, a bit more in depth than early forays. It's an annotated commentary of what I'm seeing as I go through, gaining a conceptual understanding of the understandably active library. I'm also trying to increase comprehension so that I can extend my own custom plots, and contribute. Ultimately, just use the library and be useful. It's currently sparse since I just started; as I go I'll spew prose and pseudo-documentation.

https://gist.github.com/joinr/96bbc5614f36bc3fba46fda7843df43d

This may never go beyond my own use, but if anything useful about documenting the innards falls out (assuming the content is correct), I can contribute docstrings / commentary.

Again, great job and breadth of work on this @genmeblog, and great input/feature driving @zcaudate.

zcaudate commented 5 years ago

Thanks @joinr, this is great. I’ve looked at the code myself and it’s a little bit beyond me right now.

I’m really interested in learning how to extend ‘render-graph’ and designing a custom plot. If you or @genmeblog know how best to convey that information, it would be most welcomed.

genmeblog commented 5 years ago

This is great, @joinr. I appreciate this kind of documentation because it brings a different angle/perspective. It will also help me fix the holes in the design. I'll be happy to reuse your writing in the final documentation.

Let me enhance or add my comments:

graph-canvas creates a clojure2d.core/canvas, which is a wrapped BufferedImage. Aside from orientation you can also request:

quality, :rendering-hint (:low, :mid, :high and :highest). Quality is similar to the smooth function in Processing/Quil. What is actually set up is here: https://github.com/Clojure2D/clojure2d/blob/v1.1.0/src/clojure2d/core.clj#L485 The only difference between :high and :highest is that RenderingHints/KEY_STROKE_CONTROL is set to RenderingHints/VALUE_STROKE_PURE. With :highest, small circles look like circles rather than eggs, but on the other hand lines can in some cases come out thick and blurry. In the scatter plot implementation, :highest is selected when the circle marker is used.
:oversize - the renderer requests to draw on a canvas of the desired size, but sometimes the chart itself draws bigger markers. To avoid clipping, the actual canvas is bigger, with an added margin. :oversize controls how many pixels are added to each dimension.
:anchor - chart anchor point for position translation
:shift - margin translation done before drawing

The last two regulate how the rendered chart is placed on the final canvas. :anchor is adjusted for side plots and for axes. However, axes do not use graph-canvas because they are rendered a little bit differently.
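To make the :high vs. :highest distinction concrete, here is a minimal sketch using plain Java2D interop. The helper names hinted-graphics and stroke-control are hypothetical and are not part of clojure2d; the only thing taken from the description above is that :highest sets KEY_STROKE_CONTROL to VALUE_STROKE_PURE. Turning antialiasing on for every level is an assumption made for the sketch.

```clojure
(import '(java.awt Graphics2D RenderingHints)
        '(java.awt.image BufferedImage))

(defn hinted-graphics
  "Hypothetical helper (not clojure2d's API): creates a Graphics2D over a
   fresh BufferedImage and applies hints for the given quality keyword."
  ^Graphics2D [w h quality]
  (let [img (BufferedImage. (int w) (int h) BufferedImage/TYPE_INT_ARGB)
        g   (.createGraphics img)]
    ;; Assumption for this sketch: antialiasing is on at every level.
    (.setRenderingHint g RenderingHints/KEY_ANTIALIASING
                       RenderingHints/VALUE_ANTIALIAS_ON)
    ;; The one difference named in the thread: pure stroke control.
    (when (= quality :highest)
      (.setRenderingHint g RenderingHints/KEY_STROKE_CONTROL
                         RenderingHints/VALUE_STROKE_PURE))
    g))

(defn stroke-control
  "Reads back the stroke-control hint currently set on g."
  [^Graphics2D g]
  (.getRenderingHint g RenderingHints/KEY_STROKE_CONTROL))
```

Comparing (stroke-control (hinted-graphics 100 100 :high)) with the :highest variant shows that only the latter reports VALUE_STROKE_PURE.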

A few words about orientation. It is the enabler for side charts. The rendering function injects the proper orientation and that's all. Unfortunately this also brings some issues. Orientations are achieved by flipping and rotating the canvas, which is OK unless you want to draw any text: https://clojure2d.github.io/clojure2d/docs/codox/clojure2d.core.html#var-orient-canvas In this case the plot should take care to translate back to the original orientation (see scatter plot). Not every chart is tested for this case. Labels also suffer, and drawing them requires some tricks.
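The flip-and-undo idea can be sketched with a plain java.awt.geom.AffineTransform. This is not how orient-canvas is implemented; flip-y and apply-tx are hypothetical names, and a single vertical flip stands in for whatever orientations the library supports. The point is only that a chart drawing text under a flipped canvas can use the inverse transform to get back to upright coordinates.

```clojure
(import '(java.awt.geom AffineTransform Point2D$Double))

(defn flip-y
  "Hypothetical orientation: maps (x, y) to (x, h - y), i.e. the origin
   moves from the top-left to the bottom-left of a canvas of height h."
  ^AffineTransform [h]
  (doto (AffineTransform.)
    (.translate 0.0 (double h))
    (.scale 1.0 -1.0)))

(defn apply-tx
  "Applies an affine transform to an [x y] pair and returns a new pair."
  [^AffineTransform tx [x y]]
  (let [p (.transform tx (Point2D$Double. (double x) (double y)) nil)]
    [(.getX p) (.getY p)]))

;; A point in chart space and its position on a flipped 100px-high canvas.
(def flipped (apply-tx (flip-y 100) [10 20]))

;; The inverse transform recovers the original orientation, which is what
;; text drawing needs so that glyphs are not mirrored.
(def restored (apply-tx (.createInverse (flip-y 100)) flipped))
```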

do-graph does everything needed to start rendering: it creates a canvas with the required size, quality, orientation and oversize, and also builds the graphical context. The canvas itself is bound to the symbol c.

joinr commented 5 years ago

@zcaudate I was interested in the very same thing. Thankfully, from what I'm deriving (and hopefully getting corrected on via @genmeblog), you've got several hooks throughout the process. In particular, render-graph looks like the main focal point - just defining a new method implementation for your chart type "may" be sufficient. I think everything else in the pipeline for defining series/layers has sane defaults (e.g. do nothing with the data, compute basic extents). One short-term goal is to pull this thread (build a custom series type) to explore the low-level process and gain an understanding of the overall design.

The other end goal would be to highlight potential paths for a "porcelain" API, e.g. a single function call that sufficiently wraps the common chart types (thinking of something like Incanter's API for charts, or similar libs). This is on the long-term road-map, and I think having 3rd parties crawl through the generous work that's already been done will help flesh out (and concurrently support) the development effort.

@genmeblog I will lift your comments directly, thanks a lot. I also hope to use this as a stimulus that optimizes documentation / understanding, maybe even some design, WITHOUT forcing you to document everything. In other words, where I/we are trying to understand functionality along the critical path, I/we can implicitly poll for scoped feedback.

As a side-effect, I think that the sooner a broad understanding of how to extend the library emerges, interested parties can help contribute more actively (where feasible). Adding new plot types, working examples, etc. would be one area. I'd personally like to look at a portability layer, as well as declarative rendering to ease some of the plot descriptions (again where it makes sense). There are some simple extensions that we got used to at work, when using incanter and jfree for exploratory analysis and/or building ad hoc presentations. Trivial stuff, like resizing the plot dynamically, copy/paste, making changes to colors, axes, etc. I think these can be addressed in parallel with the mainline effort, without having an effect on the design. I've got a trove of prior work (as we all likely do lol) that can probably be applied with minimal effort, if I can understand the current design and related concepts.

Also, the quick work that @zcaudate did in wrapping the implementation in javafx is worth mentioning, if not integrating (or publishing as a simple wrapper).

genmeblog commented 5 years ago

About custom charts.

The call flow goes like this: prepare-data -> data-extent -> render-graph. You need to implement these three multimethods. postprocess-data is not used now and nothing implements it. I wanted to separate as many steps as possible, but the more I practice, the more I see that postprocess-data is not needed (yet).
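As a rough illustration of that three-step flow, here is a self-contained sketch. It reuses the multimethod names from the thread, but the signatures, the chart-type dispatch, the :dots chart, the build-chart driver and the "draw command" return values are all assumptions made for the example; cljplot's real multimethods take different arguments and draw on a canvas.

```clojure
;; Hypothetical, simplified signatures: every step dispatches on a
;; chart-type keyword. These are NOT cljplot's actual signatures.
(defmulti prepare-data (fn [chart-type data config] chart-type))
(defmulti data-extent  (fn [chart-type data config] chart-type))
(defmulti render-graph (fn [chart-type data config ctx] chart-type))

;; Default preparation: leave the data untouched.
(defmethod prepare-data :default [_ data _] data)

;; Default extents: min/max of the x and y columns, in the
;; [:numerical [lo hi]] shape described in the thread.
(defmethod data-extent :default [_ data _]
  (let [xs (map first data)
        ys (map second data)]
    {:x [:numerical [(apply min xs) (apply max xs)]]
     :y [:numerical [(apply min ys) (apply max ys)]]}))

(defn- scale
  "Linear map of v from [lo hi] to [0 size]."
  [lo hi size v]
  (* size (/ (- (double v) lo) (- hi lo))))

;; A made-up :dots chart. Instead of drawing, it returns draw commands
;; so the result can be inspected at the REPL.
(defmethod render-graph :dots
  [_ data {:keys [radius] :or {radius 3}} {:keys [w h extents]}]
  (let [[_ [x0 x1]] (:x extents)
        [_ [y0 y1]] (:y extents)]
    (mapv (fn [[x y]]
            [:circle (scale x0 x1 w x) (scale y0 y1 h y) radius])
          data)))

(defn build-chart
  "Runs the three steps in order. Extents are computed from prepared
   data, and canvas size is only known at render time."
  [chart-type data config w h]
  (let [prepared (prepare-data chart-type data config)
        extents  (data-extent chart-type prepared config)]
    (render-graph chart-type prepared config {:w w :h h :extents extents})))
```

With this layout a new chart type only needs a render-graph method when the default preparation and extents are good enough, which matches the "defining a new method may be sufficient" observation earlier in the thread.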

I was also thinking that separating data preparation from rendering would be beneficial for interactive charts. But the truth is: most interactions change data, and all steps have to be repeated to redraw the chart. I don't think sophisticated optimizations are possible with this architecture and design.

I prepared one interactive example and it's fast enough https://github.com/generateme/cljplot/blob/master/sketches/lattice.clj#L821

Deeper insight into building/rendering process:

prepare-data is called first. As @joinr observed, the main goal is to preprocess the data: sampling a function for a line plot, preparing histograms, even precalculating extents or whatever. What is important: during this step scales are not established yet, and you don't know the size of the target canvas. You only know the data given by the user and the merged chart configuration.
data-extent - the data is already preprocessed, so you have access to it. What should be returned is a map of extents. Each extent is a vector containing the type and the actual extent, like [:numerical [-1.0 2.2]]. The keys :x and :y are special: they are used to create scales automatically. You can also add other extents if you need them later. This is the case for :stack-horizontal and :stack-vertical, where an :inner extent is added. You should be aware that extents are merged and extended during the building process.
render-graph is the actual chart drawing. When you look at the implementations you will see that similar patterns repeat. You have access to the canvas dimensions, the automatic scales (x and y) and the merged extents.
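The extent shape and the idea of merging can be sketched as follows. How cljplot actually merges extents is not shown in this thread, so merge-numerical and merge-extents are hypothetical helpers that simply take the union of numerical ranges per key and carry extra keys such as :inner through unchanged.

```clojure
(defn merge-numerical
  "Union of two [:numerical [lo hi]] extents. Hypothetical helper that
   handles only the :numerical type shown in the thread."
  [[_ [lo1 hi1]] [_ [lo2 hi2]]]
  [:numerical [(min lo1 lo2) (max hi1 hi2)]])

(defn merge-extents
  "Merges any number of extent maps key by key. Keys that appear in only
   one map (e.g. an extra :inner extent) pass through unchanged."
  [& extent-maps]
  (apply merge-with merge-numerical extent-maps))

;; Two series sharing one set of axes: the merged :x range covers both.
(def merged
  (merge-extents {:x [:numerical [-1.0 2.2]] :y [:numerical [0 5]]}
                 {:x [:numerical [0.5 3.0]] :inner [:numerical [0 1]]}))
```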

This is how it behaves now. What is planned:

emit-legend - every chart will be able to emit legend information: marker and text. This will be taken from data and/or configuration
register-configuration - currently the whole default configuration is created in the cljplot.config namespace. This is problematic and doesn't fully allow adding your own charts.

joinr commented 5 years ago

As I continue to read/explore, the thought arises that there's probably a really useful case for introducing specs into this, at least for chart configuration. I'm not a huge spec user, but started integrating it where it made sense (data validation, some limited generative testing). There are some cool libraries that help paper over some of the pain points, particularly in specing generic maps (metosin has a great one). Having some loose specs may also help in the documentation process and provide discoverability for folks crawling the library. I'd be happy to propose some as well as I go through this.
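A loose spec of this kind might look like the sketch below. It uses clojure.spec.alpha and describes only the extent map shape mentioned earlier in the thread ([:numerical [lo hi]] under :x and :y). The :chart/... spec names are invented for the example, and the lo <= hi constraint is an assumption; a spec for the real chart configuration would need the actual configuration keys.

```clojure
(require '[clojure.spec.alpha :as s])

;; [:numerical [lo hi]] with lo <= hi (the ordering rule is an assumption).
(s/def :chart/numerical-extent
  (s/tuple #{:numerical}
           (s/and (s/tuple number? number?)
                  (fn [[lo hi]] (<= lo hi)))))

(s/def :chart/x :chart/numerical-extent)
(s/def :chart/y :chart/numerical-extent)

;; :x and :y are optional, unqualified keys; other keys are allowed.
(s/def :chart/extents (s/keys :opt-un [:chart/x :chart/y]))

(defn explain-extents
  "Returns nil for valid extents, otherwise a human-readable explanation."
  [m]
  (when-not (s/valid? :chart/extents m)
    (s/explain-str :chart/extents m)))
```

The explanation string is where the discoverability benefit shows up: a malformed extent reports which key and which predicate failed.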

One other cool thing about having a specification for the chart configuration, etc., is that (like vega), since almost everything is fairly data-driven (vectors of stuff being fed to a multimethod that interprets how to build the plot), you can mimic what the vega folks are doing and have generative plots (or "near" plots) for the same dataset. For exploration, one could envision generating multiple plot configurations from the spec and the input data, allowing the user to compare/contrast/select the one that "looks best." Just a thought.

genmeblog commented 5 years ago

@joinr I'm open to every contribution. I'm very happy that I'm getting comments and feedback at such an early stage. My plan is to finish this development round, which contains:

auto legends and gradient/palette axes
cleaning configuration - to allow multi inheritance (currently aliasing is possible) and registering own configuration
review of chart data; histograms, for example, differ from the others (the histogram was the first chart implemented)
histogram should recognize continuous vs discrete data and behave accordingly
categorical vs categorical plots

After this I'll be happy to see more developers (so if you are interested in contributing, I'm looking forward to it).

genmeblog commented 5 years ago

Great ideas @joinr I'm also not a spec person, so if you could help here it would be great.

joinr commented 5 years ago

@genmeblog Thanks for the substantive insight on the rendering process, particularly the expectation of data types relative to the inputs. That kind of stuff is useful for specifying (perhaps simultaneously spec'ing) the source. Good stuff.

But the truth is: most interactions change data, and all steps have to be repeated to redraw the chart. I don't think sophisticated optimizations are possible with this architecture and design.

I think there's definitely a case for deferred rendering at some point, or for decoupling the work done in render-graph from the prepared data. Rather than recomputing everything on every change, declaring incremental updates, including incremental rendering optimizations (via bounding volumes / dirty rectangles, etc.), is a logical step. As you said, this requires additional thought about caching, retaining some state, etc. I think, though, that the hooks you've provided leave the door open to introducing these concepts. As a baseline, sitting on top of something that's already exploiting the hardware (like quil, javafx, webgl, etc.) may make "re-render everything on change" compelling enough for a broad set of use cases. I know that - in my experience - Java2D takes some finesse to eke out really good performance. On the other hand, it can also produce very pretty/accurate results, so for static plots it's a good thing.

genmeblog commented 5 years ago

I can't imagine right now how to cache elements, but I hope sooner or later we can figure something out :) What should be achievable now:

window resizing requires the render step only
mapping mouse positions to values (even with some rendered overlays) is "easy" to achieve. Additional data should be returned from the rendering functions (position -> data). Fortunately, scales already have a reverse function (mapping [0-1] to the domain).
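The position-to-value mapping can be sketched with a linear scale and its inverse. The linear-scale and pixel->value names are hypothetical and this is not the scale implementation cljplot uses; it only illustrates the "reverse function mapping [0-1] to the domain" idea: normalize the pixel position by the canvas size, then feed it to the inverse.

```clojure
(defn linear-scale
  "Hypothetical scale over the domain [lo hi].
   :forward maps a domain value to [0 1], :inverse maps [0 1] back."
  [[lo hi]]
  {:forward (fn [v] (/ (- (double v) lo) (- hi lo)))
   :inverse (fn [t] (+ lo (* (double t) (- hi lo))))})

(defn pixel->value
  "Maps a pixel coordinate on an axis of the given size (in pixels)
   back to a value in the scale's domain."
  [scale px size]
  ((:inverse scale) (/ (double px) size)))

;; A mouse x of 50 on a 200px-wide chart whose x domain is [0 10].
(def value-under-mouse (pixel->value (linear-scale [0 10]) 50 200))
```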

I think also about kind of animations with fixed scales and domains. This way only main chart could be repainted.

Java2d uses hardware acceleration but I have no idea what exactly and how it works.

joinr commented 5 years ago

Simple resizing just needs to be hooked up to the frame's event listener (I think it's Window events, there's some on-resize stuff, I did this a while back for spork.sketch and saw it in Jfree as well), but that's just incidental plumbing (e.g. changes to the show-image function).
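As a sketch of that plumbing: in AWT/Swing, resize notifications arrive through a ComponentListener (componentResized), which can be attached with a proxy over ComponentAdapter. The on-resize! helper is a hypothetical name, and how it would be wired into show-image is not covered here; the callback would be the place to trigger only the render step.

```clojure
(import '(java.awt Component)
        '(java.awt.event ComponentAdapter ComponentEvent)
        '(javax.swing JPanel))

(defn on-resize!
  "Hypothetical helper: calls (f width height) whenever the component
   reports a resize. Returns the component."
  [^Component component f]
  (.addComponentListener
   component
   (proxy [ComponentAdapter] []
     (componentResized [e]
       (let [^Component c (.getComponent ^ComponentEvent e)]
         (f (.getWidth c) (.getHeight c))))))
  component)

;; Usage: record sizes in an atom; a real callback would re-render.
(def sizes (atom []))
(def panel (on-resize! (doto (JPanel.) (.setSize 320 240))
                       (fn [w h] (swap! sizes conj [w h]))))
```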

Mapping mouse positions to values sounds potentially trickier (depending on dataset, layers, and performance considerations). The naive means of blasting through geometry and testing for intersection/containment with the mouse ray probably works to an extent. Using some accelerated structures (e.g. bounding volumes and the like) would be the way to go (and incidentally a first step toward deferred rendering anyway...e.g. Piccolo2D and Javafx scene graph implementations). Note: we have off-the-shelf implementations for this (clj, cljs) via thi.ng.geom. I started exploring using this with quil, managing my scene objects in thi.ng, etc. Seemed like a high-quality implementation.
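The naive approach mentioned above can be sketched in a few lines: keep the rendered markers as data and test each one for containment. The marker map shape ({:x :y :r :datum}) and the hit? and pick names are made up for the example; an accelerated version would replace the linear scan with a spatial index such as those in thi.ng.geom.

```clojure
(defn hit?
  "True when the mouse position lies inside the circular marker."
  [[mx my] {:keys [x y r]}]
  (let [dx (- mx x)
        dy (- my y)]
    (<= (+ (* dx dx) (* dy dy)) (* r r))))

(defn pick
  "Linear scan over all markers; returns the data behind every marker
   under the mouse. O(n) per mouse event."
  [mouse markers]
  (into [] (comp (filter #(hit? mouse %)) (map :datum)) markers))

(def markers
  [{:x 10 :y 10 :r 5 :datum :a}
   {:x 50 :y 50 :r 5 :datum :b}])

(def under-mouse (pick [12 11] markers))
```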

Java2D does have direct3d/opengl acceleration in the pipeline. The problem typically crops up during compositing/blending operations, which I believe happen in software or are minimally accelerated. So you run into performance walls if you're playing with faded shapes or even sprites. Piccolo2D did some interesting hacks to get around this (primarily leveraging deferred rendering and incremental repainting to minimize draw calls and maximize culling), but performance in e.g. opengl or javafx (built on top of a hardware pipeline) tends to be way better.

zcaudate commented 5 years ago

I feel like adding interactivity may take away from the plotting. Also any amount of work done for interactivity will not be able to upstage the existing libraries in js land.

Also, if it’s interactive, most people would then want it to be in 3d.

zcaudate commented 5 years ago

For me, I’d definitely like to contribute but I have to understand how to do that first - it has to start at the clojure2d level and how cljplot uses the library to draw what is needed to be drawn.

zcaudate commented 5 years ago

Also @joinr, @genmeblog, I might be wrong, but I think cljplot is a natural replacement for incanter. It already offers a lot more than incanter in terms of graphs. I think it might be worthwhile to see what the incanter people and users of incanter think of the library.

joinr commented 5 years ago

@zcaudate Yeah, I think you meant incanter.charts, which is JfreeChart based. That's one of my side-goals (has been for some time), to replace the visualization component (actually tried to flesh out what was there with JfreeChart, but that turned into a quagmire). I originally was going to use vega (e.g. oz) inside a javafx webview. Did some initial experiments, but hated not having direct access to the plot, plus going through all the junk that vega did to "compile" the end result; very much lacking control in some areas. That being said, I do like the simplicity of the high-level API incanter.charts provides; it could be something to consider when aiming for a single-function-call API down the road (as opposed to the current lower-order chart composition workflow).

Regarding javascript competition on the interactive front: I don't think JS has a lock on anything, not to the point where the final word has been said. Were JS-based avenues sufficient (as proposed by oz/saite, and gobs of other webgl/canvas-backed stuff), then this library wouldn't have legs. Having production quality plotting for publication (seems to be the current short-term goal), plus interactive data analysis visualizations (farther out), leading to attractive animated interactive visuals (who knows when) on the client side, not boxed up in the browser, is very attractive to me. I also think it's entirely possible to provide a competitive clojure-friendly solution for the plotting/visualization space without having to span that weird JS gap with cljs.

Oz/Saite get there - to a degree - but they are subject to what the underlying grammar dictates. With cljplot, I think the opportunity to fundamentally extend and exert control (very lispy) is retained. Rather than fighting with the library (like I fought against JfreeChart to implement stuff), it looks like @genmeblog has mapped out plenty of hooks and low-level controls to allow high flexibility, while still providing the grammar-like aspects (and other higher order APIs in the future) of vega/ggplot, etc.

zcaudate commented 5 years ago

I just think it’s a lot of work considering vega is built on top of d3 which makes use of the dom. There’s a lot of abstraction that may be missing in order to do it properly.

This is where interactivity is concerned.

joinr commented 5 years ago

:) I don't often hear DOM and proper used in the same context. For what it's worth, vega side-steps the DOM in many critical ways, including using its own scene graph and event propagation model, which is why they can target multiple backends (SVG, canvas, and [soonish] WebGL) without paying a performance tax mucking with the DOM explicitly.

Those abstractions already exist in both javafx and piccolo2d (swing), and in both cases are fairly well optimized for performance and API design. I think adapting the rendering plane (to include currently non-existent event propagation amongst visual elements) would be less effort than you anticipate, but I've been working with javafx and its predecessor for a bit, so maybe that's just my positive experience.

zcaudate commented 5 years ago

Hmmm. I’m definitely a user in this case. It would be absolutely awesome.

I like javafx, it’s got good 3d support and everything. But yeah, i’ve realised that BufferedImage is one of the most flexible image abstractions ever designed so I’m not too fussed about having that as a baseline.

joinr commented 5 years ago

@zcaudate Totally agreed with the current basis / implementation strategy. As you've demonstrated, operations on graphics2d wrappers around the buffer (à la Clojure2D) also work seamlessly with javafx via the graphics2d fx canvas wrapper. The next step up (say, targeting javafx directly instead of the effectively opaque canvas) would be to emit javafx scene nodes relative to the drawing commands. I did something similar with piccolo2d, coming from almost the exact same basis (rendering to a graphics2d clojure wrapper, via some primitive draw call protocols, similar to clojure2d, but with an emphasis on declarative rendering built on top of the imperative protocols). It's feasible to change the rendering layer (not unlike what Batik does to emit svg from java2d draw commands) to then emit nodes in your target scene graph context. So, the typical rendering calls for shapes and primitives are compiled into retained imagery in the scene graph node context (typically very similar if not identical geometric primitives, although there could be a 2d/3d distinction).
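A toy version of "swap the rendering layer" looks like this: drawing commands are plain data, and a multimethod compiles each one into a node for the target backend. Here the target is an SVG string, purely for illustration. The command shapes ([:circle x y r], [:line x1 y1 x2 y2]) and the emit-svg and scene->svg names are invented for the sketch; this is not how Batik, Piccolo2D, clojure2d or cljplot do it.

```clojure
;; Dispatch on the command keyword in first position.
(defmulti emit-svg first)

(defmethod emit-svg :circle [[_ x y r]]
  (format "<circle cx=\"%s\" cy=\"%s\" r=\"%s\"/>" x y r))

(defmethod emit-svg :line [[_ x1 y1 x2 y2]]
  (format "<line x1=\"%s\" y1=\"%s\" x2=\"%s\" y2=\"%s\" stroke=\"black\"/>"
          x1 y1 x2 y2))

(defn scene->svg
  "Compiles a sequence of draw commands into one SVG document string."
  [w h commands]
  (str (format "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%s\" height=\"%s\">"
               w h)
       (apply str (map emit-svg commands))
       "</svg>"))

(def svg
  (scene->svg 100 100 [[:line 0 0 100 100] [:circle 50 50 3]]))
```

A second backend (Graphics2D calls, javafx nodes) would be another multimethod over the same commands, which is the property that makes a declarative layer attractive.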

I ended up being able to go from a declarative renderer with a minimal scene-graph implementation, to emitting the same structure to Piccolo2d. I'll try to push an example at some point (cleaning up my old libs and opening them for posterity, as well as insight in the current context). Then piccolo2D takes care of efficiently rendering the scene, handling transforms, and baking in interaction on each node / layer. All the stuff I was breaking my back to get fast in Java2D just kinda melted, plus I got efficient picking, selection, etc. for all the scene elements. I started porting this same API to javafx back in 2014/15 (very very similar to Piccolo2d, with more extras), but lost focus/need related to other work stuff...Also doing this in Clojure vs. java ended up being very joyful.

For now, the chart configuration, data processing, and related APIs are plenty fine to focus on (e.g. get the content, presentation, and chart libraries matured). Rendering options are an optional side-project, along with related selection stuff. The "meat" is in rounding out presentation quality plots and shaking down the nasty corner cases for chart composition.

zcaudate commented 5 years ago

That would be really cool. For me, I’m learning a lot right now.

I’d also like to learn how to draw things with clojure2d in general. It’s pretty magic to me right now.

genmeblog commented 5 years ago

@zcaudate clojure2d continues the Processing/Quil, openFrameworks approach. I've prepared a bunch of examples and I hope I covered almost every aspect of the library here: https://github.com/Clojure2D/clojure2d-examples

zcaudate commented 5 years ago

@genmeblog ah. I remember seeing those. thanks.

zcaudate commented 5 years ago

@joinr, I'm curious. are you planning to go ahead with this?

joinr commented 5 years ago

I have an unbounded timeline. Family and work take precedence. Yes, I am going to. Went through the clojure2d examples last week to grok rendering (basically quil, but JVM-only rather than portable, so a simpler focus and decent performance). Feel free to copy and paste if you're working on a different time horizon.

zcaudate commented 5 years ago

For sure. Priorities matter.

It'd be good if there was some sort of a checklist of things to do. That way, we might be able to recruit some help if people are interested.

joinr commented 5 years ago

Updated workthrough now lives here. My plan is to work through the examples one-by-one, gleaning insight about the architecture, generating more examples implicitly, and deconstructing things to peek under the hood. Along the way, I'm mapping out the data interfaces, and collating them for spec purposes.

Funny enough, this already yielded substantial fruit. I worked through the first example, the cljplot logo scatter plot, and immediately hit a wall because of some deep errors. So, I had to peel that back, deconstruct to a minimal scatter plot spec, understand that, then accrete features until I hit failure. From there, I was able to determine the cause and push a pull request #15 . I also found some really nice functional approaches to building up plot specifications that just seem natural; IMO cljplot is addressing the grievances I had with other libs, primarily in the amount of openness and control you get with plain clojure functional constructs.

first example

generateme / cljplot

Working Notes / Docs #14