fair-acc / chart-fx

A scientific charting library focused on performance optimised real-time data visualisation at 25 Hz update rates for data sets with a few 10 thousand up to 5 million data points.

GNU Lesser General Public License v3.0

Basic Questions #26

Closed ennerf closed 5 years ago

ennerf commented 5 years ago

Background

First of all, thanks for writing a JavaFX charting library. I've been using JChart2D for what seems like forever, and I can't wait to finally get rid of the last remaining Swing parts.

I've been working on an application that visualizes streaming sensor/measurement data in real-time. One of the features is a way for non-programmers to create custom charts via XML files where they can specify equations as well as things like default ranges, colors, styles, and so on. For example

<chart title="GYROSCOPE" min_range="-3" max_range="+3" range_policy="expanding">
    <trace label="gyroX" units="rad/s" color="red" style="solid" value="fbk.gyroX" />
    <trace label="gyroY" units="rad/s" color="green" style="solid" value="fbk.gyroY" />
    <trace label="gyroZ" units="rad/s" color="blue" style="solid" value="fbk.gyroZ" />
</chart>

and

<chart title="LATENCY" min_range="0" max_range="0.5" range_policy="expanding">
    <trace label="Round Trip Time (RTT)" units="ms" color="black" style="points" value="(fbk.pcRxTime - fbk.pcTxTime)*1E3" />
    <trace label="Hardware Response Time" units="ms" color="blue" style="points" value="(fbk.hwTxTime - fbk.hwRxTime)*1E3" />
    <trace label="Transmit Time dt" units="ms" color="green" style="points" value="isNaN(prevFbk.pcTxTime) ? 0 : (fbk.pcTxTime - prevFbk.pcTxTime)*1E3" />
</chart>

Unfortunately, I keep running into various basic issues when trying to port the charts over to chart-fx.

Questions / Problems

We typically want to show the last N measurements, or the last (up to N) measurements within some time period. It'd be nice to have a LimitedCircularDoubleDataSet. It sounds very much like #8, so I'd second that as a feature request. Once I'm more familiar with this library I may be able to contribute something.
How can I dynamically set colors for individual datasets? The only reference I've found was in RollingBufferLegacyExample via setting the chart style for .default-color{index}.chart-series-line{...}. Unfortunately, doing that doesn't seem to have any effect.
Similarly, how can I selectively change the style of individual datasets? e.g. render one dataset with only markers (e.g. MATLAB's ., o, x), and another dataset as a solid/dashed/dotted line without markers (e.g. MATLAB's -, --, :). DataSet::getDataStyleMap::put seems to apply only to individual points
How can I render a dashed line? I found some usages of DashPatternStyle in the GridRenderer, but found nothing about using it for individual datasets.
What JDK/JFX version were you using for the performance comparisons in your paper? In particular, I'd be interested to know whether the numbers are before or after the Marlin renderer changes.

Thanks!

Edit: cleaned up the questions a bit

ennerf commented 5 years ago

Also, below is a list of some (probable) bugs I've encountered so far with the XYChart. They are minor, so I don't think it's worth creating new issues. Overall, the library seems quite nice 👍

Zooming in too far (e.g. >10x using the mouse area) crashes the entire runtime in a way that can't be recovered from
Enabling/disabling grid lines does not trigger a redraw. Changes become visible after e.g. zooming in and out
When zooming in, the Y-axis label eventually gets pushed outside of bounds. It may be worth hiding the label if it doesn't fit anymore. This is really nitpicky though and probably not worth the effort :)

[Screenshots: Y-axis label in the normal view and in the zoomed-in view]

StyleParser#41 splits by ; and , which causes issues when using rgb(r,g,b) in CSS, e.g.,

String style = XYChartCss.FILL_COLOR + "=rgb(0,255,0);"; // gets parsed as "rgb(0"

The same StyleParser issue also makes it impossible to define a stroke dash array pattern (see getFloatingDecimalArrayPropertyValue).
The JFXtras dependency uses the Java 8 version. Considering that everything else is 11+, this should be updated to the Jigsaw compatible version 9.0-r1
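The StyleParser splitting problem reported above can be reproduced with plain string handling. The following is a minimal sketch, not chart-fx's actual parser: the class `StyleSplitDemo` and its methods are hypothetical, and only the `key=value;` format and the `fillColor`/`strokeColor` keys come from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a "key=value;" style parser; NOT chart-fx's StyleParser. */
final class StyleSplitDemo {

    /** Splits on ';' and ',' as described in the report, which cuts rgb(r,g,b) apart. */
    static Map<String, String> parseSplittingOnCommas(String style) {
        return parse(style, "[;,]");
    }

    /** Splits on ';' only, so commas inside a value survive. */
    static Map<String, String> parseSplittingOnSemicolons(String style) {
        return parse(style, ";");
    }

    private static Map<String, String> parse(String style, String separatorRegex) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : style.split(separatorRegex)) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                continue; // fragments such as "255" or "0)" carry no key and are dropped
            }
            result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return result;
    }
}
```

With the input `fillColor=rgb(0,255,0);`, the first variant stores the truncated value `rgb(0` for `fillColor`, while the second keeps `rgb(0,255,0)` intact. A real fix would also have to decide how list-valued properties such as dash arrays are separated.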

wirew0rm commented 5 years ago

Hello Florian,

First of all, thanks for your interest in our charting framework and for sharing your user story. It's always nice to see what other people are building.

<chart title="GYROSCOPE" min_range="-3" max_range="+3" range_policy="expanding">
    <trace label="gyroX" units="rad/s" color="red" style="solid" value="fbk.gyroX" />
    <trace label="gyroY" units="rad/s" color="green" style="solid" value="fbk.gyroY" />
    <trace label="gyroZ" units="rad/s" color="blue" style="solid" value="fbk.gyroZ" />
</chart>

This actually looks very close to the FXML code for applications using chart-fx. Currently most of our code manually instantiates the chart components, so there are some attributes that cannot be set in a nice way from FXML, but we are open to improving that. For example:

<XYChart fx:id="waveletChart">
  <plugins>
    <Zoomer />
    <Panner />
  </plugins>
  <renderers>
    <ContourDataSetRenderer />
  </renderers>
  <datasets>
    ....
  </datasets>
</XYChart>

This would give you a chart with zoom and panning functionality enabled and a renderer that renders 3D contour data instead of the default xy data. Your traces would be DataSet implementations which expose all the properties you need to the FXML loader. Of course, if you already have your XML format and parsing routines, this might not be a good approach. It just looked a lot like your code, and I think we do not have any public FXML examples yet.

Questions / Problems

* We typically want to show the last N measurements, or the last (up to N) measurements within some time period. It'd be nice to have a `LimitedCircularDoubleDataSet`. It sounds very much like #8, so I'd second that as a feature request. Once I'm more familiar with this library I may be able to contribute something.

Did you check the LimitedIndexedTreeDataSet? It should provide the functionality you described, and for our use cases the performance was sufficient. I would suggest trying that, and if it turns out to be a performance bottleneck, you can still switch to a custom DataSet implementation. Alternatively, you could use the CircularDoubleDataSet and have some user code or a Plugin adjust your AxisRange to limit the data which is shown.
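As a rough illustration of the "user code adjusts the axis range" option, the sketch below computes which samples fall into a window of at most N samples within a maximum age. It is plain Java with no chart-fx calls; the class `VisibleWindow`, the method name, and the parameters are hypothetical, and it assumes timestamps sorted in ascending order.

```java
/** Hypothetical helper; assumes timestamps[0..size) are sorted in ascending order. */
final class VisibleWindow {

    /**
     * Returns the index of the first sample to show, so that at most maxCount
     * samples are visible and none is older than (latest timestamp - maxAge).
     */
    static int firstVisibleIndex(double[] timestamps, int size, int maxCount, double maxAge) {
        if (size == 0) {
            return 0;
        }
        double oldestAllowed = timestamps[size - 1] - maxAge;
        int lo = Math.max(0, size - maxCount); // count limit
        int hi = size - 1;                     // the latest sample is always visible
        // Binary search for the first index in [lo, hi] that satisfies the age limit
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < oldestAllowed) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The lower bound of the x-axis would then be `timestamps[firstVisibleIndex(...)]` and the upper bound the latest timestamp. How those bounds are applied to the axis depends on the chart-fx version in use.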

* How can I dynamically set colors for individual datasets? The only reference I've found was in [RollingBufferLegacyExample](https://github.com/GSI-CS-CO/chart-fx/blob/master/chartfx-samples/src/main/java/de/gsi/chart/samples/legacy/RollingBufferLegacySample.java#L117) via setting the chart style for `.default-color{index}.chart-series-line{...}`. Unfortunately, doing that doesn't seem to have any effect.

* Similarly, how can I selectively change the style of individual datasets? e.g. render one dataset with only markers (e.g. MATLAB's `.`, `o`, `x`), and another dataset as a solid/dashed/dotted line without markers (e.g. MATLAB's `-`, `--`, `:`). `DataSet::getDataStyleMap::put` seems to apply only to individual points

CSS in JavaFX is complicated implementation-wise, but in general most functionality should work. Note that markers are currently always painted in strokeColor:

dataSet.setStyle("strokeColor=pink; fillColor=green");

See de/gsi/chart/XYChartCss.java for all supported properties. The dataSetStyle map, as you correctly observed, allows you to highlight single points in your data set by assigning a different style to them.

* How can I render a `dashed` line? I found some usages of `DashPatternStyle` in the `GridRenderer`, but found nothing about using it for individual datasets.

Dashed lines are currently not supported, because the underlying JavaFX implementation has performance issues (even if you render on a canvas like we do, it still allocates a Line object for each dash). That's why we have our own dashing implementation for the grid renderer, which uses solid lines with a dashed fill pattern. But this workaround only works correctly for horizontal and vertical lines.

* What JDK/JFX version were you using for the performance comparisons in your paper? In particular, I'd be interested to know whether the numbers are before or after the [Marlin renderer](https://static.rainfocus.com/oracle/oow17/sess/1493054732447001bY8w/PF/javaone-marlin-talk_1507132140815001B9Fs.pdf) changes.

The plot in the paper was produced using JDK8, but main development nowadays happens on JDK11 and OpenJFX 12. The relative proportions have not changed significantly, and the speedup due to Marlin was also moderate. I can rerun the performance evaluation overnight and post the results tomorrow.

wirew0rm commented 5 years ago

I also fixed all the issues from your second comment in our dev branch, except for the second one, which I put into the backlog. The dev branch will become 11.1.0 soon.

Again, thanks for the detailed bug reports.

ennerf commented 5 years ago

Thank you for your quick feedback.

FXML

That's good to know. Overall, I really like FXML and CSS. I usually work together with a UI designer, and the split makes it really easy to work with. The more that can be exposed, the better.

LimitedIndexedTreeDataSet

We often measure latency data that comes in at >=1KHz and is usually <=1 ms. The network latency is subject to GC, so I try really hard to reduce allocations and the number of minor GCs (>=3ms) whenever possible to not interfere with the results too much. The LimitedIndexedTreeDataSet adds a lot of extra objects, and the sorting is unnecessary given that the data is already sorted by time.

Writing a plugin is an interesting idea. I'll give that a try to start off with.

Individual Styling

Thanks. I was able to create individual styling using the CSS defines and a separate renderer for each dataset, as shown below. I don't know whether this is the intended usage, but it does work.

// Simplified dummy code
chart.getRenderers().clear();
for (Trace trace : traces) {
    // ...

    DoubleDataSet dataSet = new DoubleDataSet(name, x, y, x.length, false);

    // Define color/width via CSS styling of the data set
    String color = toCssColor(trace.getColor());
    String style = ""
            + XYChartCss.MARKER_COLOR + "=" + color + ";"
            + XYChartCss.STROKE_COLOR + "=" + color + ";"
            + XYChartCss.FILL_COLOR + "=" + color + ";"
            + XYChartCss.STROKE_WIDTH + "=1;";
    if (trace.getStyle() == TraceType.DASHED_LINE) {
        style += XYChartCss.STROKE_DASH_PATTERN + "=5";
    }
    dataSet.setStyle(style);

    // Define line style via a dedicated renderer for this data set
    ErrorDataSetRenderer renderer = new ErrorDataSetRenderer();
    renderer.setDrawBubbles(false);
    renderer.setDrawBars(false);
    renderer.setDrawMarker(false);
    renderer.setPolyLineStyle(LineStyle.NONE);
    renderer.setErrorType(ErrorStyle.NONE);
    switch (trace.getStyle()) {
        case SOLID_LINE:
        case DASHED_LINE:
            renderer.setPolyLineStyle(LineStyle.NORMAL);
            break;
        case INDIVIDUAL_POINTS:
            renderer.setDrawMarker(true);
            renderer.setMarker(DefaultMarker.RECTANGLE);
            renderer.setMarkerSize(1);
            break;
    }

    // Add the data set to its renderer, and the renderer to the chart
    renderer.getDatasets().add(dataSet);
    chart.getRenderers().add(renderer);
}

ennerf commented 5 years ago

Btw. after having already ported parts of the application, I have to say that the zooming/panning features are really well done. That'll make a lot of users happy.

Feel free to close this issue unless you want to keep it open for reference.

RalphSteinhagen commented 5 years ago

@ennerf thanks for your interest and co-using our lib.

I don't know whether this is the intended usage, but it does work.

Yes, this is one of the possible/intended usages. Our developers are quite diverse and favour a wide range from FXML/CSS-only up to fully programmatic-style approaches. While the latter is (for us) easier to implement, we try to expand the FXML/CSS functionality where possible/reasonable/compatible with our other commitments (N.B. we actually build high-energy particle accelerators, having a charting lib is just one out of many tools for us).

RalphSteinhagen commented 5 years ago

We often measure latency data that comes in at >=1KHz and is usually <=1 ms. The network latency is subject to GC, so I try really hard to reduce allocations and the number of minor GCs (>=3ms) whenever possible to not interfere with the results too much. The LimitedIndexedTreeDataSet adds a lot of extra objects, and the sorting is unnecessary given that the data is already sorted by time.

A bit of rationale behind LimitedIndexedTreeDataSet: we specifically wrote this for a (perhaps) similar use-case where the sorting/tree was necessary due to:

burst-conditions where the data arrives out-of-order (N.B. worker-thread-pool handling net-io)
non-equidistant sampling (either DAQ errors or DAQ operating in triggered burst-mode).

In performance comparisons under real-world conditions between 'linear-list-based DataSets with post-process sorting and dynamic resizing of the circular buffer queue' and the eventual LimitedIndexedTreeDataSet, the latter showed similar or more favourable performance and maintainability.

If you can assure equidistant sampling and sorted data, then CircularDoubleErrorDataSet might be the better choice for you.

Also, it is perfectly OK (we do this as well) to use the DataSet or DataSetError interfaces directly around your own internal/optimised data structures. For simplifying some of the (potential) boiler-plate code, you may derive -- but this is not strictly necessary -- from AbstractDataSet and/or AbstractErrorDataSet.

Out of engineering curiosity: if latencies <=1 ms and GCs are of concern, why do you use Java/JVM+GC in the first place? Many favour (i.e. our real-time use case) either HW-based solutions and/or C++/Linux (+real-time extensions). Do you have documented public examples for your use-case/applications?

N.B. We are working on a twin-project to chart-fx in parallel (working title 'chart-qt' -- very imaginative, isn't it) that aims at a similar API/MVC-paradigm (notably DataSet structure) but implemented for our C++11/Qt environment.

ennerf commented 5 years ago

Engineering Thoughts

I work at HEBI Robotics and we make modular robotic components and systems for robotics R&D as well as non-conventional (i.e. not automotive or factory use) industrial applications. Our customer base is quite diverse, so we actually provide bindings for a variety of different languages (C, C++, ROS, Python, MATLAB) as well as different operating systems (Windows, Linux, MacOS) and platforms (x32, x64, ARM). While most APIs are wrappers around a common C library, the MATLAB API is currently entirely based on Java. This is in part because it existed before everything else, and partly because MATLAB's Java interface is a lot easier to deal with for complex interactions.

Maintaining essentially the same functionality in multiple languages has been an interesting experience. While we initially ran into problems related to Java's GC, we eventually re-architected the system in a way that produces zero allocations and thus never triggers any GCs in the first place. Once this issue was taken care of we found that the Java API has been significantly quicker/easier to develop and that it often even outperforms the native C++ version. Some of this stems from the fact that quicker development left more time for optimizing hot paths, and some of it from our application being well suited for a runtime JIT with optimistic optimizations. For comparison, when running on a simple NUC6I7KYK (mini PC from 2016) the Java internals can handle about 2GB/s of incoming data, which maps to more than 2.5 million packets (>100 million individual sensor measurements) per second. (MATLAB's Java interface does technically generate some garbage, but with some tuning you can get that down to a <3ms minor GC every few hours, or you could use Azul Zing for pauseless GC. For our use cases this really doesn't matter.)

Of course not everything can be run on a consumer OS, so we came up with a hybrid solution where all of the hard-real-time tasks are offloaded to an RTOS running on each device/module. As the system is set up now, there is no distinguishable difference between running a (somewhat time sensitive) system like Igor from MATLAB/Java on Windows or from C++ on RT Linux. Overall, we also overengineered the system on purpose because some groups build pretty crazy systems (e.g. NASA's SUPERBall v2) and they may only have limited processing power.

ChartFx would be integrated into Scope which is primarily a monitoring/debugging tool. It doesn't run any real-time control loops, so it's not quite as sensitive to GCs. It would affect displayed latency data, but as long as nothing triggers any major GCs the minor pauses shouldn't be visually noticeable.

Regarding samples: Mobile I/O - ARKit & Magnetometer Demo shows some MATLAB/Scope interactions. Robot Arm - Chasing Moving Target shows a simple arm demo. The phone app is available for free, so feel free to play around with it 👍

LimitedIndexedTreeDataSet

The LimitedIndexedTreeDataSet creates DataAtom objects that are alive until they expire after some arbitrarily long time. This can cause object promotions and may trigger major GCs. For our use case it's ok to limit the total number of values, so it's better to pre-allocate everything. If users really want to show hours of data, they can log to disk and then visualize the result with a static viewer.
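The "pre-allocate everything" idea can be sketched as a fixed-capacity ring buffer that stores samples in primitive arrays and never allocates after construction. This is an illustrative sketch only: `RingBufferXY` is a hypothetical class, not chart-fx's CircularDoubleErrorDataSet, and it leaves out locking, error values, and the DataSet interface.

```java
/** Hypothetical fixed-capacity ring buffer for (x, y) samples; NOT a chart-fx class. */
final class RingBufferXY {
    private final double[] x;
    private final double[] y;
    private int start; // physical index of the oldest sample
    private int size;  // number of valid samples

    RingBufferXY(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        x = new double[capacity];
        y = new double[capacity];
    }

    /** Appends a sample without allocating; overwrites the oldest sample when full. */
    void add(double xValue, double yValue) {
        int capacity = x.length;
        int writeIndex = (start + size) % capacity;
        x[writeIndex] = xValue;
        y[writeIndex] = yValue;
        if (size < capacity) {
            size++;
        } else {
            start = (start + 1) % capacity; // the oldest sample was just overwritten
        }
    }

    int size() {
        return size;
    }

    /** Logical index 0 is the oldest retained sample. */
    double getX(int index) {
        return x[physicalIndex(index)];
    }

    double getY(int index) {
        return y[physicalIndex(index)];
    }

    private int physicalIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (start + index) % x.length;
    }
}
```

Because nothing is allocated per sample, no objects exist that could be promoted to the old generation, which is the concern raised above about per-sample DataAtom objects.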

We can guarantee sorted data and known timestamps because:

The entire system is deterministic and preserves the sequence number during all computations
Network-level re-ordering isn't really an issue for our use case (see Analyzing the viability of Ethernet and UDP for robot control)
Every packet has a hardware timestamp recorded on the embedded device. The time may not be equidistant, but it is always known and monotonically increasing

Yes, I think the CircularDoubleErrorDataSet or a custom variation of it is probably the way to go.

It's good to know about chart-qt. There are some projects that we are considering Qt for 👍

RalphSteinhagen commented 5 years ago

@ennerf: thanks for sharing your interesting use-case with us!

Hope the above helps you with your original issue. Most of the fixes will be rolled out with the next [8,11].1.0 release.

ennerf commented 5 years ago

@RalphSteinhagen @wirew0rm

I've ported almost everything over, and so far things are looking really good 👍. I've only encountered a few more issues:

CircularDoubleErrorDataSet has a wrong generic type (DoubleErrorDataSet) which prevents overriding computeLimits().
Added data points are not automatically visible. Are they supposed to be? It works if I call DefaultNumericAxis::requestAxisLayout() or when I manually extend the x range to the latest value.

I didn't see any explicit calls in the RollingBufferSample, so I'm probably missing something. The datasets are children of renderers rather than the chart in case that makes any difference. In my particular use case I'd actually prefer to trigger it manually, but would requestAxisLayout() be the best way to go about it?

The reduction step causes some outliers to toggle visibility and flicker. Setting ((DefaultDataReducer) getRendererDataReducer()).setMinPointPixelDistance(0) helps, but the issue is still there. Disabling the reduction (setMinRequiredReductionSize(maxSamples + 1);) makes the issue disappear.
Every once in a while the entire chart flickers. It looks a bit like the y-axis range produces garbage values for one frame, but it's tough to tell. I'll try to debug this a bit and maybe create a screen-share.

Thanks again for the great work! The performance gains have already been substantial. Rendering a >100MB dataset with JChart2D caused crashes or multiple seconds of delay, but ChartFX displays the whole chart without any perceivable delay.

RalphSteinhagen commented 5 years ago

@ennerf thanks for your very constructive and helpful feedback!

CircularDoubleErrorDataSet has a wrong generic type (DoubleErrorDataSet) which prevents overriding computeLimits().

Fixed in dev-x.1.0. N.B. the internal 'protected void computeLimits()' method became 'public void recomputeLimits()'. This changed because the DataSet now also contains the pertinent axis names and their unit descriptions in addition to the [min, max] ranges. The rationale behind this is twofold: multi-dimensional data sets, and the fact that we also use/generate DataSets server-side and ship the fully configured DataSet (including meta-data, error flags, styles, etc.) via the network (i.e. byte buffers) to our thin GUI clients. Alex (@wirew0rm) is working on a new plugin that optionally/automatically updates the axes according to the DataSet's content. Maybe it's worthwhile for you to have a peek into the dev branch (API changed a bit -- hopefully for the better).

Added data points are not automatically visible. Are they supposed to be? It works if I call DefaultNumericAxis::requestAxisLayout() or when I manually extend the x range to the latest value.

Unless the auto-notification of the data set is disabled, this should not be the case. We streamlined the auto-notification a bit with the introduction of read-write locking primitives. Now the default (library) adders/setters should mute the auto-notification flag during their operation and (provided the user hasn't disabled auto-notification for the specific data set) issue a single update event afterwards that eventually should trigger the chart update. See for example here. Let us know if you spot some places where we may have missed this.
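The "mute, modify, notify once" pattern described above can be sketched in plain Java as follows. `NotifyingSeries` is a hypothetical class for illustration; it borrows the names `setAutoNotification` and `fireInvalidated` from this thread but is not chart-fx's EventSource, and it omits the read-write locking mentioned above (single-threaded use is assumed).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical data holder illustrating "mute, modify, notify once"; NOT chart-fx's EventSource. */
final class NotifyingSeries {
    private final List<Runnable> listeners = new ArrayList<>();
    private final List<Double> values = new ArrayList<>();
    private boolean autoNotification = true;

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    void setAutoNotification(boolean enabled) {
        autoNotification = enabled;
    }

    /** Adds one value and notifies listeners unless auto-notification is off. */
    void add(double value) {
        values.add(value);
        if (autoNotification) {
            fireInvalidated();
        }
    }

    /** Adds many values but issues at most one update event afterwards. */
    void addAll(double... newValues) {
        boolean previous = autoNotification;
        autoNotification = false; // mute per-value events during the bulk operation
        try {
            for (double value : newValues) {
                add(value);
            }
        } finally {
            autoNotification = previous; // restore the user's setting
        }
        if (previous) {
            fireInvalidated();
        }
    }

    /** Manual trigger, usable when auto-notification has been disabled. */
    void fireInvalidated() {
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    int size() {
        return values.size();
    }
}
```

With auto-notification enabled, `addAll(1, 2, 3)` notifies listeners once rather than three times. With it disabled, nothing is fired until `fireInvalidated()` is called explicitly, which corresponds to the manual-trigger use case discussed in this thread.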

[..] I'd actually prefer to trigger it manually, but would requestAxisLayout() be the best way to go about it?

Style-wise, triggering manually is perfectly OK. We also have some use-cases for this. The auto-update is controlled via the setAutoNotification(boolean) flag (earlier located in DataSet, now in EventSource). You can trigger a repaint via JavaFX's void requestLayout() on the chart, which in turn triggers the axes, canvas, renderers, plugins, etc., or by issuing a fireInvalidated(new AddedDataEvent(this)) from one of the 'AbstractDataSet' derived data sets. Triggering via the axes is also possible but may be delayed by one update iteration since the axes are triggered further down the triggering chain. N.B. since triggers are costly for large or fast-updating data sets, we aim to keep them at a necessary minimum.

The reduction step causes some outliers to toggle visibility and flicker.

Besides tuning the data reduction algorithm (e.g. reducing the min distance to '1' or '0' pixel, which you already seem to have done) you can disable the reducer entirely via 'ErrorDataSetRenderer::setPointReduction(false)'. The latter is, however, not recommended for very large data sets, since the rendering gains a lot of performance from drawing multiple data points that map onto the same pixel only once. The reduction algorithm is very basic: if the new coordinate falls on (or is within the minimum pixel distance of) the pixel coordinate of the previous point, it's dropped. If multiple points are dropped, then the next new point with sufficient pixel distance to the previous point is drawn at the average position of the dropped points, and the error bars are adjusted according to the spread to indicate the range of the dropped points.
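To make the idea of pixel-distance reduction concrete, here is a simplified variant in plain Java. It is not the DefaultDataReducer implementation: it groups consecutive samples whose x pixel coordinate lies within `minDistance` of the first sample of the group and emits one point per group at the mean position, together with the y range of the merged samples. All names (`PixelReducer`, `Reduced`, `reduce`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical pixel-distance reducer; NOT chart-fx's DefaultDataReducer. */
final class PixelReducer {

    /** One emitted point: mean position plus the y range of the samples merged into it. */
    static final class Reduced {
        final double x;
        final double y;
        final double yMin;
        final double yMax;

        Reduced(double x, double y, double yMin, double yMax) {
            this.x = x;
            this.y = y;
            this.yMin = yMin;
            this.yMax = yMax;
        }
    }

    /** Inputs are pixel coordinates ordered by x; minDistance is in pixels. */
    static List<Reduced> reduce(double[] xPixel, double[] yPixel, double minDistance) {
        List<Reduced> out = new ArrayList<>();
        int n = Math.min(xPixel.length, yPixel.length);
        int i = 0;
        while (i < n) {
            double groupStart = xPixel[i];
            double sumX = 0;
            double sumY = 0;
            double yMin = Double.POSITIVE_INFINITY;
            double yMax = Double.NEGATIVE_INFINITY;
            int count = 0;
            // The first sample always opens a group; later ones join while they are close enough
            while (i < n && (count == 0 || Math.abs(xPixel[i] - groupStart) < minDistance)) {
                sumX += xPixel[i];
                sumY += yPixel[i];
                yMin = Math.min(yMin, yPixel[i]);
                yMax = Math.max(yMax, yPixel[i]);
                count++;
                i++;
            }
            out.add(new Reduced(sumX / count, sumY / count, yMin, yMax));
        }
        return out;
    }
}
```

In this variant an outlier that is merged into a group is no longer drawn at its own position and only survives through the `yMin`/`yMax` range. With `minDistance` set to 0 no samples are merged at all, which differs from the behaviour reported in this thread and underlines that the sketch is not the library's algorithm.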

N.B. There are two main point reduction algorithms: the one implemented in ReducingLineRenderer which does not take data set errors into account and the (default) DefaultDataReducer used in the ErrorDataSetRenderer which also takes X and Y error-bars/-bands into account.

I presume this managed to become our longest GH issue thread ever ;-)

ennerf commented 5 years ago

Here is a screen capture of what I mean with the flickering caused by the reduction: https://youtu.be/V6WN-RcqSNE

The minPointPixelDistance is 0. The red circles highlight two of the data points that often disappear. I'd expect the reduction to stay far away from those.

RalphSteinhagen commented 5 years ago

I love that video... is this real-time ... if yes, we need this charting library.

Regarding the global flickering: this is a known issue and is solved in the dev branch. The issue was that the renderer (for performance reasons) did not sufficiently lock the DataSet. We have thus already added (internally) the read-write locks.

Regarding the issue with your red circle markings: your observation is a valid point. 👍 For the sake of keeping this issue short (and eventually closing it), could you repost this as a new issue?

Maybe you could also add a video with the data point reduction disabled. This would help us to identify if it's the data reduction or line/marker rendering part (I presume the plot uses only markers, DataSet or DataSetError derived?).

Thanks in advance.

ennerf commented 5 years ago

The thin clients make an interesting use case. I was already wondering what the use case for BinarySerialiser was. I deal quite a bit with serialization, and I wrote MFL for creating .mat files, as well as a zero-allocation Java library for Protobuf (not open sourced yet, but I'm considering it). Protobuf in particular may be a better option than maintaining a custom binary format with versioning. These things get out of hand pretty quickly.

ennerf commented 5 years ago

I agree that this issue has already gotten way out of hand :)

Could you maybe add Gitter to this repo? It's more like a chat and would make it a lot easier to handle small discussions. I have seen this work quite well for some repos, e.g., gitter-channel for HdrHistogram

ennerf commented 5 years ago

I believe all of the actual issues in this novel have been fixed. Closing.

RalphSteinhagen commented 5 years ago

😄 @ennerf maybe you could post another video of the final result. We'd love to have some external examples/references (N.B. especially since we are (re-)discussing the topic of UDP transmission internally). Thanks in any case.