KarthikRIyer / swiftplot

Swift library for Data Visualization :bar_chart:
Apache License 2.0
401 stars 39 forks source link

Histogram drawData Rewrite #62

Closed odmir closed 4 years ago

odmir commented 4 years ago

Rewrite the drawData implementation so that it uses render.drawPlotLines instead of render.drawLine and to fix unhandled edge cases the algorithm used to have.

About the Test

I added a new test function named testHistogramStackedStepLineJoins so I changed the already existing one to testHistogramStackedStepOffset, as well as the saved file name, which matches better both my test and the existing one. This new function is in the histogram-stacked-step.swift file, I think it makes sense to be there, but I don't know if there is a better place for it. Also, I just took the naming of the file from the test above and incremented the number and appended _line_joins to it, I'd like to know if this is the desired way to name the file. I had to do some other numbering changes on file names.

About the Rewrite

I hope the comments and the code explain the behaviour clearly enough, but I know the last part won't make any sense without context, so I will do my best to explain it here.

In the case where histogramType == .step I accumulate the scaledBinFrequency from one series to the next into an array of arrays of histogram bar heights. At the end of this stage we will have allSeries.count + 1 arrays of heights, the first being the array with every height = 0. Hopefully this is not too hard to understand from reading the code.

After this step we need to draw the lines that make up the outline of the histogram, but just the right line segments for each series, or we will have lines behind other lines which might, in some cases, lead to the color from the hidden line "bleeding" through the edges of the line hiding it.

I don't have any background with "subtraction" of lines and algorithms to do this kind of thing, after searching I couldn't really find anything useful, probably because I don't know the right terms, but I came up with my own way.

My Solution

The idea was then to look at two adjacent series and list all the possible transitions between two columns of the histogram. For this we need to consider four heights, two for each series.

These are the 20 possible transitions between two adjacent columns: ____ I ; __ t I The blue lines represent the series that is in the foreground and the yellow ones is the series that is behind. Notice that the yellow horizontal lines are always above or at the same height as the blue ones, since they are derived from them, the only possible difference is an increase in their heights (but never a decrease).

The test case I added is actually testing for all of these cases plus two extra ones, one column at the start of the histogram and another at the end to make sure they behave correctly being the first or last columns of the histogram.

For each case I also derived the necessary "instructions" to accomplish the correct behaviour, for example:

  1. Append height 1 of series 1
  2. End line
  3. Append height 2 of series 2
  4. Append height 2 of series 1

Assuming height 1 is the height of the left column and height 2 the right one, and series 1 the background series (yellow line) and series 2 the foreground series (blue line), we just the drew the following case correctly: ____ I ; __ t I -> -

We need to detect the different cases based on these four heights, I decided to do 4 checks: c1. Is Column 1 Series 1 higher than Column 1 Series 2? c2. Is Column 2 Series 1 higher than Column 2 Series 2? c3. Is Column 1 Series 1 higher than Column 2 Series 2? c4. Is Column 2 Series 1 higher than Column 1 Series 2?

For the above example it would be: c1 = true, c2 = true, c3 = false, c4 = true

Then I just did a truth table where these checks are the inputs and each instruction is an output. Using Karnaugh maps for each instruction I got which checks to && and || together and that is what you see at the end of the code.

Lines 335 through 338 are the checks described above and lines 340 through 346 are the conditions and instructions. Instructions are in this order (height (H), series (S)):

  1. Append H1 S1
  2. End line here
  3. Append H1 S2
  4. End line here
  5. Append H2 S2
  6. Append H2 S1
  7. End line here

Overall the code got shorter and hopefully easier to reason about while also handling all cases (I think) and using drawPlotLines instead of drawLine, this is the before and after of the test: Screenshot 2019-12-08 at 18 14 46Screenshot 2019-12-08 at 18 15 02

Note: I think there is a bug in the code that calculates the frequency of different bins, I couldn't choose the right amount of bins for the data I had to fabricate the test, I always had to choose more bins than necessary and it seems to make some empty bins at the end.

Questions

Now I have some questions for @KarthikRIyer, @karwa and anyone else involved too.

  1. Is the test and the changes to the tests ok as they are now?
  2. Should we rename drawPlotLines to something less specific like drawLines or something else?
  3. Swift's documentation on RandomAccessCollection says that the indices property may hold a strong reference to the collection, but in the case of array it returns a Range, it should be ok to use it in a loop where I use indices to mutate the collection and there should be no copy of the underlying collection right? It shouldn't matter that much anyways I guess, since in the case I used it for it would just copy an array of Rects or an array of Floats each with around binCount elements, which should be pretty small.
  4. I used removeFirst() with Slices. The documentation says that if Self is its own SubSequence than this operation is O(1), that means that in the cases where I used this method I'm not really triggering copies, right? Also another case where it probably is not the end of the world if there are extra copies happening.
  5. And finally: Are there any questions, performance lessons you want to teach me, things you want me to change, is there another algorithm better suited that you'd like to suggest?

Hopefully I didn't forget anything 👍

odmir commented 4 years ago

Ok, of course I forgot something... I couldn't get AGG to work on my system for some reason so I have been working with just CG and SVG enabled. What this means is that I am not able to generate new reference images for AGG and take care of the renamed images. Can someone please run the tests and replace the reference images in AGG?

karwa commented 4 years ago

Ok, of course I forgot something... I couldn't get AGG to work on my system for some reason so I have been working with just CG and SVG enabled. What this means is that I am not able to generate new reference images for AGG and take care of the renamed images. Can someone please run the tests and replace the reference images in AGG?

What's the problem with AGG? CG means you're using a Mac, and it works fine for me here (Catalina, Xcode 11.2.1). You need to run brew install freetype, but that's it.

karwa commented 4 years ago

Thanks for taking this on! This looks really good to me. My only comment is that I wish we could think of some clearer names to make the algorithm more obvious. Failing that, there's no shame in a little ASCII art explainer in the source comments. Code is read more often than it is written, after all.

Also - have you tested this with 3+ stacked series? Could you maybe add a test and reference image for that?

odmir commented 4 years ago

Ok, of course I forgot something... I couldn't get AGG to work on my system for some reason so I have been working with just CG and SVG enabled. What this means is that I am not able to generate new reference images for AGG and take care of the renamed images. Can someone please run the tests and replace the reference images in AGG?

What's the problem with AGG? CG means you're using a Mac, and it works fine for me here (Catalina, Xcode 11.2.1). You need to run brew install freetype, but that's it.

Ok I'm at a loss... I re-enabled AGG and tried swift build, it failed, I then did swift test, magically it worked somehow, then I did swift package clean followed by swift test and it doesn't work again... Am I missing something? I get a bunch of warnings and an error saying it coudn't find gpc.h:

While building module 'AGG' imported from .../swiftplot/Sources/AGGRenderer/CPPAGGRenderer/CPPAGGRenderer.cpp:6:
In file included from <module-includes>:122:
.../swiftplot/Sources/AGG/include/agg_conv_gpc.h:33:10: fatal error: 'gpc.h' file not found
#include "gpc.h"
         ^~~~~~~
karwa commented 4 years ago

That seems to happen with some newer Swift toolchains. The fix is fortunately very simple: you can just comment out that include line.

odmir commented 4 years ago

Thanks for taking this on! This looks really good to me. My only comment is that I wish we could think of some clearer names to make the algorithm more obvious. Failing that, there's no shame in a little ASCII art explainer in the source comments. Code is read more often than it is written, after all.

Also - have you tested this with 3+ stacked series? Could you maybe add a test and reference image for that?

Thanks! I will work on the names. I have a test with 4 stacked series for .bar and .step, but it does not go to the extent of this test of testing edge cases in the line drawing, because I don't even know what data to use to generate such a test case with 4 series 😂.

I was going to do a PR with those tests afterwards, but if you think it should go with this one I can do that as well. This is how they look: _25_histogram_multi_stacked _26_histogram_multi_stacked_step

Note: I just noticed that the 'Use font smoothing when available' option under General settings of macOS is going to be a problem for the quartz tests: Unknown-1

odmir commented 4 years ago

That seems to happen with some newer Swift toolchains. The fix is fortunately very simple: you can just comment out that include line.

hum I just tried that but it gives more errors about unknown types like gpc_vertexor gpc_polygon

karwa commented 4 years ago

I was going to do a PR with those tests afterwards, but if you think it should go with this one I can do that as well.

Personally I think it's good. Because then when the tests pass, we all know that the code really does product output like that.

Note: I just noticed that the 'Use font smoothing when available' option under General settings of macOS is going to be a problem for the quartz tests:

Ah, I guess that means we just need to set it explicitly on the context in QuartzRenderer. That would seem like a good thing to give people control over, regardless.

For now, please just set this to true. We can add a toggle for it in another PR.

karwa commented 4 years ago

I think you can just delete the whole "agg_conv_gpc.h" file.

odmir commented 4 years ago

It is working now!

odmir commented 4 years ago

I tried replacing: heightsS1Slice, heightsSeries2, heightsS2Slice, height1S1, height1S2, height2S1, height2S2

with: backgroundHeightsSlice, foregroundHeights, foregroundHeightsSlice, backgroundLeftColumnHeight, foregroundLeftColumnHeight, backgroundRightColumnHeight, foregroundRightColumnHeight

but I think I prefer using "back/front" because it is more concise yet as clear as "background/foreground": backHeightsSlice, frontHeights, frontHeightsSlice, backLeftColumnHeight, frontLeftColumnHeight, backRightColumnHeight, frontRightColumnHeight

what do you think?

Edit: I think it’s better to use Bar instead of Column.

Edit 2: On second thought Bin might be more accurate even.

odmir commented 4 years ago

Ok now is just the change in the code for the .bar case that is missing. If what I showed above is okay then I will push the commit that makes that change.

odmir commented 4 years ago

I found another bug that will make me rewrite the code in the .bar case and maybe refactor the drawData code a little, so it doesn't really matter too much for now how that part looks. I will do it in another future PR. I think this one can be merged @KarthikRIyer if you think it is ok.

KarthikRIyer commented 4 years ago

@odmir thanks for taking this on! We can surely deal with the .bar case in another PR.

Merging