Vindaar / ggplotnim

A port of ggplot2 for Nim
https://vindaar.github.io/ggplotnim
MIT License

Allow non `Option[T]` setting, use DF columns as setting, add alpha scale #143

Closed Vindaar closed 2 years ago

Vindaar commented 2 years ago

This PR contains a large number of changes, the biggest of which concerns the 'setting' arguments of all geom_* procedures.

Note: currently (due to a small breaking change in datamancer), this PR depends on: https://github.com/SciNim/Datamancer/pull/24

Other changes

Mainly see the changelog:

* v0.5.0
- add support for custom (and customized) colormaps
- add additional =inferno=, =magma=, =plasma= colormaps
- add =scale_fill/color_gradient= function to assign such color maps
  to a plot
- allow to customize layout in =ggmulti= plot (subplots of fully
  separate plots):
  - allow to set custom widths and heights for the rows / columns
  - allow to prefer columns over rows and vice versa for the layouting
    (by adding a =prefer_columns()= or =prefer_rows()= call to the
    plotting chain)
- =ggplot= call now allows widths and heights not only as =float=
  value, but any number
- add =alpha= as a valid =Scale= (allows setting & mapping alpha)
- add the following scale functions:
  - =scale_size/alpha_discrete/continuous=: force given size / alpha
    scales to be discrete / continuous
  - =scale_color/fill/size/alpha_identity=: force the given
    corresponding =aes= scale to be treated as containing values to
    *set* the scale, i.e.:
    #+begin_src nim
import ggplotnim

let df = toDf({ "x" : @[1, 2, 3, 4], "y" : @[1, 2, 3, 4],
                "colors" : @["red", "green", "blue", "#FF00FF"]})
ggplot(df, aes("x", "y")) +
  geom_point(aes = aes(color = "colors"), size = 12.3) +
  scale_color_identity() +
  ggsave("/tmp/colors_manual.pdf")
    #+end_src
    i.e.: map the =color= scale via =aes=, then declare that the given
    =color= column holds values to *set* directly, instead of
    performing automatic *mapping* based on the number of distinct
    labels.
- major change in how =geom_*= procedures deal with setting scales:
  see the description of PR #<insert number> for what this
  implies. Short version: one can now hand arguments for e.g. =size=,
  =alpha=, =color=, ... as *non* =Option[T]= values (e.g. see the
  =size= argument in the code snippet above). Also explicit
  =string/int= values are now supported for colors.
- add =-d:nolapack= compilation option to remove LAPACK
  dependency. This disables support for =geom_smooth=

On geom_* setting arguments

From commit https://github.com/Vindaar/ggplotnim/commit/1f074cae87139e2657a4c531ed4561ac40de2c9d

Previously, any argument that sets a scale of a geom (e.g. specifying the size of points or changing the color of lines) had to be given as an Option[T]. This has always been annoying: why should the user be forced to write some(1.5)? It was a limitation of our logic, needed so that we could distinguish a default value from one actually given by the user. For some types there is simply no sensible default value that the user would never want to set explicitly.

This commit changes the whole logic to a multi-generic approach:

We add a Missing type, and each setting argument of the geom_* procedures becomes a type class of the form:

PossibleFloat = Missing | SomeNumber | string | Option[float]
...

What this means is the following: by default each optional argument is set to missing() (which just returns a Missing instance). This way we can deduce at compile time (CT) whether an argument was given or not (if its type is Missing, the default is used).
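The compile-time dispatch described here can be sketched in a few lines of plain Nim. This is a minimal illustration, not the ggplotnim implementation: the helper `describeSetting` is hypothetical, and treating `none(float)` as "use the default" is an assumption made for the example.

```nim
import std/options

type
  Missing = object  # marker type meaning "argument not given"
  PossibleFloat = Missing | SomeNumber | string | Option[float]

func missing(): Missing = Missing()

# Hypothetical helper (not part of ggplotnim): reports how a setting
# argument would be interpreted. The branch is selected at compile time
# from the static type of `arg`.
proc describeSetting[T: PossibleFloat](arg: T): string =
  when T is Missing:
    result = "default"                      # nothing given by the user
  elif T is SomeNumber:
    result = "constant " & $float(arg)      # any number sets a constant
  elif T is string:
    result = "column " & arg                # string refers to a DF column
  else:
    # Remaining case: Option[float], kept for backwards compatibility.
    # Assumption: `none` is treated like a missing argument.
    result = if arg.isSome: "constant " & $arg.get else: "default"

echo describeSetting(missing())
echo describeSetting(12.3)
echo describeSetting("colors")
echo describeSetting(some(1.5))
```

In a geom_* procedure, missing() would be the default value of the argument rather than something the caller passes explicitly; here it is passed by hand to keep the sketch small.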

This means we can now use any number (in case of float like arguments, e.g. size, ...) as input and know the argument was actually given by the user.

The Option[float] argument is part of the type class to make sure old code continues to work!

You may notice the odd-looking string in the type class as well. It allows the distinction (added in a previous commit) between mapping values based on data (i.e. an argument in aes), setting to a constant value (for size, by giving a number), and setting based on a DF column. If a string is given, it is interpreted as referring to a column of the input dataframe. The values found in that column are then used explicitly to set the scale.

For Color scales (fill and color) we go one step further and now allow setting colors not only by providing chroma.Color values, but also by using int values (interpreted as uint32 colors) and strings (treated as valid HTML color strings).

This means code as follows is now possible:

import ggplotnim

let df = toDf({ "x" : @[1, 2, 3, 4], "y" : @[1, 2, 3, 4],
                "colors" : @["red", "green", "blue", "#FF00FF"]})
ggplot(df, aes("x", "y")) +
  geom_point(aes = aes(color = "colors"), size = 12.3) +
  scale_color_identity() +
  ggsave("/tmp/colors_manual.pdf")
ggplot(df, aes("x", "y")) +
  geom_point(color = "colors", size = 12.3, alpha = 0.6) +
  ggsave("/tmp/colors_automatic.pdf")

Note the two ways of referring to a column for the color: either via aes (as usual), followed by scale_color_identity() to treat the color scale as the identity, or by setting color directly to a string, which is interpreted as a column because "colors" is not a valid HTML color itself. In addition, sizes and alphas are set explicitly from plain numbers instead of Option[float].
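The string handling for the color argument can be illustrated with a small standalone sketch. Everything in it is hypothetical: `looksLikeHtmlColor`, `classifyColorArg` and the tiny list of named colors are made up for the example, and the real check presumably relies on the color parsing of the chroma library rather than a hand-written test.

```nim
import std/strutils

type
  ColorArgKind = enum
    cakHtmlColor,  # string is itself a color, e.g. "red" or "#FF00FF"
    cakColumn      # string refers to a column of the dataframe

# Illustrative subset only; real HTML color names are far more numerous.
const namedColors = ["red", "green", "blue", "black", "white"]

func looksLikeHtmlColor(s: string): bool =
  # Accept "#RGB" / "#RRGGBB" hex notation or one of the known names.
  let isHex = s.len in [4, 7] and s[0] == '#' and
              allCharsInSet(s[1 .. ^1], HexDigits)
  isHex or s.toLowerAscii in namedColors

func classifyColorArg(s: string): ColorArgKind =
  # A valid color string sets a constant color; anything else is
  # assumed to name a column whose values set the color per row.
  if looksLikeHtmlColor(s): cakHtmlColor else: cakColumn

echo classifyColorArg("red")
echo classifyColorArg("#FF00FF")
echo classifyColorArg("colors")
```

The order of the checks matters: a column that happens to be named like a color (for instance "red") would be read as a color under this scheme, so such a column is better passed via aes together with scale_color_identity().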