Vindaar / ggplotnim

A port of ggplot2 for Nim
https://vindaar.github.io/ggplotnim
MIT License

KeyError when specifying DataFrame columns for both the x-axis and the y-axis in a Vega plot #158

Closed: fbpyr closed this 1 year ago

fbpyr commented 1 year ago

Using the rMpgStackedBarPlot recipe, when I specify cyl for the y-axis (instead of leaving y unspecified, in which case it shows the count):

import ggplotnim
import ggplotnim/ggplot_vega

let df = readCsv("mpg.csv")

ggplot(df, aes(x="class", y="cyl", fill="drv")) + 
  geom_bar() + 
  scale_y_continuous() +
  ggvega("rMpgStackedBarPlot.html")

I get a KeyError at line 216 of ggplotnim/ggplot_vega.nim:

WARN: losing information in `encodeType`
WARN: losing information in `encodeType`
WARN: losing information in `encodeGeomSpecifics`!
/home/{user}/code/nim_ggplot/stacked_bar.nim(10) stacked_bar
/home/{user}/.nimble/pkgs/ggplotnim-0.5.6/ggplotnim/ggplot_vega.nim(356) +
/home/{user}/.nimble/pkgs/ggplotnim-0.5.6/ggplotnim/ggplot_vega.nim(333) ggvegaCreate
/home/{user}/.nimble/pkgs/ggplotnim-0.5.6/ggplotnim/ggplot_vega.nim(216) toVegaLite
/home/{user}/.nimble/pkgs/datamancer-0.3.8/datamancer/value.nim(64) []
/home/{user}/opt/nim_1.6.8/lib/pure/collections/tables.nim(246) []
/home/{user}/opt/nim_1.6.8/lib/pure/collections/tables.nim(234) raiseKeyError
Error: unhandled exception: key not found: cyl [KeyError]
Error: execution of an external program failed: '/home/{user}/code/nim_ggplot/stacked_bar '

So I added some echo calls before line 216 to see which data was present and which key was being looked up:

DataFrame with 11 columns and 234 rows:
     Idx    manufacturer           model           displ            year             cyl           trans             drv             cty             hwy              fl           class
  dtype:          string          string           float             int             int          string          string             int             int          string          string
       0            audi              a4             1.8            1999               4        auto(l5)               f              18              29               p         compact
       1            audi              a4             1.8            1999               4      manual(m5)               f              21              29               p         compact
       2            audi              a4               2            2008               4      manual(m6)               f              20              31               p         compact
       3            audi              a4               2            2008               4        auto(av)               f              21              30               p         compact

WARN: losing information in `encodeType`
WARN: losing information in `encodeType`
WARN: losing information in `encodeGeomSpecifics`!
scCol =class
locDf =DataFrame with 3 columns and 7 rows:
     Idx                          class      counts_GGPLOTNIM_INTERNAL    prevVals_GGPLOTNIM_INTERNAL
  dtype:                         string                          float                       constant
       0                        2seater                              5                              0
       1                        compact                              0                              0
       2                        midsize                              0                              0
       3                        minivan                              0                              0
       4                         pickup                              0                              0
       5                     subcompact                              9                              0
       6                            suv                             11                              0

lab ={drv: r}
subDf =DataFrame with 3 columns and 7 rows:
     Idx                          class      counts_GGPLOTNIM_INTERNAL    prevVals_GGPLOTNIM_INTERNAL
  dtype:                         string                          float                       constant
       0                        2seater                              5                              0
       1                        compact                              0                              0
       2                        midsize                              0                              0
       3                        minivan                              0                              0
       4                         pickup                              0                              0
       5                     subcompact                              9                              0
       6                            suv                             11                              0

/home/{user}/code/ggplotnim/stacked_bar.nim(10) stacked_bar
/home/{user}/code/ggplotnim/nimbledeps/pkgs/ggplotnim-0.5.5/ggplotnim/ggplot_vega.nim(365) +
/home/{user}/code/ggplotnim/nimbledeps/pkgs/ggplotnim-0.5.5/ggplotnim/ggplot_vega.nim(342) ggvegaCreate
/home/{user}/opt/nim_1.6.8/lib/core/macros.nim(533) toVegaLite
/home/{user}/code/ggplotnim/nimbledeps/pkgs/datamancer-0.3.8/datamancer/value.nim(64) []
/home/{user}/opt/nim_1.6.8/lib/pure/collections/tables.nim(246) []
/home/{user}/opt/nim_1.6.8/lib/pure/collections/tables.nim(234) raiseKeyError
Error: unhandled exception: key not found: class [KeyError]
Error: execution of an external program failed: '/home/{user}/code/ggplotnim/stacked_bar '
stack trace: (most recent call last)
/tmp/nimblecache-1907554827/nimscriptapi_3692668268.nim(187, 16)
/home/{user}/code/ggplotnim/ggplotnim.nimble(128, 8) check_vegaTask
/home/{user}/opt/nim_1.6.8/lib/system/nimscript.nim(273, 7) exec
/home/{user}/opt/nim_1.6.8/lib/system/nimscript.nim(273, 7) Error: unhandled exception: FAILED: nim c -r stacked_bar.nim [OSError]
     Error: Exception raised during nimble script execution

(I had a similar error in some of my own plots (x-axis: month names as strings, y-axis: hours as floats), but there it complained about a missing key for the y-axis column name. Out of curiosity I renamed the y-axis column to counts_GGPLOTNIM_INTERNAL, since that name showed up in the Vega hover tooltips of plots where only x was specified, and the error went away! 😮 But the plot still showed the count of elements instead of the numeric values of the column.)

Is there a way to get the cyl column summed instead of counted? @Vindaar, or is this what you meant by potentially larger changes to the backend?

Vindaar commented 1 year ago

Well, to start with, and independent of the backend: the code in your example should raise an error, because you cannot use both an x and a y scale with geom_bar (unless you use stat = "identity", in which case the y values are simply read from that column and no counting is done). So, as a separate issue, it is definitely a bug that the plot is silently created anyway and even uses the y label from the given y. I need to fix that.

What you probably intend, though, is aes(x = "class", weight = "cyl", fill = "drv"). That works correctly with the normal backend, but has the same issue with Vega. I'm looking into it; I think it should also be an easy fix.
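The rule described here (a y column with geom_bar only makes sense together with stat = "identity") can be sketched as a check that runs before the plot is built. This is a hypothetical illustration only: checkBarAes is not part of ggplotnim, the actual fix may look quite different, and it assumes stat is passed as a string, as in geom_bar(stat = "identity").

```nim
# Hypothetical sketch of the geom_bar validation rule described above;
# checkBarAes is an illustrative name, not part of ggplotnim.
import std/options

proc checkBarAes(y: Option[string]; stat: string) =
  ## With the default stat = "count", bar heights are row counts, so an
  ## explicit y column makes no sense. Only stat = "identity" reads y.
  if y.isSome and stat != "identity":
    raise newException(ValueError,
      "geom_bar: y = \"" & y.get & "\" requires stat = \"identity\", " &
      "got stat = \"" & stat & "\"")

when isMainModule:
  checkBarAes(none(string), "count")      # x only: heights are counts
  checkBarAes(some("cyl"), "identity")    # heights are read from `cyl`
  try:
    checkBarAes(some("cyl"), "count")     # the combination from this issue
  except ValueError as e:
    echo e.msg
```

Failing loudly here, rather than silently drawing counts under a y label taken from the given column, is the behavior the comment asks for.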

Vindaar commented 1 year ago

Fixed in #160.

fbpyr commented 1 year ago

Thank you so much! Oh, I see, I was still stuck in Altair syntax (where one encodes DataFrame column names as x and y).

When I use it as you suggested:

import ggplotnim

let df = readCsv("mpg.csv")

ggplot(df, aes(x="class", weight="cyl", fill="drv")) + 
  geom_bar() + 
  scale_y_continuous() +
  ggsave("rMpgStackedBarPlot.html")

I get this plot: [screenshot of the resulting stacked bar chart]

But that seems to show the count on the y-axis instead of the weighted sum of the cyl integers? (For cyl I do get a warning that cyl was determined to be discrete, but when I add the suggested scale_y_continuous I still get the same warning.) I would have expected weight to sum up the cyl values. For example, for minivan: 10 * 6 cyl + 1 * 4 cyl = 64, so the top of the minivan bar should be at 64 instead of 11?

Or would I need to precompute it, as in the rBarPlotCompStats recipe?
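The expected minivan numbers can be checked with a small plain-Nim sketch of the two rules, with no ggplotnim involved. Under the default count statistic each row adds 1 to its bar. With a weight aesthetic, assuming ggplot2-style semantics, each row adds its weight (here cyl) instead. Row and barHeights are hypothetical names used only for illustration.

```nim
# Plain-Nim sketch of count vs. weighted bar heights (not ggplotnim code).
import std/tables

type Row = tuple[category: string, cyl: int]

proc barHeights(rows: seq[Row]; weighted: bool): OrderedTable[string, int] =
  ## Count: every row adds 1 to the bar of its category.
  ## Weighted: every row adds its weight (here `cyl`) instead.
  result = initOrderedTable[string, int]()
  for r in rows:
    let w = if weighted: r.cyl else: 1
    result.mgetOrPut(r.category, 0) += w

when isMainModule:
  # The minivan case from the question: 10 cars with 6 cylinders, 1 with 4.
  var minivans: seq[Row]
  for _ in 1 .. 10:
    minivans.add((category: "minivan", cyl: 6))
  minivans.add((category: "minivan", cyl: 4))
  echo barHeights(minivans, weighted = false)["minivan"]  # 11
  echo barHeights(minivans, weighted = true)["minivan"]   # 64
```

Precomputed sums like these could then be plotted directly with stat = "identity", which is the precompute route mentioned in the question.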

Vindaar commented 1 year ago

Ah, you're right, of course. I didn't test it properly and just went by memory. weight is currently only supported for regular histograms, not for bar charts. So you can either precompute it or let me fix it (hopefully quickly). :)

fbpyr commented 1 year ago

Ah, OK, only on histograms. Well, a fix would be amazing! Thank you so much, @Vindaar.

Vindaar commented 1 year ago

Coming up in #161 (see the note there about the naming of the resulting y label).

Vindaar commented 1 year ago

Should be fine on the latest tag.

fbpyr commented 1 year ago

Woohoo, works like a charm! 🎇 😀 Thank you so much!!

fbpyr commented 1 year ago

...and it also works via Vega: 😌

import ggplotnim
import ggplotnim/ggplot_vega

let df = readCsv("mpg.csv")

ggplot(df, aes(x="class", weight="cyl", fill="cyl")) + 
  geom_bar() + 
  ggvega("rMpgStackedBarPlot.html", show=false)

[screenshot of the resulting Vega plot]