Closed lf-araujo closed 4 years ago
You're welcome!
edit: sorry for the kinda weird line breaks. I wrote the answer in Org mode.
There aren't any convenience functions that operate on full data frames in such a way at the moment.
But there are 2 ways you can do this.
Work on a `PersistentVector[Value]`
You perform the calculation (e.g. `mean`) on a DF column, like so:
let mean = df["X"].mean
Here `df["X"]` gets column "X", which has type `PersistentVector[Value]`, and `.mean` applies the desired function of type `PersistentVector[Value] -> Value`.
The result is simply a `Value` (use e.g. `toFloat` to get a normal float from it).
The possible functions are those lifted here: https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/formula.nim#L1095
You can use the `lift[Vector|Scalar][String|Int|Float]Proc` templates to lift any other proc you like. The `Vector` templates are for normal procs with signature `proc [T](x: seq[T]): T` (thus mapping a sequence to a scalar), and the `Scalar` templates are simply for `proc [T](x: T): T` (i.e. a transformation of the value `x`).
Note: in some cases you might want to only lift a proc locally. By
default the templates produce exported procs, which are only allowed
at top level. To lift a proc locally use the toExport = false
argument (it's a static bool!).
Use `summarize` to reduce the data frame
This is more in line with your question, actually. You can apply a function such as the one above using the `summarize` proc:
echo df.summarize(f{"X_mean" ~ mean("X")})
where `summarize` takes one or several functions. The result will be a data frame that is reduced to a single row, due to the application of a `Vector`-like proc in the aforementioned sense.
Since the result here is a full data frame, in order to get the actual
mean value, you can do:
let dfMean = df.summarize(f{"X_mean" ~ mean("X")})
echo dfMean["X_mean"][0] # since there's only 1 entry anyways
See the (sorry for the bad documentation) documentation here: https://vindaar.github.io/ggplotnim/formula.html#summarize%2CDataFrame%2Cvarargs%5BFormulaNode%5D
`summarize` is useful if you want to combine this with some other operation. `group_by` is especially relevant in that regard. If a grouped data frame is handed to `summarize`, the operation will be done for each group! So if you had a DF with a classification column "class" with elements {"A", "B", "C", "D"}:
echo df.group_by("class").summarize(f{"X_mean" ~ mean("X")})
the result would be the means of the 4 classes.
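To make the per-group behaviour concrete, here is a minimal sketch in plain Nim of what a grouped mean computes. It deliberately does not use ggplotnim: the `(class, X)` tuples, the `data` values and the `groupMeans` helper are made up for illustration, and only `std/tables` and `std/stats` are assumed.

```nim
import std/[tables, stats]

# Hypothetical toy data: (class, X) pairs standing in for two DF columns.
let data = @[("A", 1.0), ("A", 3.0), ("B", 10.0), ("B", 20.0), ("C", 5.0)]

proc groupMeans(rows: seq[(string, float)]): OrderedTable[string, float] =
  ## Illustrative helper, not part of ggplotnim:
  ## collect the X values per class, then reduce each group with `mean`.
  var groups = initOrderedTable[string, seq[float]]()
  for row in rows:
    groups.mgetOrPut(row[0], newSeq[float]()).add row[1]
  result = initOrderedTable[string, float]()
  for cls, xs in groups:
    result[cls] = mean(xs)

let means = groupMeans(data)
for cls, m in means:
  echo cls, ": ", m
```

Each class ends up with exactly one value, which is the same shape as the one-row-per-group data frame described above.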
Let me know if this answers your question!
This completely solves my issue. Thanks.
I think what you are doing is a great service to Nim already.
Also make sure to set up a support github link and a Brave BAT account.
Here is a follow up question.
For:
echo df["X"].variance
I get:
Variance (from the stats module) expects an openArray, which I can't lift, it seems.
Also,
Ah, you're right. That's an omission on my part. Indeed, I haven't lifted any of the procs from the `stats` module.
edit2: Oh, wow. I completely missed:
which I can't lift,
I'll fix the lifting templates later today to work on `openArray`!
What does lifting mean?
Lifting in this context just means taking a proc with signature `proc [T](s: seq[T]): T` and turning it into one with signature `proc (v: PersistentVector[Value]): Value`. For scalar procs, it just means making the proc work on `Value` types. (If the proc is generic, i.e. `proc [T](s: T): T`, lifting shouldn't even be required. But since many procs aren't generic, the `Scalar` templates are there too.)
To use it you have to lift it by putting the following at top level in your code:
liftVectorFloatProc(variance)
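To illustrate the idea of lifting without relying on ggplotnim's internals, here is a rough stand-alone sketch. The `Value` type, `toFloat` and the `liftVectorFloat` template below are simplified stand-ins written for this example (the real `Value` and `liftVectorFloatProc` differ, and a plain `seq[Value]` is used in place of `PersistentVector[Value]`). Only `mean` and `variance` from `std/stats` are real library procs.

```nim
import std/stats

type
  ValueKind = enum vkFloat, vkInt
  # Hypothetical stand-in for ggplotnim's `Value` (the real type has more kinds).
  Value = object
    case kind: ValueKind
    of vkFloat: fval: float
    of vkInt: ival: int

proc toFloat(v: Value): float =
  ## Unwrap a `Value` into a plain float.
  case v.kind
  of vkFloat: v.fval
  of vkInt: v.ival.float

# Illustrative template, not ggplotnim's: wraps a proc working on a
# sequence of floats so that it accepts a sequence of `Value` instead.
template liftVectorFloat(name, fn: untyped) =
  proc name(xs: seq[Value]): Value =
    var raw = newSeq[float](xs.len)
    for i, x in xs:
      raw[i] = x.toFloat          # unwrap every element
    Value(kind: vkFloat, fval: fn(raw))  # call the plain proc, wrap the result

liftVectorFloat(liftedMean, mean)
liftVectorFloat(liftedVariance, variance)

let col = @[Value(kind: vkInt, ival: 1), Value(kind: vkFloat, fval: 2.0),
            Value(kind: vkInt, ival: 3), Value(kind: vkFloat, fval: 4.0)]
echo liftedMean(col).toFloat
echo liftedVariance(col).toFloat
```

The lifted procs do nothing clever: they unwrap the values, call the original proc, and wrap the result again, which is why a proc taking `openArray` can be lifted the same way as one taking `seq`.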
edit: I'll add those sometime later today to the default lifted procs.
Could you please share a bitcoin address so I can donate? Also make sure to set up a support github link and a Brave BAT account.
That's very kind. I'll think about it!
Ok, finally on a computer to check this.
Lifting a proc that takes `openArray` works as expected. Maybe I misunderstood you and you meant it should be lifted automatically?
In any case, I'll lift those by default now.
edit: ok, just pushed that change. The `stats` procs are now lifted by default. Once the CI passes, I'll push a new version.
edit2: new version with the changes is now tagged. The commit adding the lifted procs was: https://github.com/Vindaar/ggplotnim/commit/862c77ae693991970c2abc53ba1837c1f3d6b22c
Thank you. Yes, something odd happened on my side. I believe I was getting an error from the interactive Nim shell I was using yesterday.
Hi,
Again, thanks for this amazing tool.
Is it possible to calculate the mean of a variable? For instance, in my toy data frame I have a variable named X. The below errs:
with
Is it possible to get the mean of one of the variables within the df?
Thank you