Lisp-Stat / documentation

Documentation for Lisp-Stat
Microsoft Public License

Working with data, accessing columns by name #8

Closed by JARS3N 3 years ago

JARS3N commented 3 years ago

CL-USER> (in-package :ls-user)
#<PACKAGE "LS-USER">
LS-USER> (asdf:load-system :lisp-stat)
T
LS-USER> (defdf mtcars (read-csv rdata:mtcars))
COMMON-LISP:WARNING: Missing column name was filled in
MTCARS
LS-USER> (defparameter mtcars-small (select mtcars (range 0 5) t))
MTCARS-SMALL
LS-USER> (columns mtcars-small 'mpg)
Key MPG not found, valid keys are #(X1 MPG CYL DISP HP DRAT WT QSEC VS AM GEAR CARB).
   [Condition of type KEY-NOT-FOUND]
LS-USER> mtcars$mpg
The variable MTCARS$MPG is unbound.
   [Condition of type UNBOUND-VARIABLE]
;; yet i can still access the keys
LS-USER> (keys mtcars)
#(MTCARS:X1 MTCARS:MPG MTCARS:CYL MTCARS:DISP MTCARS:HP MTCARS:DRAT MTCARS:WT MTCARS:QSEC MTCARS:VS MTCARS:AM MTCARS:GEAR MTCARS:CARB)
LS-USER>

Symbolics commented 3 years ago

I've got a documentation commit ready to go that will fix this and thought I'd explain here to help you (and others) understand the inner workings.

LS-USER> (defdf mtcars (read-csv rdata:mtcars))

The defdf macro sets up a package with the name of the data-frame, and exports a symbol naming each variable. This is what allows you to write mtcars:mpg to refer to the mpg variable. When we take a selection from that data-frame, like this:

LS-USER> (defparameter mtcars-small (select mtcars (range 0 5) t))

select takes a slice of a data frame, the Cartesian product of the row and column selections: here (range 0 5) picks the rows and t keeps every column.
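To make the package mechanism described above a little more concrete, here is a minimal sketch in plain Common Lisp. It does not use Lisp-Stat or show how defdf is actually implemented; the package name DEMO-FRAME, its symbols, and the helper exported-names are hypothetical and exist only for illustration.

```lisp
;; Hypothetical stand-in for the kind of package defdf sets up:
;; one package per data frame, one exported symbol per variable.
(defpackage :demo-frame
  (:use)                          ; inherit nothing, so only our symbols are external
  (:export #:mpg #:cyl #:disp))

(defun exported-names (package)
  "Return the sorted names of the external symbols of PACKAGE."
  (let ((names '()))
    (do-external-symbols (sym package)
      (push (symbol-name sym) names))
    (sort names #'string<)))

(exported-names :demo-frame)      ; => ("CYL" "DISP" "MPG")
```

Because mpg is external in DEMO-FRAME, the reader accepts demo-frame:mpg from any other package, which is the same reason mtcars:mpg can be written from LS-USER.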

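Referring to the variable with (columns mtcars-small 'mpg) is where the error lies: the quoted symbol mpg is read in the current package, LS-USER, while the keys of the data frame are symbols in the MTCARS package. You can see that in the output of keys: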

LS-USER> (keys mtcars-small)
#(MTCARS:X4 MTCARS:MPG MTCARS:CYL MTCARS:DISP MTCARS:HP MTCARS:DRAT MTCARS:WT MTCARS:QSEC MTCARS:VS MTCARS:AM MTCARS:GEAR MTCARS:CARB)

So instead of referring to 'mpg, the documentation should read (columns mtcars-small 'mtcars:mpg). Notice here that the package is mtcars, not mtcars-small; that's because select just sliced up mtcars and returned the bits we asked for.
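The difference between 'mpg and 'mtcars:mpg can be reproduced without Lisp-Stat. The sketch below assumes that keys are compared by symbol identity, which is what the KEY-NOT-FOUND error suggests but which is not confirmed here; the package DEMO-CARS, the variable *demo-keys*, and the function key-index are hypothetical names.

```lisp
;; Hypothetical package standing in for the one defdf creates.
(defpackage :demo-cars
  (:use)
  (:export #:mpg #:cyl))

(defparameter *demo-keys*
  (vector (find-symbol "MPG" :demo-cars)
          (find-symbol "CYL" :demo-cars))
  "Keys as a data frame might store them: symbols owned by DEMO-CARS.")

(defun key-index (key)
  "Return the position of KEY in *DEMO-KEYS*, compared with EQ, or NIL."
  (position key *demo-keys* :test #'eq))

;; An unqualified symbol is interned in the current package, so it is
;; a different object from the key and the lookup fails.
(key-index 'mpg)                             ; => NIL

;; The symbol owned by DEMO-CARS (what the reader sees as demo-cars:mpg)
;; is the key itself.
(key-index (find-symbol "MPG" :demo-cars))   ; => 0
```

The first call returns NIL only as long as the current package does not import or use DEMO-CARS.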

Finally, mtcars$mpg is a mistake. There was a version of defdf that mimicked R's variable access pattern, but in the end going with a more natural Lisp syntax was judged to be better. That '$' character was missed when converting the documentation.

Thank you for the report, and I hope the above helps clear up how things work 'under the bonnet'.

Symbolics commented 3 years ago

FYI. I've fixed a few more of these errors, and appreciate you pointing them out. You might also want to read the section on Use a data frame, which will eliminate that entire class of problem (though I still appreciate you pointing any more out so I can correct the documentation).