Optimal-Learning-Lab / LKT


Examples.Rmd column misspecification #18

Closed wbreilly closed 6 months ago

wbreilly commented 11 months ago

In Examples.Rmd, lines 51 and 1732 reference the column "CF..ansbin" (no trailing .). Possible typo.

imrryr commented 11 months ago

Yes, this is a typo. I will fix it for clarity's sake, but very strangely, it also seems that a bug in R lets it run even though it should really fail. I narrowed it down and found that the misspelled reference still works, which should be impossible since CF..ansbin is not a column in the val data table.

How did you even notice this?

My test showed that R interprets them as equivalent columns, even though one doesn't exist and should be NULL:

(val$CF..ansbin==0) ==(val$CF..ansbin.==0) [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [27] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [53] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [79] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [105] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [131] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...
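A plausible explanation for this result, offered as a sketch rather than a confirmed diagnosis: in base R, the $ operator falls back to partial name matching on lists and data frames, so a prefix that matches exactly one column resolves to that column. The example below uses a made-up toy data frame named val (a plain data.frame with hypothetical values, not the package's dataset), and the variable names are for illustration only.

```r
# Toy stand-in for the real data (hypothetical values); only the
# dotted column name CF..ansbin. matters here.
val <- data.frame(
  Anon.Student.Id = c("s1", "s2", "s3"),
  CF..ansbin. = c(0, 1, 0)
)

# The misspelled name is not an actual column name ...
name_exists <- "CF..ansbin" %in% names(val)

# ... yet `$` falls back to partial matching and resolves it to CF..ansbin.
via_dollar <- val$CF..ansbin

# `[[` matches exactly by default, so the same typo gives NULL.
via_brackets <- val[["CF..ansbin"]]

print(name_exists)
print(identical(via_dollar, val$CF..ansbin.))
print(is.null(via_brackets))

# Optional: ask R to warn whenever `$` resolves a partial match.
options(warnPartialMatchDollar = TRUE)
```

Using [[ ]] with the full column name, or turning on the warnPartialMatchDollar option, is one way to surface this kind of typo instead of having it pass silently.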

wbreilly commented 11 months ago

Wow, that's unexpected. I get the same result for val$CF..ansbin == val$CF..ansbin. even though only the second column exists.

I noticed because I was having issues running computeSpacingPredictors and had ruled out everything I could think of. I had copy-pasted "CF..ansbin" (no trailing .) into a Python data prep script.

imrryr commented 11 months ago

You may find this helpful. I have prepared it for the next edition.

Column Requirements for computeSpacingPredictors and Dependencies

computeSpacingPredictors is a function in R designed to calculate various spacing metrics based on the input data and a list of Knowledge Components (KCs). Below is a detailed breakdown of the data columns required for this function and its dependencies.

Main Function: computeSpacingPredictors

Dependencies

Summary

To effectively use computeSpacingPredictors and its dependencies, the input dataset should minimally contain the columns Anon.Student.Id and CF..ansbin. (note that the trailing period is part of the column name). Additionally, CF..reltime. and CF..Time. are beneficial but can be generated. The Duration..sec. column is specifically required by the practiceTime function.
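Because these names have to match exactly, a quick check before calling the function can catch a near-miss such as a dropped trailing period. The following is a minimal sketch: missing_columns is a hypothetical helper (not part of LKT), and the toy data frame stands in for output from an external data prep script.

```r
# Hypothetical helper (not part of LKT): compares column names by exact
# match, so a near-miss such as "CF..ansbin" is reported as missing.
missing_columns <- function(data,
                            required = c("Anon.Student.Id", "CF..ansbin.")) {
  setdiff(required, names(data))
}

# Toy data with the misspelled column name (hypothetical values).
# check.names = FALSE keeps the names exactly as written.
prepared <- data.frame(
  Anon.Student.Id = c("s1", "s2"),
  CF..ansbin = c(1, 0),
  check.names = FALSE
)

absent <- missing_columns(prepared)
if (length(absent) > 0) {
  message("Missing required columns: ", paste(absent, collapse = ", "))
}
```

The required vector defaults to the two minimal columns named in the summary; other columns, such as Duration..sec. for practiceTime, could be passed in the same way.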

Note on ${i}

The placeholder ${i} denotes that multiple columns could be involved, depending on the KCs specified. For example, ${i}spacing could translate to columns like Math.spacing, Science.spacing, etc., based on what is passed in the KCs parameter.

imrryr commented 11 months ago

These details may still need more elaboration, but I am adding this verified discussion with ChatGPT to my Basic Operations vignette in response to your issue. If you see anything else to explain, please don't hesitate to ask. I want to support this package, but I need help to identify the support needed at times.

wbreilly commented 11 months ago

This documentation looks like a very helpful addition. Thanks! It wasn't immediately clear to me that the subject and response columns must be named precisely Anon.Student.Id and CF..ansbin. (including the trailing period). A similar document would be great for the main LKT function.

imrryr commented 11 months ago

Minimal data requirements are posted in the CRAN vignette here: https://cran.r-project.org/web/packages/LKT/vignettes/Basic_Operations.html

Unfortunately, it doesn't include the info you were asking about, but it does illustrate the other required columns. I'll add some more description at some point.