Open manuelarnold opened 3 years ago
This is work-in-progress now.
Please see versions from https://github.com/brandmaier/semtree/commit/6e01466884c6dc5b0a5be16caaa6d645e3d09a02 and above. We now have pseudo-constants that can be returned to define the scale of measurement. Please return the respective types from the score tests back to growTree(). The constants are defined in semtree-package.R as:
.SCALE_METRIC = 2
.SCALE_ORDINAL = 3
.SCALE_CATEGORICAL = 1
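For illustration, here is a minimal sketch of how a score test might report the scale type back to its caller. The constant values are the ones quoted above; the helper names covariate_scale() and score_test_result() and the shape of the returned list are assumptions made for this example, not the actual semtree interface.

```r
# Constants as quoted above from semtree-package.R
.SCALE_CATEGORICAL <- 1
.SCALE_METRIC <- 2
.SCALE_ORDINAL <- 3

# Hypothetical helper (not part of semtree): map a covariate to a scale constant.
# is.ordered() is checked first because ordered factors are also factors.
covariate_scale <- function(x) {
  if (is.ordered(x)) {
    .SCALE_ORDINAL
  } else if (is.factor(x)) {
    .SCALE_CATEGORICAL
  } else if (is.numeric(x)) {
    .SCALE_METRIC
  } else {
    stop("unsupported covariate type: ", class(x)[1])
  }
}

# Hypothetical result object: the test statistic plus the scale type,
# so that the caller does not have to infer the type again.
score_test_result <- function(x, statistic) {
  list(statistic = statistic, type = covariate_scale(x))
}

res <- score_test_result(ordered(c("low", "high", "low")), statistic = 1.7)
res$type == .SCALE_ORDINAL
```

Returning the type together with the statistic keeps the decision about the scale of measurement in one place instead of repeating it in the tree-growing code.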
semtree now properly handles unordered and ordered factors, but these changes broke the score tests for ordinal variables. I identified one possible problem in your code (https://github.com/brandmaier/semtree/commit/2d813e86085560be3ab4427c6f3c711d552e9274) but the score test still fails. Let me know what you need to know to fix this, @manuelarnold.
I tried to fix the issue in https://github.com/brandmaier/semtree/commit/d7b1247f02ba63f79bbfddca9f05bc8abcfda62b. I hope this is all that is needed. Please confirm.
@manuelarnold, could you please confirm that this is OK and then close the issue?
There are some new changes related to this topic that we could discuss here: In my fork, I also distinguish between dummy variables (categorical variables with two levels) and multinomial variables (categorical variables with more than two levels). So, I would be in favor of separating .SCALE_CATEGORICAL into .SCALE_MULTINOMIAL and .SCALE_DUMMY. By the way, testing of multinomial variables is now fully score-based and should be faster than the testing in the main branch.
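A rough sketch of the dummy/multinomial split described above, assuming the distinction is made by counting observed factor levels. The numeric values assigned to .SCALE_DUMMY and .SCALE_MULTINOMIAL are placeholders, and categorical_scale() is a hypothetical helper; neither is taken from the fork.

```r
# Placeholder values (assumptions for this example only)
.SCALE_DUMMY <- 4
.SCALE_MULTINOMIAL <- 5

# Hypothetical helper: classify an unordered factor by its number of
# observed levels. Unused levels are dropped before counting.
categorical_scale <- function(x) {
  stopifnot(is.factor(x), !is.ordered(x))
  n_levels <- nlevels(droplevels(x))
  if (n_levels < 2) {
    stop("covariate needs at least two observed levels")
  }
  if (n_levels == 2) .SCALE_DUMMY else .SCALE_MULTINOMIAL
}

categorical_scale(factor(c("a", "b", "a"))) == .SCALE_DUMMY
categorical_scale(factor(c("a", "b", "c"))) == .SCALE_MULTINOMIAL
```

Counting observed rather than declared levels is a design choice in this sketch: a factor declared with three levels but observed with two would be treated as a dummy variable.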
@manuelarnold, how should we proceed with these changes? Would you want to prepare a pull request, so that I can check your proposed changes?
I think these changes are already in the main branch. I will try to resolve some conflicts in the coming weeks, and then we can start the process of syncing the branches.
Currently, cur.type is 1 for categorical variables and 2 for both metric and ordinal variables. Since the distinction between ordinal and metric variables is important for both maxLR test statistics and score-based tests, it would make sense to use a separate cur.type value for each of the two:
1: categorical
2: ordinal
3: metric