Closed: ludoro closed this issue 4 years ago
Hi @ludoro,
Glad you are finding it useful! You can read more about it in our paper https://doi.org/10.1016/j.asoc.2019.106050, where Figure 3 shows its effect. In general, you can think of it as a distance between the categorical dimensions when the sampling plan is optimised.
For the example in the documentation (https://mrurq.github.io/LatinHypercubeSampling.jl/stable/man/categorical/), you can think of catWeight=1000 as a large separation between the categorical dimensions, which is similar to making a separate LHC plan for each category. catWeight=0 can be interpreted as no separation between the categorical dimensions, where the category for each point is selected randomly. The risk of setting it to 0 is that all points in one category could become clustered to one side of the design space without any penalty. In general I would suggest using some separation, like catWeight=1, to prevent this from happening.
A small note: in the paper the weight values refer to an LHC scaled from 0 to 1. In this package the LHC uses unscaled integers from 1 to N, where N is the number of samples, so a catWeight=1 is the same as the step distance in each dimension.
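To make the interpretation above concrete, here is a minimal Python sketch of the idea, not the package's actual implementation: an Audze-Eglais-style objective (sum of inverse squared distances over all point pairs) where each categorical dimension adds catWeight to the distance between points in different categories. The function name `ae_objective` and the exact weighting scheme are illustrative assumptions.

```python
import itertools

def ae_objective(points, cats, cat_weight):
    """Audze-Eglais-style objective: sum of 1/d^2 over all point pairs.

    Continuous coordinates are unscaled integers 1..N (as in the package).
    Illustrative assumption: a categorical dimension contributes
    cat_weight**2 to the squared distance whenever two points belong
    to different categories, and nothing when they share a category.
    """
    total = 0.0
    for (p, cp), (q, cq) in itertools.combinations(zip(points, cats), 2):
        # Squared Euclidean distance in the continuous dimensions.
        d2 = sum((a - b) ** 2 for a, b in zip(p, q))
        # Categorical separation: only cross-category pairs are pushed apart.
        if cp != cq:
            d2 += cat_weight ** 2
        total += 1.0 / d2
    return total

# Four points on a 1-D LHC (integer grid 1..4), two categories.
points = [(1,), (2,), (3,), (4,)]
cats = [0, 1, 0, 1]

# With cat_weight = 0 the categories are ignored entirely; with a large
# cat_weight the cross-category pairs contribute almost nothing, so the
# objective is dominated by within-category spacing, as if each category
# had its own LHC plan.
print(ae_objective(points, cats, 0.0))
print(ae_objective(points, cats, 1000.0))
```

This also shows why catWeight=1 is a natural default on the integer grid: it penalises same-category points at the same rate as points one grid step apart in a continuous dimension.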
I see, thanks a lot!
Hey @MrUrq,
Super cool library. I am working on MLJTuning, where we want to have a LatinHypercube hyper-parameter optimization method, so I am using your library there. One small issue I have is the use of catWeight: there are many cases where we have categorical values, but it is not very clear how that parameter works, so at the moment I just always set it to 0. I have not found any reference to it in the two papers you list as references; would you care to shed some light on it?
Thanks a lot!