Closed juliohm closed 1 year ago
@vickydeka would like to give this issue a try? It should be as easy as adding a call to categorical
in the right places. It is a good opportunity to keep practicing contributions to open source on GitHub :)
Hey, can I work on this issue if no one has started work on this? I want to get started with contributing to Julia, if you could assign me this, I can get started on getting familiar with the related aspects of this issue and submit a PR.
I'm not sure this is a good idea. Ordinarily one one-hot-encodes so that models designed for "continuous" (abstract float) data can be applied when the data is categorical. To encode as categorical is a step backwards, no?
I think the OG issue is recommending to convert the Bool type of the columns generated after OneHot process to CategoricalArray type
although I am not aware of the reason behind doing this.
I'm not 100% sure of what would be the most appropriate behavior. I know that one-hot encoding is often used as "continuous" 0s and 1s but I also have seen cases where it is treated as a binary variable. Maybe a keyword option would solve the problem.
Em sex., 16 de dez. de 2022 23:28, rajatrc1705 @.***> escreveu:
I think the OG issue is recommending to convert the Bool type of the columns generated after OneHot process to CategoricalArray type
[image: image] https://user-images.githubusercontent.com/52173002/208219270-a213f657-2f1c-4e77-a5f6-fc4644ecba5f.png Current Behavior:
[image: image] https://user-images.githubusercontent.com/52173002/208219341-82c63605-27b9-480e-ba35-c0f48cf8295b.png New Behavior:
[image: image] https://user-images.githubusercontent.com/52173002/208219356-fc8d803e-bf73-4455-9148-d873eb32073f.png
although I am not aware of the reason behind doing this.
โ Reply to this email directly, view it on GitHub https://github.com/JuliaML/TableTransforms.jl/issues/158#issuecomment-1355969553, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3IWJGPZNXCP4IOEKI3WNUQNXANCNFSM6AAAAAASVMRCDI . You are receiving this because you authored the thread.Message ID: @.***>
Like this:
OneHot(:x, categorical=true)
By default, the keyword categorical
can be equal to false
.
Exactly @eliascarv. I would set the default to true though as the continuous interpretation is more low-level.
@rajatrc1705 can you work on adding this option? That would be a nice contribution.
I will start working on it! Could anyone confirm whether the images I posted for the "New Behavior" is what we want as the output?
Confirmed ๐๐ป
Em sรกb., 17 de dez. de 2022 14:50, rajatrc1705 @.***> escreveu:
I will start working on it! Could anyone confirm whether the images I posted for the "New Behavior" is what we want as the output?
โ Reply to this email directly, view it on GitHub https://github.com/JuliaML/TableTransforms.jl/issues/158#issuecomment-1356353442, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3LGNPNBMU2UTVXV3A3WNX4OVANCNFSM6AAAAAASVMRCDI . You are receiving this because you authored the thread.Message ID: @.***>
Sorry for chiming in late here, but I would say that generally it's preferable for the type (or scitype) of the output of a transformer be as insensitive to hyperparameters as possible. That way, transformers can be chained and remain valid when just one component hyperparameter is changed. So, in this case, a preferred solution would have been to add a transformer that changes Bool columns to Categorical columns for chaining with the encoder as previously implemented.
The issue here is a bit different though. The design of OneHot had the wrong scitype by default and we fixed that. In order to preserve the continuous 0s and 1s use case we added a kwarg option.
Em seg., 9 de jan. de 2023 01:21, Anthony Blaom, PhD < @.***> escreveu:
Sorry for chiming in late here, but I would say that generally it's preferable for the type (or scitype) of the output of a transformer be as insensitive to hyperparameters as possible. That way, transformers can be chained and remain valid when just one component hyperparameter is changed. So, in this case, a preferred solution would have been to add a transformer that changes Bool columns to Categorical columns for chaining with the encoder as previously implemented.
โ Reply to this email directly, view it on GitHub https://github.com/JuliaML/TableTransforms.jl/issues/158#issuecomment-1375031047, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3KOMH2CEROGHDWMX6LWRNY4JANCNFSM6AAAAAASVMRCDI . You are receiving this because you modified the open/close state.Message ID: @.***>
Currently the resulting columns are just Vector{Bool}, maybe they should be categorical arrays.