Closed by kevinwinahradsky 11 years ago
http://gwt.googleusercontent.com/samples/Showcase/Showcase.html#!CwDisclosurePanel
This would probably be good for the advanced options.
Here is a problem with selecting variable weights: I don't believe there is an easy way with the H2O API to get all the possible values for a given classification variable.
I think we will have to create a new H2O API endpoint so we can utilize H2O internals. Otherwise we will have to parse datasets to find the set of values for a classification variable. Unless I am missing something.
Upon further investigation, I think that using the "Inspect" API call we can find the min and max values and just use an increment of 1. For columns of enum types we will have to find the corresponding string value somehow.
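To make the min/max idea concrete, here is a minimal sketch of enumerating candidate class values with an increment of 1. The function name and the way min and max are passed in are assumptions for illustration; it does not call H2O and only shows the enumeration step once the two values have been read from an Inspect response.

```python
def candidate_class_values(col_min, col_max):
    """Enumerate integer class values from col_min to col_max inclusive, step 1.

    col_min and col_max stand in for the column min and max reported by
    Inspect (hypothetical inputs; no H2O call is made here).
    """
    lo, hi = int(col_min), int(col_max)
    # Reject non-integral bounds: a step of 1 only makes sense for whole numbers.
    if lo != col_min or hi != col_max:
        raise ValueError("min and max must be whole numbers")
    return list(range(lo, hi + 1))
```

Note that this lists every integer in the range, including values that may never occur in the column, so some weight inputs could correspond to classes that are not present in the data.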
I'm not seeing how to get the enum values either. We might need more hacks in the H2O API for this.
Having discovered that we can get the column type for a classification variable, I think it would be a good idea to completely remove the option for users to select a classification variable that is not a valid type. Only enum and int types within a certain range are valid types. What do you guys think?
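A rough sketch of the filter proposed above, assuming the column type arrives as one of the strings "enum", "int", or "float". The `MAX_CLASSES` limit and the function name are hypothetical; the thread does not say what the allowed range should be.

```python
MAX_CLASSES = 100  # hypothetical upper bound on distinct classes; not from the thread


def is_valid_class_column(col_type, col_min=None, col_max=None,
                          max_classes=MAX_CLASSES):
    """Return True if a column may be offered as a classification variable.

    Enum columns are always accepted. Int columns are accepted only when
    the span between min and max is at most max_classes. Anything else
    (e.g. float) is rejected.
    """
    if col_type == "enum":
        return True
    if col_type == "int":
        if col_min is None or col_max is None:
            return False  # cannot judge the range without both bounds
        return (col_max - col_min + 1) <= max_classes
    return False
```

The UI could apply this to each column and only list the ones that pass, so users never see an invalid choice.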
I agree.
I have run into a problem with this. I think it may be a bug with H2O. The Inspect.json API call appears to be returning incorrect values for the "type" attribute. As an example, parse the cars.csv data set. Then do an Inspect API call with offset of -1. You can see that the "cylinders" column has type "float" when I believe it should be "int." Also the "economy" column has type "int" when I believe it should be "float."
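One way to spot this kind of mismatch without eyeballing the output is to compare the reported type against a type inferred from a few sample values. The sketch below is only illustrative: the function names, the dictionary shapes, and the sample numbers in the usage are made up, not taken from cars.csv or from the actual Inspect response format.

```python
def infer_numeric_type(values):
    """Return "int" if every value is a whole number, otherwise "float"."""
    return "int" if all(float(v).is_integer() for v in values) else "float"


def type_mismatches(reported_types, samples):
    """Map column name -> (reported, inferred) for columns where they differ.

    reported_types: {column name: type string as reported}
    samples:        {column name: list of sample numeric values}
    """
    mismatches = {}
    for name, reported in reported_types.items():
        inferred = infer_numeric_type(samples[name])
        if inferred != reported:
            mismatches[name] = (reported, inferred)
    return mismatches
```

For example, with made-up sample values:

```python
reported = {"cylinders": "float", "economy": "int"}
samples = {"cylinders": [8, 4, 6], "economy": [18.5, 24.0]}
print(type_mismatches(reported, samples))
```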
This appears to be fixed in the latest H2O version from GitHub (Inspect returns the correct column type there) but is still broken in the H2O jar bundled with our project.
This weekend I will be working on updating our H2O version.
I have pushed the code for class weights to the "rfparams" branch. One problem right now is that columns with type "enum" do not work. For example, if you use the cars data set with the "cylinders" column as the classification variable, it works well. However, if you choose "name" as the classification variable, H2O returns an error.
I uploaded the new code for RF class weights parameter input. Need someone to verify it works so it can be merged into main. The code is on the "rfparams" branch. It can be verified that the class weights show enum values if you use, for instance, the iris data set. Be sure that you run our latest custom H2O fork.
I would like to close this issue (after merge) and put further "advanced options" parameter features in new issues.
Looks good to me. Make sure to sync up with the latest trunk changes, and don't forget to add the updated h2o.jar file to the repo.
Advanced Options should include most everything available in H2O. We should start with the ability to select variable weights.