480Oswego2013 / CSC-HCI-480-2013-repo

RF Parameters Popup Advanced Options #14

Closed kevinwinahradsky closed 11 years ago

kevinwinahradsky commented 11 years ago

Advanced Options should include almost everything available in H2O. We should start with the ability to select variable weights.

kevinwinahradsky commented 11 years ago

http://gwt.googleusercontent.com/samples/Showcase/Showcase.html#!CwDisclosurePanel

This would probably be good for the advanced options.

mkhayes commented 11 years ago

Here is a problem with selecting variable weights: I don't believe there is an easy way with the H2O API to get all the possible values for a given classification variable.

mkhayes commented 11 years ago

I think we will have to create a new H2O API endpoint so we can utilize H2O internals. Otherwise we will have to parse datasets to find the set of values for a classification variable. Unless I am missing something.

mkhayes commented 11 years ago

Upon further investigation, I think that using the "Inspect" API call we can find the min and max values and just use an increment of 1. For columns of enum types we will have to find the corresponding string value somehow.
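A rough sketch of the min/max idea, assuming each column from the Inspect response can be treated as a dict with `type`, `min`, and `max` fields. Those field names, the helper name `candidate_class_values`, and the sample values are hypothetical, not taken from the actual H2O response:

```python
def candidate_class_values(column):
    """Return candidate class values for an int column, or None otherwise.

    `column` is a hypothetical dict, e.g.
    {"name": "cylinders", "type": "int", "min": 3, "max": 8}.
    """
    if column.get("type") != "int":
        # Enum columns need their string labels from somewhere else.
        return None
    low, high = int(column["min"]), int(column["max"])
    # Step through the range with an increment of 1, inclusive of max.
    return list(range(low, high + 1))


# Made-up sample values, for illustration only.
cylinders = {"name": "cylinders", "type": "int", "min": 3, "max": 8}
print(candidate_class_values(cylinders))
```

Note that this lists every integer between min and max, including values that may never occur in the data set.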

kevinwinahradsky commented 11 years ago

I'm not seeing how to get the enum values either. We might need more hacks in the H2O API for this.

mkhayes commented 11 years ago

Having discovered that we can get the column type for a classification variable, I think it would be a good idea to completely remove the option for users to select a classification variable that is not a valid type. Only enum and int types within a certain range are valid types. What do you guys think?
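Something like the following is what I have in mind for the filter. It is only a sketch: the `type`, `min`, and `max` field names are assumed, and the cutoff `MAX_CLASSES` is a made-up number, since we have not decided what the valid range should be.

```python
MAX_CLASSES = 64  # assumed cutoff; the real limit is still to be decided


def is_valid_class_column(column, max_classes=MAX_CLASSES):
    """Return True if a column could be offered as the classification variable.

    `column` is a hypothetical dict with "type" and, for ints, "min" and "max".
    """
    col_type = column.get("type")
    if col_type == "enum":
        return True
    if col_type == "int":
        # Number of distinct integers the column could take.
        span = int(column["max"]) - int(column["min"]) + 1
        return span <= max_classes
    return False  # floats and anything else are not offered


def selectable_columns(columns, max_classes=MAX_CLASSES):
    """Names of the columns the UI would show in the classification dropdown."""
    return [c["name"] for c in columns if is_valid_class_column(c, max_classes)]


# Made-up column metadata, for illustration only.
columns = [
    {"name": "name", "type": "enum"},
    {"name": "economy", "type": "float", "min": 10.0, "max": 46.6},
    {"name": "cylinders", "type": "int", "min": 3, "max": 8},
    {"name": "weight", "type": "int", "min": 1613, "max": 5140},
]
print(selectable_columns(columns))
```

With a filter like this, the popup would never list a float column or a wide-range int column as a choice.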

kevinwinahradsky commented 11 years ago

I agree.

mkhayes commented 11 years ago

I have run into a problem with this. I think it may be a bug in H2O. The Inspect.json API call appears to be returning incorrect values for the "type" attribute. As an example, parse the cars.csv data set, then do an Inspect API call with an offset of -1. You can see that the "cylinders" column has type "float" when I believe it should be "int". Also, the "economy" column has type "int" when I believe it should be "float".

mkhayes commented 11 years ago

This appears to be fixed in the latest H2O version from GitHub (Inspect returns the correct column type there), but it is still broken in the H2O jar bundled with our project.

kevinwinahradsky commented 11 years ago

This weekend I will be working on updating our H2O version.

mkhayes commented 11 years ago

I have pushed the code for class weights to the "rfparams" branch. A problem right now is that columns with type "enum" will not work. For example, if you use the cars data set and choose the "cylinders" column as the classification variable, it works well. However, if you choose "name" as the classification variable, H2O will return an error.

mkhayes commented 11 years ago

I uploaded the new code for RF class weights parameter input. Need someone to verify it works so it can be merged into main. The code is on the "rfparams" branch. It can be verified that the class weights show enum values if you use, for instance, the iris data set. Be sure that you run our latest custom H2O fork.

I would like to close this issue (after merge) and put further "advanced options" parameter features in new issues.

kevinwinahradsky commented 11 years ago

Looks good to me. Make sure to sync up with the latest trunk changes, and don't forget to add the updated h2o.jar file to the repo.