eHarmony / aloha

A scala-based feature generation and modeling framework
http://eharmony.github.io/aloha
MIT License

Add some additional docs #174

Open deaktator opened 7 years ago

deaktator commented 7 years ago

Please add any current documentation suggestions to this ticket. They will be aggregated and the docs will be updated.

deaktator commented 7 years ago

A suggestion was for installation.

  1. Install Java.
  2. export JAVA_HOME=$(/usr/libexec/java_home) # on OS X

deaktator commented 7 years ago

Change the front page to SBT instead of Maven (or add both).

mohammed-karim-zefr commented 7 years ago

Getting Started with Data Science Section recommendations:

  1. Java is required, and JAVA_HOME needs to be set so that the required libraries, especially the JNI jar file, can be found. To set JAVA_HOME: export JAVA_HOME=$(/usr/libexec/java_home)
  2. Before the VW data/model generation section, add a short introduction to VW and H2O.
  3. Add some explanation of creating and verifying a VW model.
  4. Add some information about the relationship between the VW model and the Aloha model.
  5. Link to the Model Format doc (http://eharmony.github.io/aloha/docs/model_formats.html) from the model generation section, for readers who want details about the models.
  6. Link to http://eharmony.github.io/aloha/docs/dataset.html for a better understanding of aloha-cli and the input/output data structures (to be continued ..)

mohammed-karim-zefr commented 7 years ago

For the Model Format page, Decision Tree Model section, Linear Mode Selector: the selector values and expected behavior need more clarification, especially the presence or absence of the second predicate and the expected behavior in relation to the returnBest and missingDataOk values. Maybe we can create a table like:

returnBest  missingDataOk  firstSelectorIsNull  secondSelectorValue  ExpectedBehavior
T           T              Y                    N                    ....
...

deaktator commented 7 years ago

@mohammed-karim-zefr, here's a description of the interplay of missingDataOk and returnBest. I'll update the docs to include this when I get to it.

missingDataOk (down) / returnBest (across)

   T                                    F
T  Iterate all predicates possible.     Iterate all predicates possible.   
   Skip ones where missing data is      Skip ones where missing data is
   encountered.  If no predicate        encountered.  If no predicate
   succeeds, return the current node.   succeeds, return an error.

F  On encountering missing data in a    On encountering missing data in a
   predicate, return an error.  If no   predicate, return an error.  If no
   missing data but no predicates       missing data but no predicates 
   apply, return the current node.      apply, return an error.
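
The four cells of the table above can be read as one small decision procedure. The following is a minimal sketch in Python, not Aloha's actual (Scala) implementation. The function name `select`, the tuple-shaped results, and the convention that a predicate returns `None` when it hits missing data are all assumptions made for illustration. It also assumes predicates are tried in order and the first one that succeeds wins.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical predicate type: True/False when it can be evaluated,
# None when the data it needs is missing.
Predicate = Callable[[dict], Optional[bool]]


def select(
    predicates: Sequence[Predicate],
    x: dict,
    missing_data_ok: bool,
    return_best: bool,
) -> Tuple[str, Optional[int]]:
    """Return ("child", i), ("current", None), or ("error", None)."""
    for i, predicate in enumerate(predicates):
        result = predicate(x)
        if result is None:
            if missing_data_ok:
                # missingDataOk = T: skip predicates that hit missing data.
                continue
            # missingDataOk = F: missing data is an error, whatever returnBest is.
            return ("error", None)
        if result:
            # First predicate that succeeds selects its child.
            return ("child", i)
    # No predicate succeeded: returnBest decides between node and error.
    return ("current", None) if return_best else ("error", None)
```

For example, with two predicates where the first one's data is missing and the second one is true, `missing_data_ok=True` selects the second child, while `missing_data_ok=False` yields an error before the second predicate is ever tried.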