corels / rcppcorels

R Bindings to the Certifiably Optimal Rule Lists (Corels) Learner

Preparing the data #7

Open bgreenwell opened 4 years ago

bgreenwell commented 4 years ago

Hi @eddelbuettel, first off, thanks for porting CORELS to R. I'm writing a book about trees and came across this package while writing about rule-based models; it seems really promising. That said, would you be open to a few PRs, starting with one on formatting the data for users? I have some starter code from a couple of examples that wouldn't be difficult to generalize, but I wanted to check whether you had a specific design in mind first, e.g., a formula method

corels(Edibility ~ ., data = mushroom, ...)   OR   corels(X, y, ...)

where X and y are both constructed and written out to a temporary (or user-specified) location, or maybe even just a simple prepareData() function? Soon after, I can submit at least one or two good examples to include that seem perfect for this type of algorithm. Happy to hear your thoughts. Obviously, numeric inputs would have to be re-encoded by the user beforehand.
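To make the re-encoding step concrete, here is a minimal base R sketch of what such a helper might do. The name `binarize()` and the `variable:level` column naming are illustrative assumptions, not part of rcppcorels; numeric columns are rejected unless they have first been binned, e.g. with `cut()`.

```r
# Hypothetical helper (not part of rcppcorels): turn a data frame of
# categorical columns into a 0/1 integer matrix with one column per
# (variable, level) pair.
binarize <- function(data) {
  stopifnot(is.data.frame(data))
  cols <- lapply(names(data), function(nm) {
    x <- data[[nm]]
    if (is.numeric(x)) {
      stop("numeric column '", nm, "' must be binned first, e.g. with cut()")
    }
    x <- as.factor(x)
    # n x k logical matrix: does observation i have level j?
    m <- outer(as.character(x), levels(x), "==")
    storage.mode(m) <- "integer"
    colnames(m) <- paste0(nm, ":", levels(x))
    m
  })
  do.call(cbind, cols)
}

# Toy data loosely modeled on the mushroom example (values are made up)
toy <- data.frame(
  odor = c("almond", "foul", "none", "foul"),
  cap  = c("flat", "flat", "bell", "bell"),
  stringsAsFactors = TRUE
)
# A numeric input has to be discretized by the user before encoding
toy$size <- cut(c(1.2, 3.4, 5.6, 2.1), breaks = c(0, 2, 4, 6))

X <- binarize(toy)
colnames(X)
```

Each row of `X` then has exactly one 1 per original variable, which is the kind of binary feature matrix a rule list learner works with.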

eddelbuettel commented 4 years ago

Hi, and thanks for raising an issue. Yes, this is on our TODO list, preferably with proper data.frame conversion etc., but we have not gotten there yet. This is really a group effort with the corels org, even though so far it has just been me committing. So @nlarusstone and @fingoldin may chime in as well.

Not sure if you have seen tidycorels by @billster45, which already adds a more R-like interface (though, IIRC, it goes through external files, much like the corels binary).

PS And see #6 for a little bit of prior discussion on tidycorels.

bgreenwell commented 4 years ago

Thanks for pointing me towards @billster45's tidycorels package. It looks like it uses an approach similar to what I was doing with just writeLines(), but I'd assume there's a more organic way to handle this than coercing data frames and writing files out to a (possibly temporary) directory for consumption by corels. I'll keep an eye on the repo and post anything that might be useful.
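For anyone following along, the writeLines() approach can be sketched roughly as below. This is only an illustration: `write_rules()` is a hypothetical helper, and the `{name} 0 1 0 ...` line layout (one line per binary feature, one value per observation) is my assumption about the text format the corels binary reads, so check it against the package's bundled example data before relying on it.

```r
# Hypothetical helper: write a 0/1 matrix to a text file, one line per
# column, in the assumed form "{name} v1 v2 ... vn".
write_rules <- function(X, file = tempfile(fileext = ".out")) {
  lines <- vapply(seq_len(ncol(X)), function(j) {
    paste0("{", colnames(X)[j], "} ", paste(X[, j], collapse = " "))
  }, character(1))
  writeLines(lines, file)
  invisible(file)
}

# Made-up binary features and outcome (1 = poisonous)
X <- cbind("odor:foul" = c(0L, 1L, 0L, 1L), "odor:none" = c(1L, 0L, 1L, 0L))
y <- c(0L, 1L, 0L, 1L)
# Labels written as two complementary indicator lines (also an assumption)
Y <- cbind("Edibility:edible" = 1L - y, "Edibility:poisonous" = y)

rules_file  <- write_rules(X)
labels_file <- write_rules(Y, tempfile(fileext = ".label"))
readLines(rules_file)
```

The two temporary file paths are what would then be handed to the fitting function; that call is left out here since its exact arguments should be taken from the package documentation.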

For reference, here's the example I'm playing with: https://gist.github.com/bgreenwell/482cffb8b7a5c60103fe5526b236e0ad

billster45 commented 4 years ago

@bgreenwell yes, writing out the dataframe to text files and all the other parts of tidycorels (e.g. capturing the console output and converting to data.table if-else code) are hacky things I was doing to compare corels to popular ML methods. Then it occurred to me that I could try package building with them.

Good to try your corels code on the mushroom data and see everything in compact base R. With the polite nudge from @eddelbuettel I was amazed how much cleaner R code can be using only base and one package (data.table). The points made here make more sense to me now.