CCS-Lab / easyml

A toolkit for easily building and evaluating machine learning models.
https://ccs-lab.github.io/easyml

Documentation #97

Closed Nathaniel-Haines closed 7 years ago

Nathaniel-Haines commented 7 years ago

I was emailed by someone interested in using the package who was unsure about what is going on within the functions. I know that the documentation is still in progress, but I thought it would be helpful to see what they had to say and where they were confused. Hopefully this can help guide the documentation:

"Okay good to know! For questions, I guess I'm just trying to get a sense of what's going on under the hood. I'm decently familiar with glmnet and e1057 (or whatever it's called), and I'm just trying to figure out what easyml is doing for me versus what I need to do on my own. In particular, I note that it's taking way longer than normal 1-shot svm's or glmnet's usually take, so I assume it's doing all sorts of cross-validation and other stuff, but it's not super clear to me what all that is. Like, what exactly do you mean by replicating variable importance metrics, replicating predictions, and replicating metrics (it's possible I'm just not familiar with all the jargon)? And is there any parameter tuning or grid search or something like that which serves to optimize classification results? And I guess similarly, for svm, are these linear or kernel SVMs?

Also if you're too busy to answer these that's okay, I can poke around some more with the code and probably find some answers, just thought I'd ask.

Thanks!!"

paulhendricks commented 7 years ago

Very helpful, thanks Nate.

It's clear there is a need for documentation beyond function references (which are useful but don't clearly illustrate examples and use cases). I've started to put together some vignettes that demonstrate the usefulness of easyml; one exists here for the Titanic data set (https://ccs-lab.github.io/easyml/articles/titanic.html). I think the white paper will help clear some things up, but I can definitely see how it would be helpful to have some documentation on the website.

paulstillman commented 7 years ago

Hey Paul, I'm the person who originally sent that email to Nate. Do you guys know when the white paper will be available? I'm hoping to start writing up some results in the not-too-distant future, and I want to make sure I'm understanding what's going on beneath the hood here.

Thanks for all your hard work on this; it already seems much less painful than using the individual packages themselves.

youngahn commented 7 years ago

Paul, the manuscript is almost ready to be submitted. Its preprint will be submitted to bioRxiv very soon (hopefully within a week). Please email me for more information.

paulhendricks commented 7 years ago

Some resources for documentation:

Background and motivation - http://www.biorxiv.org/content/early/2017/07/02/137240
R - https://ccs-lab.github.io/easyml/
Python - http://easyml.readthedocs.io/en/latest/?badge=latest

If there's anything that you'd like to see that's not in the above, please open an issue and let us know!