IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Adding dialogues for machine learning - caret #6671

Open rdstern opened 3 years ago

rdstern commented 3 years ago

caret appears to be a "one-stop-shop" for machine learning. It is the only package that has a chapter devoted to it, in the Introduction to Data Science book.

I suggest a small set of dialogues - plus (possibly) keyboards in our Fit Model Keyboard and Use Model Keyboard.

I suggest the first dialogue be called Machine Learning (caret) and be placed under General in the Model > Fit Model menu.

It will be for the train command in the caret package, plus various options that imply a trainControl command is also given. For example, the default is to take 25 bootstrap samples. Sometimes you may want many more, e.g. 250, and that would be set as an option in trainControl.
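
As a rough illustration of the kind of R code the dialogue might generate, here is a minimal sketch of train with and without an explicit trainControl. The mtcars data, the formula and the object names are assumptions chosen only for the example; they are not taken from R-Instat.

```r
library(caret)

set.seed(123)

# No trControl given: caret falls back to its default of 25 bootstrap resamples
fit_default <- train(mpg ~ wt + hp, data = mtcars, method = "glm")

# Ask for 250 bootstrap resamples through trainControl
ctrl <- trainControl(method = "boot", number = 250)
fit_250 <- train(mpg ~ wt + hp, data = mtcars, method = "glm", trControl = ctrl)

# The resampling settings actually used are stored on the fitted object
fit_default$control$number
fit_250$control$number
```

An option in the dialogue for the number of resamples would then only need to change the number argument of trainControl.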

The initial dialogue can be quite simple, but it will be useful to try the commands first, to check which options are needed. I suggest the following to start with, based on the Fit Model > General dialogue:

a) The usual data selector.
b) A single receiver for the Response Variable, as in the Fit Model > General dialogue.
c) A Label Type as in Fit Model > General, but next to the single control. Ideally it would also give the number of levels, but the same applies to Fit Model > General, so for now just follow what is there.
d) Then have Explanatory O Variables O Model, i.e. 2 radio buttons. The default is Variables, and that gives a multiple receiver. If Model is chosen then give the same receiver as in Fit Model > General, but without the keyboard below.
e) Where there is Distribution in the General dialogue, have Model instead and then include the Fit_Model_List. Is that easy to do? Perhaps the default is glm.
f) Then there is the Save Model control (as in General).
g) There will be more options added, so leave space for them.

Are there any further suggestions before work on the dialogue starts?
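
One way the two Explanatory options in (d) could translate into R is sketched below. This is only an assumption about how the dialogue might build its call: the variable names, the model text and the mtcars data are hypothetical, and method = "glm" follows the suggested default in (e).

```r
library(caret)

response <- "mpg"

# "Variables" option: names picked in the multiple receiver (hypothetical selection)
explanatory_vars <- c("wt", "hp")
form_vars <- reformulate(explanatory_vars, response = response)

# "Model" option: right-hand side typed into the model receiver (hypothetical input)
model_text <- "wt * hp"
form_model <- as.formula(paste(response, "~", model_text))

set.seed(123)
fit_vars  <- train(form_vars,  data = mtcars, method = "glm")
fit_model <- train(form_model, data = mtcars, method = "glm")

form_vars
form_model
```

Both options end up as a formula passed to train, so the rest of the dialogue (method, Save Model) would not need to know which radio button was chosen.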

N-thony commented 3 years ago

@rdstern what about this issue? I can start the first part.

rdstern commented 3 years ago

It would be good to add a keyboard in the Model > Fit Model keyboard dialogue as well.
Possibly there could be an individual key for the most used methods, plus a pull down for all of them. This article is possibly useful.

I also saw the top 5 given as: linear regression, logistic regression, decision trees, naive Bayes and kNN.

We must have glm - of which logistic regression is a special case.
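
For reference, here is a small sketch of how those five might map onto caret method strings, with a check that each is known to the installed caret. The mapping is my assumption: caret offers more than one method for trees and for naive Bayes (e.g. "nb" as well as "naive_bayes"), and fitting some of them needs extra packages such as rpart or naivebayes.

```r
library(caret)

# Assumed mapping from the "top 5" names to caret method strings
top_five <- c(
  "linear regression"   = "lm",
  "logistic regression" = "glm",
  "decision trees"      = "rpart",
  "naive Bayes"         = "naive_bayes",
  "kNN"                 = "knn"
)

# modelLookup() with no argument lists the tuning parameters of every method
all_methods <- unique(modelLookup()$model)

# Which of the five are available, and how many methods a pull down would hold
top_five %in% all_methods
length(all_methods)
```

The same all_methods vector could feed the pull down for the full list, with the five above as individual keys.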

HawardKetoyoMsatsi commented 3 years ago

@rdstern I have added the machine learning dialog. I will go ahead and add the keyboard to the dialog as well.

shadrackkibet commented 3 years ago

@HawardKetoyoMsatsi what's the state of this dialog? This task is a high priority.

HawardKetoyoMsatsi commented 3 years ago

@shadrackkibet the dialog is partly done. I was fitting the additional keyboard, then got caught up in the team dashboard tasks.

shadrackkibet commented 3 years ago

Great. Can we get what you've done already into a PR?

HawardKetoyoMsatsi commented 3 years ago

@shadrackkibet sure, I can open one.

shadrackkibet commented 3 years ago

Quick check, any progress on this task?

lilyclements commented 2 years ago

The PR in #6875 works on this. I suggest someone else takes it over.

Changes I can see to be made from looking at the dialog, and from comments on the PR are:

  1. Explanatory Model label above the multiple receiver should read "Explanatory Variables"
  2. Train Size nud should go in steps of 0.01, not 0.05 (it currently goes in steps of 0.05), with a maximum of 0.99 (or 0.999) and a minimum of 0.01.
  3. We want the "Model" rdo to be enabled. From @rdstern's comments above: "If Model is chosen then give the same receiver as in Fit Model > General, but without the keyboard below."
  4. From @rdstern's comments above: "Where there is Distribution in the General dialogue, have Model instead and then include the Fit_Model_List. Is that easy to do? Perhaps the default is glm."
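
On point 2, I assume the Train Size value is the proportion of rows used for training. A minimal sketch of how such a value could be used with caret's createDataPartition follows; the iris data and the object names are only for illustration and are not what the PR generates.

```r
library(caret)

# Hypothetical value from the Train Size nud, kept within the suggested limits
train_size <- 0.80
stopifnot(train_size >= 0.01, train_size <= 0.99)

set.seed(123)
# Stratified split on the response: p is the proportion of rows kept for training
in_train  <- createDataPartition(iris$Species, p = train_size, list = FALSE)
train_set <- iris[in_train, ]
test_set  <- iris[-in_train, ]

nrow(train_set)
nrow(test_set)
```

With steps of 0.01 the user can ask for splits such as 0.75 or 0.67, which steps of 0.05 would not allow.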

@rdstern please correct me if I am wrong on any of these suggested changes, or if any of them are no longer wanted.

lilyclements commented 2 years ago

@rdstern can you explain this point?: "Where there is Distribution in the General dialogue, have Model instead and then include the Fit_Model_List. Is that easy to do? Perhaps the default is glm." What is the "Fit_Model_List"?

N-thony commented 2 years ago

@rdstern can you explain this point?: "Where there is Distribution in the General dialogue, have Model instead and then include the Fit_Model_List. Is that easy to do? Perhaps the default is glm." What is the "Fit_Model_List"?

@rdstern what do you think?

N-thony commented 2 years ago

@anastasia-mbithe any progress on this?