Azure / Azure-TDSP-Utilities

Utilities and scripts developed as part of Microsoft's Team Data Science Process for productive data science
Creative Commons Attribution 4.0 International
373 stars 275 forks source link

BinaryClassification_UCI_Income.yaml incorrectly ordered. #16

Closed amwahl closed 6 years ago

amwahl commented 7 years ago

The example BinaryModelSelection.rmd file is throwing an error when executing on the BinaryClassification_UCI_Income.yaml. The issue occurs because the parameter grid for runXgBoost is created with switched values of xgBoostsubsample and min_child_weight. The indexing should be switched to match the order of those parameters in the yaml (or the order of the parameters in the yaml should be switched).

Pelonza commented 7 years ago

Confirmed... I ran into this too when trying to use the UCI yaml file as a template.

deguhath commented 7 years ago

Thanks for reporting this issue. What package version are you using for xgboost?

Pelonza commented 7 years ago

This error has nothing to do with the xgboost package. It has to do with the ordering you read in the parameters for the xgboost from the yaml file.

You just have them indexed differently than the actual ordering in the yaml file.

deguhath commented 7 years ago

Noted and fixed the issue in yaml file. Thanks for pointing it out.

Pelonza commented 7 years ago

Note, even with that fixed, I was having issues getting the AMR to run to completion on the UCI data... I don't know about the original user.

Specifically, when I run even the UCI yaml, I'm now getting that the data has a new factor.

Again, I think this could be fixed easily, but you'll need to either refactor your country data in the file...or delete data....

To schedule a meeting or appointment try: https://karlrbschmitt.youcanbook.me/

Dr. Karl Schmitt Assistant Professor Department of Mathematics and Statistics Department of Computing and Information Sciences Director of Data Science Program Director of Analytics and Modeling Graduate Program Valparaiso University, Indiana

On Fri, Mar 24, 2017 at 1:59 PM, Debraj GuhaThakurta < notifications@github.com> wrote:

Noted and fixed the issue in yaml file. Thanks for pointing it out.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Azure/Azure-TDSP-Utilities/issues/16#issuecomment-289115286, or mute the thread https://github.com/notifications/unsubscribe-auth/AMRGU8HD-Gc_-8D6_7Z7DoCHjdlbEghTks5rpBKYgaJpZM4L1Wch .

Pelonza commented 7 years ago

I actually had a lot of problems with using your "turn into factor" columns setup.

R doesn't seem to like to turn array-indexed variables into factors, it prefers extracting them as attributes of a data.frame that is:

trainDF['column']<-as.factor(trainDF['column'] ) : Generates listing error trainDF$column<-as.factor(trainDF$column) : Does not generate an error

Fixing these lines in your markdown file might fix the Honduras problem also....

To schedule a meeting or appointment try: https://karlrbschmitt.youcanbook.me/

Dr. Karl Schmitt Assistant Professor Department of Mathematics and Statistics Department of Computing and Information Sciences Director of Data Science Program Director of Analytics and Modeling Graduate Program Valparaiso University, Indiana

On Fri, Mar 24, 2017 at 3:36 PM, Karl Schmitt karl.schmitt@valpo.edu wrote:

Note, even with that fixed, I was having issues getting the AMR to run to completion on the UCI data... I don't know about the original user.

Specifically, when I run even the UCI yaml, I'm now getting that the data has a new factor.

Again, I think this could be fixed easily, but you'll need to either refactor your country data in the file...or delete data....

To schedule a meeting or appointment try: https://karlrbschmitt.youcanbook.me/

Dr. Karl Schmitt Assistant Professor Department of Mathematics and Statistics Department of Computing and Information Sciences Director of Data Science Program Director of Analytics and Modeling Graduate Program Valparaiso University, Indiana

On Fri, Mar 24, 2017 at 1:59 PM, Debraj GuhaThakurta < notifications@github.com> wrote:

Noted and fixed the issue in yaml file. Thanks for pointing it out.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Azure/Azure-TDSP-Utilities/issues/16#issuecomment-289115286, or mute the thread https://github.com/notifications/unsubscribe-auth/AMRGU8HD-Gc_-8D6_7Z7DoCHjdlbEghTks5rpBKYgaJpZM4L1Wch .

deguhath commented 6 years ago

If you want to submit a pull request with your code fixes, please do so. We can test and merge. Thanks.

deguhath commented 6 years ago

I have commented in the BinaryModelSelection.rmd markdown file for the users to try this convention - trainDF$column, if existing code breaks in their R version. Thanks for raising and explaining this issue.