Azure / Azure-TDSP-Utilities

Utilities and scripts developed as part of Microsoft's Team Data Science Process for productive data science
Creative Commons Attribution 4.0 International

[IDEAR] Error in Rank Variables #18

Closed. lucazav closed this issue 7 years ago.

lucazav commented 7 years ago

Hi all,

I'm trying to analyze the train.csv file, which you can find here:

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

I'm using this yaml file:

DataFilePath:
    'Z:\<your_path>\train.csv'
HasHeader:
    Yes
Separator:
    ','
CategoricalColumns:
    - MSSubClass
    - MSZoning
    - Street
    - Alley
    - LotShape
    - LandContour
    - Utilities
    - LotConfig
    - LandSlope
    - Neighborhood
    - Condition1
    - Condition2
    - BldgType
    - HouseStyle
    - OverallQual
    - OverallCond
    - RoofStyle
    - RoofMatl
    - Exterior1st
    - Exterior2nd
    - MasVnrType
    - ExterQual
    - ExterCond
    - Foundation
    - BsmtQual
    - BsmtCond
    - BsmtExposure
    - BsmtFinType1
    - BsmtFinType2
    - Heating
    - HeatingQC
    - CentralAir
    - Electrical
    - KitchenQual
    - Functional
    - FireplaceQu
    - GarageType
    - GarageFinish
    - GarageQual
    - GarageCond
    - PavedDrive
    - PoolQC
    - Fence
    - MiscFeature
    - SaleType
    - SaleCondition
NumericalColumns:
    - LotFrontage
    - LotArea
    - YearBuilt
    - YearRemodAdd
    - MasVnrArea
    - BsmtFinSF1
    - BsmtFinSF2
    - BsmtUnfSF
    - TotalBsmtSF
    - YearRemodAdd
    - 1stFlrSF
    - LowQualFinSF
    - GrLivArea
    - BsmtFullBath
    - BsmtHalfBath
    - FullBath
    - HalfBath
    - Bedroom
    - Kitchen
    - TotRmsAbvGrd
    - Fireplaces
    - GarageYrBlt
    - GarageCars
    - WoodDeckSF
    - OpenPorchSF
    - EnclosedPorch
    - 3SsnPorch
    - ScreenPorch
    - PoolArea
    - MiscVal
    - MoSold
    - YrSold
    - SalePrice
ColumnsToExclude:
    - Id
Target:
    SalePrice
RLogFilePath:
    'Z:\<your_path>\house_prices.log.r'
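
With a column list this long, it can be worth ruling out a name that is listed twice or a name that does not match the CSV header (YearRemodAdd, for instance, appears twice under NumericalColumns above). The following is a minimal sketch of such a check in Python. It is not part of IDEAR; `check_columns`, the sample header, and `MadeUpColumn` are hypothetical and used only for illustration.

```python
from collections import Counter


def check_columns(config_columns, csv_header):
    """Return (duplicates, unknown) for a list of configured column names.

    duplicates: names listed more than once in the config.
    unknown:    names in the config that are not in the CSV header.
    """
    counts = Counter(config_columns)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    header = set(csv_header)
    unknown = sorted(name for name in counts if name not in header)
    return duplicates, unknown


# Short excerpt of the NumericalColumns list, plus one made-up name.
numerical = ["YearRemodAdd", "MasVnrArea", "YearRemodAdd", "MadeUpColumn"]
# Hypothetical stand-in for the header row of the CSV file.
header = ["Id", "YearRemodAdd", "MasVnrArea", "SalePrice"]

duplicates, unknown = check_columns(numerical, header)
print("listed more than once:", duplicates)
print("not in header:", unknown)
```

In practice the header could be read from the file itself, for example with `next(csv.reader(f))` from the standard `csv` module.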

When I try to rank the variables versus the SalePrice variable, I get the following error:

task 4 failed - "models were not all fitted to the same size of dataset"

What's wrong?

Thank you

xibingaomsft commented 7 years ago

@lucazav, thanks for reporting the issue. It was caused by an anomaly in computing the metrics. We just fixed the error handling in the EtaSq computation, so you should now be able to run it.
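
The fix itself is not shown in this thread. As a rough illustration of what defensive handling around an eta-squared computation can look like, here is a sketch in Python. It assumes the usual definition (between-group sum of squares divided by total sum of squares) for a categorical variable against a numeric one; the function name `eta_squared`, the choice to return NaN for undefined cases, and the sample values are all assumptions and do not reflect the actual IDEAR code.

```python
import math


def eta_squared(groups, values):
    """Eta squared (SS_between / SS_total) for a categorical vs. numeric pair.

    Rows where either entry is missing (None, or NaN for the value) are
    dropped first. Returns NaN when the statistic is undefined, rather
    than raising an error.
    """
    pairs = [
        (g, v)
        for g, v in zip(groups, values)
        if g is not None and v is not None and not math.isnan(v)
    ]
    if len(pairs) < 2:
        return math.nan

    grand_mean = sum(v for _, v in pairs) / len(pairs)
    ss_total = sum((v - grand_mean) ** 2 for _, v in pairs)
    if ss_total == 0:
        return math.nan  # the numeric variable is constant

    by_group = {}
    for g, v in pairs:
        by_group.setdefault(g, []).append(v)
    if len(by_group) < 2:
        return math.nan  # only one category left after dropping rows

    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Made-up values, not taken from train.csv; the last row has a missing category.
categories = ["a", "a", "b", "b", None]
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
print(eta_squared(categories, prices))
```

Returning NaN lets a ranking loop skip a problematic variable instead of failing the whole task, which is one possible design choice among several.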

lucazav commented 7 years ago

Hi @xibingaomsft, you're welcome! Now it works like a charm. Thank you for your fast support!