Perhaps some pandas behaviour changed? Currently, the following line in notebook 1 Autopilot and XGBoost.ipynb:
df_model_data = pd.get_dummies(df_model_data) # Convert categorical variables to sets of indicators
...is yielding boolean-typed columns for all the one-hot encoded variables. This is consistent with the current pandas documentation, and the notebook seems to train the XGBoost model fine, but the XGBoost evaluation Batch Transform job fails with:
RuntimeError: Loading csv data failed with Exception, please ensure data is in csv format:
<class 'ValueError'>
could not convert string to float: 'False'
I believe we need to add dtype=int to the get_dummies() call, i.e. pd.get_dummies(df_model_data, dtype=int), so that the generated train/val/test datasets are fully numeric and compatible with the SageMaker XGBoost algorithm. I haven't quite finished testing it through yet, though.
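To illustrate the proposed fix, here is a minimal sketch on a made-up frame. The `age` and `job` columns are hypothetical stand-ins for the notebook's real data, and this only checks the CSV output locally; it does not exercise the Batch Transform job itself.

```python
import pandas as pd

# Hypothetical toy data standing in for df_model_data in the notebook.
df = pd.DataFrame({"age": [30, 41, 25], "job": ["admin", "tech", "admin"]})

# Default call: on recent pandas versions the indicator columns come out as bool,
# which serialises to the strings "True"/"False" in CSV.
default_dummies = pd.get_dummies(df)
print(default_dummies.dtypes)

# Proposed change: force integer indicators so every column is numeric.
int_dummies = pd.get_dummies(df, dtype=int)
print(int_dummies.dtypes)

# Headerless, index-free CSV text, checked here only for stray boolean strings.
csv_text = int_dummies.to_csv(index=False, header=False)
print(csv_text)
```

With dtype=int the CSV rows contain only 0/1 indicators, so there should be no 'False' string left for the float conversion to fail on.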