aws-samples / amazon-sagemaker-immersion-day

MIT No Attribution

Non-numeric value found in header and could not convert string to float #82

Open mayurbhagia opened 8 months ago

mayurbhagia commented 8 months ago

Steps to reproduce:

Follow the steps in https://catalog.us-east-1.prod.workshops.aws/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US. First, complete the prerequisites to set up a notebook in SageMaker Studio Classic, then run "git clone https://github.com/aws-samples/amazon-sagemaker-immersion-day.git".

Next, follow Option 2 - Feature Engineering and Data Preparation using "Numpy and Pandas", which uses the notebook "xgboost_direct_marketing_sagemaker.ipynb". In this notebook, at # cell 08, print(data.corr()) fails with: ValueError: could not convert string to float: 'housemaid'. This can be worked around by commenting out the print statement and moving on.
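For anyone hitting the same ValueError: a possible alternative to commenting the line out is to restrict the correlation to numeric columns. The sketch below is not taken from the notebook; the small frame is a hypothetical stand-in for its `data` variable, and it assumes the failure comes from string columns such as `job` being passed to `corr()`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `data` frame (not the real dataset).
data = pd.DataFrame({
    "age": [54, 31, 45, 38],
    "campaign": [3, 1, 2, 5],
    "job": ["housemaid", "admin.", "technician", "services"],
})

# Keep only numeric columns before computing the correlation matrix,
# so string columns like `job` are never converted to float.
numeric_data = data.select_dtypes(include="number")
corr = numeric_data.corr()
print(corr)
```

On recent pandas versions, `data.corr(numeric_only=True)` should have the same effect.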

Then move on to Lab 2. Train, Tune and Deploy XGBoost, which uses the same notebook "xgboost_direct_marketing_sagemaker.ipynb". At # cell 17 we get another error: "Failed - Training job failed". The detailed error is: "UnexpectedStatusException: Error for Training job xgboost-2024-01-14-10-06-27-472: Failed. Reason: ClientError: Non-numeric value 'F' found in the header line 'False,54,3,999,0,1,0,False,False,False,False,False...' of file 'train.csv'. CSV format require no header line in it. If header line is already removed, XGBoost does not accept non-numeric value in the data., exit code: 1"
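The 'False' values in the first line of train.csv suggest that the one-hot encoded columns were written as booleans rather than 0/1. One likely cause, assuming the notebook runs on pandas 2.x, is that `pd.get_dummies` now returns `bool` columns by default. A minimal sketch of a possible fix is to request integer dummies and write the CSV without a header. The frame and the column names (`y`, `y_yes`, `y_no`) are hypothetical stand-ins, not copied from the notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the notebook's `data` frame (not the real dataset).
data = pd.DataFrame({
    "y": ["no", "yes", "no"],
    "age": [54, 31, 45],
    "job": ["housemaid", "admin.", "housemaid"],
})

# Ask for integer dummy columns so the output holds 0/1 instead of False/True.
model_data = pd.get_dummies(data, dtype=int)

# Put the label first and drop the redundant label columns from the features.
train = pd.concat(
    [model_data["y_yes"], model_data.drop(["y_no", "y_yes"], axis=1)],
    axis=1,
)

# Write without index or header, as the error message says CSV input needs.
buf = io.StringIO()  # stand-in for the 'train.csv' file
train.to_csv(buf, index=False, header=False)
csv_text = buf.getvalue()
print(csv_text)
```

Casting an already-encoded frame with `model_data.astype(int)` is another option, provided every remaining column is numeric or boolean.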