newbie SageMaker with XGBoost

winanjaya-mtwi commented 5 years ago

Hello All,

I am new to AWS SageMaker. I am trying to use the XGBoost algorithm, but the training job keeps failing. Here is what I have done:

Create an S3 bucket
Upload the .csv
Create labeling jobs (completed)
Create a notebook instance with the XGBoost MNIST example
Create a training job
- Use Amazon SageMaker built-in Algorithm as Algorithm source
- Choose XGBoost Algorithm
  - Set num_round to 3
  - Set objective to reg:linear
  - Set S3 location to s3://mys3/
  - Set S3 output path to s3://mys3/output

After a couple of minutes, the job failed with the following error:

ClientError: Blankspace and colon not found in firstline '1 4a4fc709a737ab971e7a1008a72a930c ...' of file 'groundtruth_tqOAx9PEb2RJ3xlAiRN5khotlbq4KUZfZKT_IN2m9d0j2W2h_GbHJxlp5UgJcomc4BqO8qnpPZiDNKcnGqcELSNMzOm6dXRpvzJUOeAFgFOvWfUrym_pI8z35vKYEG.hUdmlDAXjj6M5LVinEg0N8rdMPKBfDnHAmj_9THczi2Y-_etags_do_not_modify.tmp'. ContentType by defaullt is in libsvm. Please ensure the file is in libsvm format.

I would appreciate some guidance.

mytry.zip

JavierLopezT commented 5 years ago

Hello,

CSVs passed to XGBoost need to be in a specific format:

- No header row
- Outcome variable in the first column, features in the rest of the columns (there's no ability to drop them during the training process)
- All columns need to be numeric
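As a rough illustration of that layout, the sketch below writes a header-less CSV with the outcome first, using only the Python standard library. The column names (`price`, `rooms`, `area`) and the helper `to_xgboost_csv` are made up for this example and do not come from the attached data.

```python
import csv
import io


def to_xgboost_csv(rows, label_key, feature_keys):
    """Serialize dict rows as header-less CSV: label first, then features."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # float() raises ValueError/TypeError on non-numeric values, which
        # surfaces the "all columns numeric" requirement before upload.
        values = [float(row[label_key])] + [float(row[k]) for k in feature_keys]
        writer.writerow(values)
    return buffer.getvalue()


# Hypothetical example rows; "price" plays the role of the outcome variable.
rows = [
    {"rooms": 3, "area": 70.5, "price": 210000},
    {"rooms": 2, "area": 48.0, "price": 150000},
]
csv_text = to_xgboost_csv(rows, label_key="price", feature_keys=["rooms", "area"])
print(csv_text)
```

The resulting text has no header line, and the first value on each line is the outcome, which is the shape described in the list above.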

dnissani-ias commented 5 years ago

I have satisfied all the requirements described above and am still getting the same error. How do you set the content type to CSV for XGBoost?

sandys commented 5 years ago

Sorry to add a comment here, but is it preferable to use the svmlight format (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.dump_svmlight_file.html) rather than CSV?
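For reference, a libsvm/svmlight line has the shape `label index:value index:value ...`, which would explain why the error above complains that no blank space and colon were found in the first line. The sketch below formats one example that way in plain Python. `to_libsvm_line` is a hypothetical helper, and the 1-based feature index is an assumption (scikit-learn's `dump_svmlight_file` exposes a `zero_based` option for the same choice).

```python
def to_libsvm_line(label, features, index_base=1):
    """Format one example as 'label index:value ...', skipping zero features."""
    parts = [repr(float(label))]
    for offset, value in enumerate(features):
        # Sparse format: features equal to zero are simply left out.
        if value != 0:
            parts.append(f"{index_base + offset}:{float(value)!r}")
    return " ".join(parts)


line = to_libsvm_line(1, [0.5, 0.0, 3.0])
print(line)
```

This only shows what the format looks like; it does not settle which of the two formats is preferable.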

mramakrishnan-chwy commented 5 years ago

The confusion comes from inconsistent documentation. Yes, XGBoost supports both CSV and libsvm. For CSV, passing s3_input objects with an explicit content type worked for me. Here is how I defined them:

train_channel = sagemaker.session.s3_input(_s3_inputtrain, content_type='csv')
valid_channel = sagemaker.session.s3_input(_s3_inputvalidation, content_type='csv')

_s3_inputtrain and _s3_inputvalidation contain the S3 paths to my training and validation files.

Now fit the model with these train and validation S3 inputs.

Ref: https://github.com/aws/sagemaker-python-sdk/issues/133
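When the job is created through the low-level `CreateTrainingJob` API rather than the Python SDK, the content type is, as far as I know, set per channel through the `ContentType` field of `InputDataConfig`. The sketch below only builds that request fragment as a plain dictionary and does not call AWS. `build_channel` is a hypothetical helper, the `train/` and `validation/` prefixes under `s3://mys3/` are placeholders, and `text/csv` is assumed as the content type value (the `'csv'` shorthand above is what was reported to work with `s3_input`).

```python
import json


def build_channel(name, s3_uri, content_type="text/csv"):
    """Build one InputDataConfig entry with an explicit content type."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        # Without this field, the error above suggests libsvm is assumed.
        "ContentType": content_type,
    }


# Placeholder prefixes; replace with the real locations of the CSV files.
input_data_config = [
    build_channel("train", "s3://mys3/train/"),
    build_channel("validation", "s3://mys3/validation/"),
]
print(json.dumps(input_data_config, indent=2))
```

A list like this would be passed as the `InputDataConfig` argument of the training job request, alongside the algorithm, role, and output settings that are not shown here.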

engineeryashsaxena commented 4 years ago

But what if I am doing it via the SageMaker UI?

aws / amazon-sagemaker-examples

newbie SageMaker with XGBoost #668