aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0
9.98k stars 6.73k forks

newbie SageMaker with XGBoost #668

Open winanjaya-mtwi opened 5 years ago

winanjaya-mtwi commented 5 years ago

Hello All,

I am new to AWS SageMaker. I tried to use the XGBoost algorithm, but it keeps failing. Here is what I have done:

After waiting a couple of minutes, it failed with the following error:

ClientError: Blankspace and colon not found in firstline '1 4a4fc709a737ab971e7a1008a72a930c ...' of file 'groundtruth_tqOAx9PEb2RJ3xlAiRN5khotlbq4KUZfZKT_IN2m9d0j2W2h_GbHJxlp5UgJcomc4BqO8qnpPZiDNKcnGqcELSNMzOm6dXRpvzJUOeAFgFOvWfUrym_pI8z35vKYEG.hUdmlDAXjj6M5LVinEg0N8rdMPKBfDnHAmj_9THczi2Y-_etags_do_not_modify.tmp'. ContentType by defaullt is in libsvm. Please ensure the file is in libsvm format.

I need some guidance.

mytry.zip

JavierLopezT commented 5 years ago

Hello,

CSVs passed to XGBoost need to be in a specific format:

- No header row.
- Outcome variable in the first column, features in the rest of the columns (there's no ability to drop columns during the training process).
- All columns need to be numeric.
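As a rough illustration of those three requirements, here is a minimal sketch using only the Python standard library. The function name `to_xgboost_csv` and the sample columns (`age`, `income`, `churned`) are made up for this example and are not part of any SageMaker API.

```python
import csv
import io


def to_xgboost_csv(records, label_column):
    """Serialize dict records as headerless CSV with the label in column one."""
    if not records:
        raise ValueError("no records to write")
    # Fix the column order: label first, then the remaining features.
    feature_columns = [name for name in records[0] if name != label_column]
    ordered = [label_column] + feature_columns

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        row = [record[name] for name in ordered]
        for name, value in zip(ordered, row):
            # float() fails for non-numeric values such as "red".
            try:
                float(value)
            except (TypeError, ValueError):
                raise ValueError(
                    f"column {name!r} has non-numeric value {value!r}"
                )
        # No header row is ever written, only data rows.
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample = [
    {"age": 34, "income": 52000.0, "churned": 1},
    {"age": 51, "income": 48000.5, "churned": 0},
]
csv_text = to_xgboost_csv(sample, label_column="churned")
print(csv_text)
```

Categorical columns would have to be encoded as numbers before this step; the sketch only rejects them rather than converting them.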

dnissani-ias commented 5 years ago

I have satisfied all the requirements described above and am still getting the same error. How do you set the content type to CSV for XGBoost?

sandys commented 5 years ago

Sorry to add a comment here, but is it preferable to use svmlight format (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.dump_svmlight_file.html) versus CSV?
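For comparison, the svmlight/libsvm layout writes each example as `label index:value ...`, which is why the error earlier in the thread complains about a missing blank space and colon when it is handed a CSV line. A minimal sketch of the layout follows. The helper names are hypothetical, and the 1-based feature indices are an assumption (scikit-learn's `dump_svmlight_file` has a `zero_based` option for this).

```python
def to_libsvm_line(label, features):
    """Format one example as 'label index:value ...', skipping zero features."""
    # Assumption: feature indices start at 1.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


def looks_like_libsvm(line):
    """Rough check: a numeric label followed by index:value pairs.

    Stricter than the format itself: a label-only line is rejected.
    """
    tokens = line.split()
    if len(tokens) < 2:
        return False
    try:
        float(tokens[0])
        for token in tokens[1:]:
            # Unpacking fails with ValueError unless there is exactly one colon.
            index, value = token.split(":")
            int(index)
            float(value)
    except ValueError:
        return False
    return True


libsvm_line = to_libsvm_line(1, [0.5, 0, 3])
print(libsvm_line)
print(looks_like_libsvm(libsvm_line))
print(looks_like_libsvm("1,0.5,0,3"))  # a CSV row fails the check
```

Neither layout is shown here to be better than the other; the sketch only illustrates why a CSV file fails when the content type is left at the libsvm default.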

mramakrishnan-chwy commented 5 years ago

The confusion comes from inconsistent documentation. Yes, XGBoost supports both CSV and libsvm. For CSV, it worked for me with the s3_input object. Here is how I defined it:

train_channel = sagemaker.session.s3_input(_s3_inputtrain, content_type='csv')
valid_channel = sagemaker.session.s3_input(_s3_inputvalidation, content_type='csv')

_s3_inputtrain and _s3_inputvalidation contain the paths to my files in S3 buckets.

Now, fit the model with these train and validation S3 inputs.

Ref: https://github.com/aws/sagemaker-python-sdk/issues/133
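Putting the steps above together, a minimal sketch might look like the following. The helper names `channel_kwargs` and `build_channels` and the bucket paths are hypothetical. In version 2 of the SageMaker Python SDK, `s3_input` was renamed to `sagemaker.inputs.TrainingInput`, so the sketch tries that name first and falls back to the v1 name used above. The content type defaults to 'csv' as in this comment; AWS documentation usually spells it 'text/csv'.

```python
def channel_kwargs(train_uri, validation_uri, content_type="csv"):
    """Return per-channel keyword arguments with an explicit content type."""
    return {
        "train": {"s3_data": train_uri, "content_type": content_type},
        "validation": {"s3_data": validation_uri, "content_type": content_type},
    }


def build_channels(kwargs_by_channel):
    """Turn the keyword arguments into SDK input objects (needs `sagemaker`)."""
    try:
        # SageMaker Python SDK v2 name.
        from sagemaker.inputs import TrainingInput as make_input
    except ImportError:
        # SDK v1 name, as used in the comment above.
        from sagemaker.session import s3_input as make_input
    return {name: make_input(**kw) for name, kw in kwargs_by_channel.items()}


# Hypothetical S3 paths, for illustration only.
spec = channel_kwargs(
    "s3://my-bucket/xgboost/train.csv",
    "s3://my-bucket/xgboost/validation.csv",
)
print(spec)

# With the SDK installed and an already configured estimator:
# channels = build_channels(spec)
# estimator.fit(channels)
```

The point is that the content type is set on each input channel rather than on the estimator; without it, the channels fall back to the libsvm default named in the error message.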

engineeryashsaxena commented 4 years ago

But what if I am doing it via the SageMaker UI?