Open winanjaya-mtwi opened 5 years ago
Hello,
CSVs passed to XGBoost need to be in a specific format:
- No header row
- Outcome variable in the first column, features in the rest of the columns (there's no ability to drop them during the training process)
- All columns need to be numeric
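As a rough illustration, the requirements listed above can be checked locally before uploading a file to S3. This is only a sketch using the Python standard library; the function name `check_xgboost_csv` is made up for this example and is not part of SageMaker or XGBoost. It cannot tell whether the first column really is the outcome variable, only whether every cell is numeric and the rows have a consistent width.

```python
import csv
import io


def check_xgboost_csv(text):
    """Return a list of problems found in CSV text; an empty list means none were found.

    Hypothetical helper for illustration only, not a SageMaker API.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    width = len(rows[0])
    for line_no, row in enumerate(rows, start=1):
        # Every row should have the same number of columns as the first row.
        if len(row) != width:
            problems.append(
                f"line {line_no}: expected {width} columns, found {len(row)}"
            )
        # A header row or a text column shows up here as a non-numeric cell.
        for col_no, cell in enumerate(row, start=1):
            try:
                float(cell)
            except ValueError:
                problems.append(
                    f"line {line_no}, column {col_no}: non-numeric value {cell!r}"
                )
    return problems
```

Calling it on the contents of a file, for example `check_xgboost_csv(open(path).read())`, reports a header row as non-numeric values on line 1.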
I have satisfied all requirements as described above, and am still getting the same error as above. How do you set the content type to csv for XGBoost?
Sorry to add a comment here, but is it preferable to use svmlight format (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.dump_svmlight_file.html) rather than CSV?
The confusion comes from inconsistent documentation. Yes, XGBoost supports both CSV and libsvm. For CSV, it worked for me with the s3_input object. Here is how I defined it:
train_channel = sagemaker.session.s3_input(_s3_inputtrain, content_type='csv')
valid_channel = sagemaker.session.s3_input(_s3_inputvalidation, content_type='csv')
_s3_inputtrain and _s3_inputvalidation contain the S3 paths to my files.
Now fit the model with these train and validation S3 inputs.
But what if I am doing it via the SageMaker UI?
Hello All,
I am new to AWS SageMaker. I am trying to use the XGBoost algorithm, but it keeps failing. Here is what I have done:
After waiting a couple of minutes, it failed with the following error:
ClientError: Blankspace and colon not found in firstline '1 4a4fc709a737ab971e7a1008a72a930c ...' of file 'groundtruth_tqOAx9PEb2RJ3xlAiRN5khotlbq4KUZfZKT_IN2m9d0j2W2h_GbHJxlp5UgJcomc4BqO8qnpPZiDNKcnGqcELSNMzOm6dXRpvzJUOeAFgFOvWfUrym_pI8z35vKYEG.hUdmlDAXjj6M5LVinEg0N8rdMPKBfDnHAmj_9THczi2Y-_etags_do_not_modify.tmp'. ContentType by defaullt is in libsvm. Please ensure the file is in libsvm format.
I need some guidance.
mytry.zip
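For context on the error message above: a libsvm line is a label followed by space-separated `index:value` pairs, which is why the message mentions a blank space and a colon. The sketch below shows what such a line looks like and a rough local check for it. Both function names are hypothetical, the check is not SageMaker's actual validation, and the 1-based feature indices are an assumption (some tools write 0-based indices).

```python
def to_libsvm_line(label, features):
    """Format one example as 'label index:value ...', skipping zero-valued features.

    Hypothetical helper; feature indices start at 1 by assumption.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


def looks_like_libsvm(line):
    """Rough check: a numeric label followed by one or more index:value tokens."""
    tokens = line.split()
    if len(tokens) < 2:
        return False
    try:
        float(tokens[0])
        for token in tokens[1:]:
            # A token without exactly one colon fails to unpack and raises ValueError.
            index, value = token.split(":")
            int(index)
            float(value)
    except ValueError:
        return False
    return True
```

A CSV row such as `1,0.5,0,3` fails this check, as does a first line like the one quoted in the error. That is consistent with the file being read as libsvm because no content type was set.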