Open fiocam opened 4 years ago
Hi @fiocam, are you running your training job with your own container? (img_name = ecr.describe_repositories(repositoryNames=['python3repo'])['repositories'][0]['repositoryUri'] + ':latestfiona')
Could you share your training logs as well as your container Dockerfile?
Hi Chuyang Deng,
Thanks for your answer! Yes, I'm using my own container. The training code and Dockerfile are already included in the original post, but I'll put them here again:
# Select the base image
FROM python:3.6
# Add the requirements file to the current directory
COPY docker/requirements.txt .
# Install all packages defined in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy the training code into the container
COPY algorithm.py /opt/ml/algorithm.py
# Define the entrypoint
ENTRYPOINT ["python3.6", "/opt/ml/algorithm.py"]
#!/usr/bin/env python
# coding: utf-8
# # Sample Algorithm
# Import packages
import boto3
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
import json
import os
import signal
import sys
import time
import pickle
# Create KMS client
kms = boto3.client('kms', region_name='eu-west-1')
# Setup parameters
# Container directories
input_dir = '/opt/ml/input'
model_dir = '/opt/ml/model'
output_dir = '/opt/ml/output'
#channel name for training
channel_name = 'train'
training_path = os.path.join(input_dir, channel_name)
failure_path = output_dir + '/failure'
def arima_algo():
    try:
        #read training data
        train_data = pd.read_csv(os.path.join(training_path,r'/train_data.csv'))
        #carry out arima model
        model = SARIMAX(train_data['y'], order=(2,1,2), enforce_invertibility=False)
        model_fit = model.fit()
        prediction = pd.DataFrame({'ds': pd.date_range(start='2019-01-01', periods=4, freq='MS'),
                                   'yhat': model_fit.predict(47, 50)}).reset_index(drop=True)
        #save model
        #json.dump(model, open(model_dir + '/sarimax_model.json', 'w'))
        model_json = model.to_json()
        with open(os.path.join(model_dir,"/model.json"), "w") as json_file:
            json_file.write(model_json)
        # serialize weights to HDF5
        model.save_weights("model.h5")
        print("Saved model to disk")
        #save predictions
        prediction.to_csv(os.path.join(output_dir,r'prediction.csv'))
    except Exception:
        print('Failed to train: %s' % (sys.exc_info()[0]))
        touch(failure_file)
        raise

if __name__ == '__main__':
    arima_algo()
Same issue. Would love an update.
@fiocam I know this doesn't answer your question, but maybe take a look at the Python script-mode example. I've been using something similar to this example successfully for a few months now. More info here on the SageMaker Training Toolkit.
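Roughly, a script-mode style training script could look something like the sketch below (not fiocam's original code): the SageMaker Training Toolkit sets environment variables such as SM_MODEL_DIR and SM_CHANNEL_TRAIN, so the script doesn't have to hard-code the /opt/ml paths. The data file name and SARIMAX settings are just placeholders.

# Hypothetical script-mode style entry point (sketch only).
# SM_MODEL_DIR and SM_CHANNEL_TRAIN are set by the SageMaker Training Toolkit.
import os
import pickle
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

if __name__ == '__main__':
    model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
    train_dir = os.environ.get('SM_CHANNEL_TRAIN', '/opt/ml/input/data/train')

    # Placeholder file name; adjust to the actual training data
    train_data = pd.read_csv(os.path.join(train_dir, 'train_data.csv'))
    results = SARIMAX(train_data['y'], order=(2, 1, 2), enforce_invertibility=False).fit()

    # Anything written under model_dir is packed into model.tar.gz and uploaded to S3
    with open(os.path.join(model_dir, 'sarimax_results.pkl'), 'wb') as f:
        pickle.dump(results, f)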
Basically, only the content of /opt/ml/model will be packed as model.tar.gz and saved to the output_path you specified, so make sure the needed files are saved to that path. /opt/ml/output is the folder used to indicate whether your training succeeded or failed.
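Applied to the script above, that means something like the following sketch (file names are arbitrary): write the fitted model, and anything else that should end up in model.tar.gz, under /opt/ml/model, and only write a failure marker under /opt/ml/output.

# Sketch: save artifacts under /opt/ml/model so they end up in model.tar.gz;
# /opt/ml/output/failure is only written when the job fails (its content becomes the failure reason).
import os

model_dir = '/opt/ml/model'
output_dir = '/opt/ml/output'

def save_artifacts(model_fit, prediction):
    # statsmodels results objects can be pickled via their save() method
    model_fit.save(os.path.join(model_dir, 'sarimax_results.pkl'))
    prediction.to_csv(os.path.join(model_dir, 'prediction.csv'), index=False)

def write_failure(message):
    with open(os.path.join(output_dir, 'failure'), 'w') as f:
        f.write(message)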
I am trying to train my own model on AWS SageMaker in order to eventually create an endpoint, basically following this tutorial: https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/r_bring_your_own , but using Python code for the model. When executing the create_training_command, no error message comes up and a training job is created in SageMaker. However, no files are stored in the S3 model or output directory. When clicking on the link that should lead to the model.tar.gz file in the training job directory, that folder is also empty. I have included my Dockerfile, algorithm.py, and an extract of my .ipynb file.
Any help is greatly appreciated!
Dockerfile
algorithm.py
Extract of .ipynb file
Publish container
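A rough reconstruction of this cell (only the img_name line quoted in the first reply is from the original notebook; the rest is assumed boto3 boilerplate, and the docker build/push commands are omitted):

# Look up the ECR repository URI with boto3 and build the image name used for training.
# Repository name and tag are taken from the line quoted in the first reply.
import boto3

ecr = boto3.client('ecr', region_name='eu-west-1')
img_name = ecr.describe_repositories(repositoryNames=['python3repo'])['repositories'][0]['repositoryUri'] + ':latestfiona'
print(img_name)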
Train
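A rough sketch of the training call, assuming the boto3 create_training_job request used in the r_bring_your_own tutorial; the bucket, role ARN, job name, and instance type below are placeholders:

# Sketch of the training cell: create the training job with boto3.
# img_name comes from the publish step above; bucket and role are placeholders.
import time
import boto3

sm = boto3.client('sagemaker', region_name='eu-west-1')
bucket = 's3://my-bucket'                                    # placeholder
role = 'arn:aws:iam::123456789012:role/my-sagemaker-role'    # placeholder

sm.create_training_job(
    TrainingJobName='arima-training-' + time.strftime('%Y-%m-%d-%H-%M-%S'),
    AlgorithmSpecification={'TrainingImage': img_name, 'TrainingInputMode': 'File'},
    RoleArn=role,
    InputDataConfig=[{
        'ChannelName': 'train',
        'DataSource': {'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': bucket + '/train',
            'S3DataDistributionType': 'FullyReplicated'}},
    }],
    OutputDataConfig={'S3OutputPath': bucket + '/output'},   # model.tar.gz lands under this prefix
    ResourceConfig={'InstanceType': 'ml.m5.large', 'InstanceCount': 1, 'VolumeSizeInGB': 10},
    StoppingCondition={'MaxRuntimeInSeconds': 3600},
)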