Open joshwapiano opened 6 years ago
@joshwapiano, I guess something happened with the data (it's None), and it might be caused by one of several problems.
1. Can you go back to the tiles folder that was created by Label Maker and check whether every tile was written correctly? We've seen that if the MapBox imagery API token is not set up correctly, the image tiles can be blank.
2. Can you check whether the function prep_data (run it separately in a notebook cell, together with the find_file function) prints the correct input data shape? The shapes should be ((1831, 3, 256, 256), (458, 3, 256, 256), (1831,), (458,)) for X_train, X_test, Y_train and Y_test.
3. If neither of the first two is the problem: you did not mention that you have set up an S3 bucket, so I'm guessing that could cause a problem too. I remember I saved data.npz in an S3 bucket and gave the S3 URL to mxnet_estimator.fit in the second cell, which is where you hit the error you mentioned above. Reading data.npz from the root directory used to work, but SageMaker keeps being updated by the AWS team, so I'm not sure whether you're now expected to feed data through S3. For details see here.
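For the shape and blank-tile checks above, here is a minimal sketch that inspects data.npz directly with NumPy. It assumes the file holds arrays under keys such as x_train and y_train (the names used by the training script later in this thread); summarize_npz and find_constant_images are hypothetical helper names, not part of Label Maker.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


def find_constant_images(images):
    """Return indices of images whose pixels all share one value (likely blank tiles)."""
    flat = images.reshape(images.shape[0], -1)
    return np.flatnonzero(flat.min(axis=1) == flat.max(axis=1)).tolist()


# Example usage (the path is an assumption; point it at your own file):
# print(summarize_npz("data.npz"))
# print(find_constant_images(np.load("data.npz")["x_train"]))
```

If the second call returns a long list of indices, many tiles are a single flat colour, which would be consistent with the imagery token problem described in item 1.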
Let me know if we can help further.
@Geoyi Thanks for getting back to me. I had written a much longer response, but for some reason GitHub has not saved this comment.
Essentially I have investigated both problem 1 and problem 2 and both give the expected results.
I have a feeling that the issue is with the S3 bucket, and I have tried multiple approaches to this, none of which have been successful. Would you be able to run the example on your own SageMaker notebook instance and, if it functions as expected, share the syntax/approach used in the mxnet_estimator.fit argument, or any other changes you make?
Many thanks
I'm getting this same problem.
re problem 1: I downloaded the jpg files from label_maker and there are satellite images. Some are ocean tiles. Could this be affecting the model?
re problem 2 : I use S3 but only from copying examples; so I am unfamiliar with basic stuff, your configuration, as well as S3 + SageMaker. Would a good S3 URL look like: s3://sagemaker-data-npz/data.npz, or just s3://sagemaker-data-npz, or do these both look wrong to you? Was I supposed to do some previous work on IAM keys or make the bucket public?
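On the URL question, both forms are syntactically valid: the first names a single object and the second names only the bucket. A minimal sketch of the difference, using just the standard library (split_s3_url is a hypothetical helper, and which form a particular SageMaker call expects is not settled by this snippet):

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key); key is '' for a bare bucket URL."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URL: {!r}".format(url))
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_url("s3://sagemaker-data-npz/data.npz"))  # ('sagemaker-data-npz', 'data.npz')
print(split_s3_url("s3://sagemaker-data-npz"))           # ('sagemaker-data-npz', '')
```

The bucket/key split matters because boto3's download_file takes the bucket name and the object key as separate arguments rather than a single URL.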
I have used notebooks before. Where would I run or insert prep_data(find_file( ... )) to test that part correctly?
@joshwapiano and @mapmeld, I will spin up the SageMaker notebook and check it next week. Let me know if you solve the problem before I get back to you. Sorry for the delay.
@Geoyi thanks for getting back to us - still not having any luck producing the correct data format/feed for the sagemaker notebook. I think they have made changes to sagemaker/mxnet without providing adequate documentation. Good luck, looking forward to hearing from you.
Phewwww, I finally solved the problem, and it took me a whole morning today, @joshwapiano and @mapmeld.
You're right about the S3 bucket and prep_data(find_file( ... )), @mapmeld. I deleted the find_file( ... ) function. And @joshwapiano, the SageMaker team doesn't do a good job of documenting their work.
Additional things I did:
I used conda-mxnet_p27 instead of their Python 3.6. Apparently, MXNet only works with Python 2.7 and 3.5, so their conda-mxnet_p36 won't work. It gave me a no sagemaker.mxnet module error.
The 'NoneType' object has no attribute 'read' error you originally got was caused by the data not being passed to the model correctly from the S3 bucket. Other people have recently had this problem when reading data from a private AWS S3 bucket. Here are the scripts to replace the scripts in this notebook:
%%file mx_lenet_sagemaker.py
### replace the first cell with this
import logging
import os
from os import path as op

import boto3
import mxnet as mx
import numpy as np

batch_size = 64
num_cpus = 0
num_gpus = 1
s3_url = "Your_s3_bucket_URL"

# download data.npz from the S3 bucket into the working directory
s3_client = boto3.client('s3')
s3_client.download_file('Bucket-name', "data.npz", "data.npz")

def prep_data():
    """
    Load data.npz from the working directory and convert the numpy arrays to mx NDArray iterators.

    Returns
    -------
    train_iter, val_iter: mx.io.NDArrayIter objects for training and validation.
    """
    data_file = np.load(op.join(os.getcwd(), 'data.npz'))
    x_train = data_file['x_train']
    y_train = data_file['y_train'][:, :1]  ## keep only the first label column, shape (n, 1)
    x_test = data_file['x_test']
    y_test = data_file['y_test'][:, :1]
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    print(x_train.shape, x_train.mean())
    # mean and std over every axis except the last one
    img_mean = np.mean(x_train, axis=(0, 1, 2))
    img_std = np.std(x_train, axis=(0, 1, 2))
    x_train -= img_mean
    x_train /= img_std
    x_test -= img_mean
    x_test /= img_std
    img_rows = 256
    img_cols = 256
    x_train = x_train.reshape(x_train.shape[0], 3, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 3, img_rows, img_cols)
    y_train = y_train.reshape(y_train.shape[0], )  ## reshape it to (n, ) instead of (n, 1)
    y_test = y_test.reshape(y_test.shape[0], )
    print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
    train_iter = mx.io.NDArrayIter(x_train, y_train, batch_size, shuffle=True)
    val_iter = mx.io.NDArrayIter(x_test, y_test, batch_size)
    return train_iter, val_iter

def mx_lenet():
    """Build a three-layer LeNet-style convolutional neural net using MXNet."""
    data = mx.sym.var('data')
    data_dp = mx.symbol.Dropout(data, p=0.2)  ## 20% of the input gets dropped out during training
    # first conv layer
    conv1 = mx.sym.Convolution(data=data_dp, kernel=(5, 5), num_filter=20)
    tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
    pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2, 2), stride=(2, 2))
    # second conv layer
    conv2 = mx.sym.Convolution(data=pool1, kernel=(5, 5), num_filter=50)
    tanh2 = mx.sym.Activation(data=conv2, act_type="tanh")
    pool2 = mx.sym.Pooling(data=tanh2, pool_type="max", kernel=(2, 2), stride=(2, 2))
    # third conv layer (takes the output of the second layer as its input)
    conv3 = mx.sym.Convolution(data=pool2, kernel=(5, 5), num_filter=50)
    tanh3 = mx.sym.Activation(data=conv3, act_type="tanh")
    pool3 = mx.sym.Pooling(data=tanh3, pool_type="max", kernel=(2, 2), stride=(2, 2))
    # first fullc layer
    flatten = mx.sym.flatten(data=pool3)
    fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
    tanh4 = mx.sym.Activation(data=fc1, act_type="tanh")
    # second fullc
    fc2 = mx.sym.FullyConnected(data=tanh4, num_hidden=2)
    # softmax loss
    return mx.sym.SoftmaxOutput(data=fc2, name='softmax')

def train(num_cpus, num_gpus, **kwargs):
    """
    Train the image classification neural net.

    Parameters
    ----------
    num_cpus: 0 when training on an AWS GPU machine (with num_gpus = 1), and vice versa.
    num_gpus: the same rule applies.
    """
    train_iter, val_iter = prep_data()
    lenet = mx_lenet()
    lenet_model = mx.mod.Module(
        symbol=lenet,
        context=get_train_context(num_cpus, num_gpus))
    logging.getLogger().setLevel(logging.DEBUG)
    lenet_model.fit(train_iter,
                    eval_data=val_iter,
                    optimizer='sgd',
                    optimizer_params={'learning_rate': 0.1},
                    eval_metric='acc',
                    batch_end_callback=mx.callback.Speedometer(batch_size, 16),
                    num_epoch=100)
    return lenet_model

def get_train_context(num_cpus, num_gpus):
    """
    Define the model training instance.

    Parameters
    ----------
    num_cpus: 0 when training on an AWS GPU machine (with num_gpus = 1), and vice versa.
    num_gpus: the same rule applies.
    """
    if num_gpus > 0:
        print("It's {} instance".format(num_gpus))
        return mx.gpu()
    print("It's {} instance".format(num_cpus))
    return mx.cpu()
and do this to the second cell:
%%time
from sagemaker.mxnet import MXNet
from sagemaker import get_execution_role

s3_url = "Your_s3_bucket_URL"
mxnet_estimator = MXNet("mx_lenet_sagemaker.py",
                        role=get_execution_role(),
                        output_path=s3_url,
                        train_instance_type="ml.p2.xlarge",
                        train_instance_count=1)
mxnet_estimator.fit(s3_url)
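Before running the two cells, it may help to confirm that the notebook kernel can actually import the modules they rely on, since the wrong kernel produced the no sagemaker.mxnet module error mentioned above. A minimal sketch that should run under both Python 2.7 and 3.x; module_available and environment_report are hypothetical helper names, and the default module list is an assumption taken from the imports in the two cells:

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported in the current kernel."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def environment_report(modules=("mxnet", "sagemaker.mxnet", "boto3")):
    """Return the Python version plus an importability flag for each module."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in modules:
        report[name] = module_available(name)
    return report


print(environment_report())
```

Any False entry means the selected kernel is missing that module, so switching kernels is worth trying before debugging the S3 side.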
@Geoyi this works for me - thank you so much for fixing this!
@Geoyi Many thanks for providing this! Looking forward to trying it out. Will let you know how I get on! I've come across an issue with the labelling that label-maker is producing, will raise in separate #issue.
I've been following the walkthrough found here (albeit with a smaller bounding box), and have initiated a SageMaker Notebook instance. The data.npz file is sitting in the sagemaker folder, and I'm having no problem reading it when running the relevant sections of mx_lenet_sagemaker.py in a new notebook on the instance. However, when I run the second cell of SageMaker_mx-lenet I hit the following error:
After several hours trying different fixes I'm having little to no luck debugging, but was hoping you could check the example to ensure it runs fine when you attempt it?