aws / sagemaker-tensorflow-extensions

SageMaker specific extensions to TensorFlow.
Apache License 2.0

augmentedManifestFile + PipeModeDataset example #63

Open vlordier opened 4 years ago

vlordier commented 4 years ago

It would really help to have a full end-to-end example of, say, image classification with augmentedManifestFile + PipeModeDataset, as I keep getting format errors like:

tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Could not parse example input, value: '����

I built a JSON Lines augmented manifest with

{"image-ref": "s3://path/to/image", "label": 3}
{"image-ref": "s3://path/to/image", "label": 1}
{"image-ref": "s3://path/to/image", "label": 2}
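Each manifest line has to be valid JSON on its own (double-quoted keys and string values). A minimal sketch of generating such lines with the standard library, where `build_manifest_lines` and the sample list are hypothetical names used only for illustration:

```python
import json


def build_manifest_lines(samples):
    """Return one JSON document per sample, suitable for a JSON Lines manifest.

    `samples` is an iterable of (s3_uri, label) pairs. The attribute names
    'image-ref' and 'label' mirror the ones used in this issue.
    """
    return [json.dumps({"image-ref": uri, "label": label}) for uri, label in samples]


# Placeholder URIs, copied from the issue text; replace with real object keys.
samples = [
    ("s3://path/to/image", 3),
    ("s3://path/to/image", 1),
    ("s3://path/to/image", 2),
]

manifest_lines = build_manifest_lines(samples)
manifest_text = "\n".join(manifest_lines) + "\n"
```

Using `json.dumps` rather than printing Python dicts avoids the single-quote and unquoted-URI problems, since every line is guaranteed to round-trip through a JSON parser.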

then preparing training channel as

train_data = sagemaker.session.s3_input(augmented_manifest_file_on_s3,
                                        distribution    = 'FullyReplicated',
                                        content_type    = 'image/jpeg',
                                        s3_data_type    = 'AugmentedManifestFile',
                                        attribute_names = ['image-ref', 'label'],
                                        input_mode      = 'Pipe',
                                        record_wrapping = 'RecordIO') 

and launching the .fit as

data_channels = {'train': train_data}

# Train a model.
tf_estimator.fit(inputs=data_channels, logs=True)

in my entry script, I have

    dataset = PipeModeDataset(channel = channel)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(2)
    dataset = dataset.map(combine)
    dataset = dataset.map(example_parser, num_parallel_calls=batch_size)
    dataset = dataset.repeat(epochs)
    dataset = dataset.batch(batch_size, drop_remainder=True)
    image_batch, label_batch = next(iter(dataset))

and as a modified example parser, I have

    def example_parser(exemple1, exemple2):
        feat1 = tf.io.parse_single_example(
            exemple1,
            features={
                'image-ref': tf.io.FixedLenFeature([], tf.string),
            })

        feat2 = tf.io.parse_single_example(
            exemple2,
            features={
                'label': tf.io.FixedLenFeature([], tf.int64),
            })

        image = feat1['image-ref']
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        label = tf.cast(feat2['label'], tf.int32)
        return image, label

What am I doing wrong? The documentation here is not clear about using augmented manifest files.
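For context, the repository README describes each attribute of an augmented manifest line as being queued into the Pipe Mode FIFO as a separate record, which is why the dataset is batched by the number of attributes. The pure-Python sketch below only illustrates that layout. The byte strings are placeholders, `pair_records` is a hypothetical helper, and the assumption that the label arrives as its text representation is not confirmed by the source.

```python
def pair_records(stream, n_attributes=2):
    """Group a flat per-attribute record stream into one tuple per manifest line."""
    if len(stream) % n_attributes != 0:
        raise ValueError("stream length is not a multiple of n_attributes")
    return [
        tuple(stream[i:i + n_attributes])
        for i in range(0, len(stream), n_attributes)
    ]


# Placeholder stream: image bytes and label bytes alternate, in the order
# given by attribute_names = ['image-ref', 'label'].
stream = [b"<jpeg bytes 1>", b"3", b"<jpeg bytes 2>", b"1"]

pairs = pair_records(stream)
# Assumption: the label record is plain text, so it can be decoded directly.
labels = [int(label.decode("utf-8")) for _, label in pairs]
```

If the records really are raw attribute payloads like this, they are not serialized `tf.train.Example` protos, which would be consistent with the "Could not parse example input" error.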

laurenyu commented 4 years ago

Could you show your code for creating the TF estimator and launching the training job as well?

vlordier commented 4 years ago

creating the TF estimator

tf_estimator = TensorFlow( entry_point = 'train.py',
                    train_use_spot_instances = False,
                    train_max_wait = 36000,
                    train_max_run= 36000,
                    role    = role,
                    train_instance_count    = 1, 
                    train_instance_type = 'ml.p2.8xlarge',
                    framework_version = '2.0.0', 
                    py_version  = 'py3',
                    input_mode  = 'Pipe',
                    source_dir  = source_dir,
                    hyperparameters     = hyperparameters,
                    distributions   = distributions,
                    )

with distributions as

distributions={
            'mpi': {
                    'enabled': True,
                    'processes_per_host': 8,
                    'custom_mpi_options': '--NCCL_DEBUG INFO'
            }
    }

since I am looking to

Launching the training job with tf_estimator.fit(inputs=data_channels, logs=True)

with data_channels defined as data_channels = {'training': train_data, 'testing': test_data, 'validation': val_data}

and train_data defined as

train_data = sagemaker.session.s3_input(augmented_manifest_file_on_s3,
                                        distribution    = 'FullyReplicated',
                                        content_type    = 'image/jpeg',
                                        s3_data_type    = 'AugmentedManifestFile',
                                        attribute_names = ['image-ref', 'label'],
                                        input_mode      = 'Pipe',
                                        record_wrapping = 'RecordIO') 

This notebook here is a good starting point for putting together a small demo of how to use Pipe Mode with Augmented Manifest Files.

nadiaya commented 4 years ago

Have you seen the documentation we have on using augmented manifest files with PipeModeDataset?

https://github.com/aws/sagemaker-tensorflow-extensions#using-the-pipemodedataset-with-sagemaker-augmented-manifest-files

https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html

nadiaya commented 4 years ago

Marking it with a 'feature request' label for the notebook with a better explanation.

matthewhardern commented 4 years ago

I am also encountering the above issue when trying to decode PNG images.

OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Could not parse example input

Have we got any idea what is causing the above issue?

matthewhardern commented 4 years ago

@vlordier An update for you, in case you have not found the answer: I removed the features parsing and fed the two examples (exemple1 / exemple2 in your case) in directly, although you will need to call tf.strings.to_number() on your label. This then worked for me.
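The workaround described in this comment might look roughly like the sketch below. It is not a confirmed reference implementation: the in-memory `stream` stands in for `PipeModeDataset(channel=...)` so the snippet runs without SageMaker, `parse_pair` is a hypothetical name, and it assumes the label record is the label's text representation.

```python
import tensorflow as tf


def combine(records):
    # After batch(2), `records` holds two successive pipe records:
    # the image bytes followed by the label bytes.
    return records[0], records[1]


def parse_pair(image_bytes, label_bytes):
    # Decode the raw JPEG bytes directly; no tf.io.parse_single_example.
    image = tf.image.decode_jpeg(image_bytes, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Assumption: the label arrives as text such as b"3".
    label = tf.strings.to_number(label_bytes, out_type=tf.int32)
    return image, label


# Stand-in for PipeModeDataset: a flat stream of alternating records.
fake_jpeg = tf.io.encode_jpeg(tf.zeros([4, 4, 3], dtype=tf.uint8)).numpy()
stream = tf.data.Dataset.from_tensor_slices([fake_jpeg, b"3", fake_jpeg, b"1"])

dataset = (
    stream.batch(2, drop_remainder=True)  # one (image, label) pair per element
    .map(combine)
    .map(parse_pair)
    .batch(2)                             # the actual training batch size
)

images, labels = next(iter(dataset))
```

For PNG inputs, `tf.image.decode_png` would take the place of `tf.image.decode_jpeg`. Real images usually differ in size, so a resize step would be needed before the final `batch` call.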