augmentedManifestFile + PipeModeDataset example

vlordier commented 4 years ago

It would really help to have a full end-to-end example of, say, image classification with augmentedManifestFile + PipeModeDataset.

I keep getting format errors like tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Could not parse example input, value: '��

I build a JSON Lines augmented manifest with

{"image-ref": "s3://path/to/image", "label": 3}
{"image-ref": "s3://path/to/image", "label": 1}
{"image-ref": "s3://path/to/image", "label": 2}
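Each line of an augmented manifest has to be a standalone JSON object, so generating the file with a JSON serializer avoids quoting mistakes. A minimal sketch, assuming the same two attributes as above; `build_manifest_lines` and the sample list are hypothetical names used only for illustration:

```python
import json


def build_manifest_lines(samples):
    """Serialize (s3_uri, label) pairs as JSON Lines, one object per line."""
    return [json.dumps({"image-ref": uri, "label": label}) for uri, label in samples]


# Placeholder URIs copied from the post; real object keys would go here.
samples = [
    ("s3://path/to/image", 3),
    ("s3://path/to/image", 1),
    ("s3://path/to/image", 2),
]

manifest_text = "\n".join(build_manifest_lines(samples)) + "\n"
print(manifest_text, end="")
```

The resulting text can then be uploaded to S3 and referenced as the AugmentedManifestFile.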

then prepare the training channel as

train_data = sagemaker.session.s3_input(augmented_manifest_file_on_s3,
                                        distribution    = 'FullyReplicated',
                                        content_type    = 'image/jpeg',
                                        s3_data_type    = 'AugmentedManifestFile',
                                        attribute_names = ['image-ref', 'label'],
                                        input_mode      = 'Pipe',
                                        record_wrapping = 'RecordIO')

and launch .fit as

data_channels = {'train': train_data}

# Train a model.
tf_estimator.fit(inputs=data_channels, logs=True)

in my entry script, I have

    dataset = PipeModeDataset(channel = channel)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(2)
    dataset = dataset.map(combine)
    dataset = dataset.map(example_parser, num_parallel_calls=batch_size)
    dataset = dataset.repeat(epochs)
    dataset = dataset.batch(batch_size, drop_remainder=True)
    image_batch, label_batch = next(iter(dataset))

and as a modified example parser, I have

    def example_parser(exemple1, exemple2):

        feat1 = tf.io.parse_single_example(
            exemple1,
            features={
                'image-ref': tf.io.FixedLenFeature([], tf.string),
            })

        feat2 = tf.io.parse_single_example(
            exemple2,
            features={
                'label': tf.io.FixedLenFeature([], tf.int64),
            })

        image = feat1['image-ref']
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        label = tf.cast(feat2['label'], tf.int32)
        return image, label

What am I doing wrong? The documentation here is not clear about using augmented manifest files.

laurenyu commented 4 years ago

Could you show your code for creating the TF estimator and launching the training job as well?

vlordier commented 4 years ago

Creating the TF estimator:

tf_estimator = TensorFlow(entry_point              = 'train.py',
                          train_use_spot_instances = False,
                          train_max_wait           = 36000,
                          train_max_run            = 36000,
                          role                     = role,
                          train_instance_count     = 1,
                          train_instance_type      = 'ml.p2.8xlarge',
                          framework_version        = '2.0.0',
                          py_version               = 'py3',
                          input_mode               = 'Pipe',
                          source_dir               = source_dir,
                          hyperparameters          = hyperparameters,
                          distributions            = distributions,
                          )

with distributions as

distributions = {
    'mpi': {
        'enabled': True,
        'processes_per_host': 8,
        'custom_mpi_options': '--NCCL_DEBUG INFO'
    }
}

since I am looking to:

- stream a dataset of compressed JPEGs from S3 using PipeModeDataset and an Augmented Manifest File
- decode and augment it on the fly
- feed it to multiple GPUs with Horovod for training

Launching the training job with tf_estimator.fit(inputs=data_channels, logs=True)

with data_channels as data_channels = {'training': train_data, 'testing': test_data, 'validation': val_data}

and train_data defined as

train_data = sagemaker.session.s3_input(augmented_manifest_file_on_s3,
                                        distribution    = 'FullyReplicated',
                                        content_type    = 'image/jpeg',
                                        s3_data_type    = 'AugmentedManifestFile',
                                        attribute_names = ['image-ref', 'label'],
                                        input_mode      = 'Pipe',
                                        record_wrapping = 'RecordIO')

This notebook here is a good starting point to put together a little demo of how to use Pipe Mode on Augmented Manifest Files

nadiaya commented 4 years ago

Have you seen the documentation we have on using augmented manifest files with PipeModeDataset? See https://github.com/aws/sagemaker-tensorflow-extensions#using-the-pipemodedataset-with-sagemaker-augmented-manifest-files and https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html

nadiaya commented 4 years ago

Marking it with a 'feature request' label for the notebook with a better explanation.

matthewhardern commented 4 years ago

I am also encountering the above issue when trying to decode PNG images.

OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Could not parse example input

Have we got any idea what is causing the above issue?

matthewhardern commented 4 years ago

@vlordier An update for you in case you have not found the answer: I removed the features and fed in example1 / example2 (in your case) directly, although you will need to apply tf.strings.to_number() to your label. This then worked for me.
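A minimal sketch of the workaround described above, assuming each manifest line arrives as two consecutive records (the image bytes, then the label as text). PipeModeDataset only runs inside a SageMaker training container, so the stream here is simulated with tf.data; `simulated_pipe_stream` and `raw_record_parser` are hypothetical names.

```python
import tensorflow as tf


def combine(records):
    # After batch(2), one element holds both attributes of a manifest line.
    return records[0], records[1]


def raw_record_parser(image_bytes, label_bytes):
    # Use the records directly instead of tf.io.parse_single_example.
    image = tf.image.decode_jpeg(image_bytes, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    label = tf.strings.to_number(label_bytes, out_type=tf.int32)
    return image, label


def simulated_pipe_stream(labels):
    # Stand-in for PipeModeDataset: alternating image and label records.
    jpeg = tf.io.encode_jpeg(tf.zeros([4, 4, 3], dtype=tf.uint8))
    records = []
    for label in labels:
        records.append(jpeg)
        records.append(tf.constant(str(label)))
    return tf.data.Dataset.from_tensor_slices(tf.stack(records))


dataset = simulated_pipe_stream([3, 1, 2])
dataset = dataset.batch(2, drop_remainder=True).map(combine).map(raw_record_parser)

for image, label in dataset:
    print(image.shape, int(label))
```

In an entry script, the same batch(2), combine and parser steps would presumably follow PipeModeDataset(channel=channel) in place of the simulated stream.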

aws / sagemaker-tensorflow-extensions

augmentedManifestFile + PipeModeDataset example #63