aws / sagemaker-tensorflow-extensions

SageMaker specific extensions to TensorFlow.
Apache License 2.0

PipeMode on local instance #47

Open fmannhardt opened 5 years ago

fmannhardt commented 5 years ago

Trying to call my script using PipeMode on a local instance always fails, since the script receives the dataset in File mode. Is PipeMode supported on local instances?

ChoiByungWook commented 5 years ago

Hello @fmannhardt,

Apologies for the late response and inconvenience.

PipeMode isn't supported with "local" instances; it only works in SageMaker.

fmannhardt commented 5 years ago

It would be great to have that limitation documented somewhere prominent. It cost me some hours of failed attempts.

How are we supposed to work around this? I am now reading the configuration and, depending on the mode, handling the difference so the script can work with both modes.

ChoiByungWook commented 5 years ago

Hello @fmannhardt,

I agree with you.

Where do you believe this documentation should live, so that it is easily found by other users?

I apologize for the frustration.

Can you clarify what you mean by working around this? Usually local mode and the modes in SageMaker should behave similarly unless you're working in a distributed environment, since there is no support for distributed training in local mode.

Thanks!

laurenyu commented 5 years ago

I've submitted a PR to document this: https://github.com/aws/sagemaker-python-sdk/pull/1019

In the meantime, I will leave this open as a feature request. Pipe Mode for Local Mode has been in our backlog since Local Mode's launch, and we're always reprioritizing our backlog based on feedback.

fmannhardt commented 5 years ago

@ChoiByungWook good question. From an outsider's look at the SageMaker documentation, one issue is that it is all over the place. Here is a small collection of what I encountered:

Since I am working with TensorFlow, I would have expected the warning here: https://sagemaker.readthedocs.io/en/stable/using_tf.html#training-with-pipe-mode-using-pipemodedataset and here: https://sagemaker.readthedocs.io/en/stable/overview.html?highlight=local%20mode#local-mode

Also, the fact that half of the parameters and examples are for the old "non-script" mode and no longer applicable does not make using SageMaker easier. It led to much confusion on my side. This is probably the downside of flexibility and keeping the same API.

fmannhardt commented 5 years ago

> Can you clarify what you mean by working around this, usually local mode and modes in SageMaker should be similar unless you're working in a distributed environment, since there is no support for distributed in local mode.

Currently I added code like this:

import glob
import logging

import tensorflow as tf


def load_data_as_dataset(channel_name, channel, data_config):
    """Return a TFRecord dataset for the channel, in Pipe or File mode."""

    def get_filenames(channel):
        # `channel` may be a comma-separated list of files, a directory
        # containing .tfrecord files, or a single .tfrecord file.
        if "," in channel:
            return channel.split(",")
        elif not channel.endswith(".tfrecord"):
            return glob.glob(channel + "/*.tfrecord")
        else:
            return [channel]

    mode = data_config[channel_name]['TrainingInputMode']

    logging.info("Running {} in mode: {}".format(channel_name, mode))

    if mode == 'Pipe':
        # Construct a `PipeModeDataset` reading from the given channel, using
        # the TF Record encoding.
        from sagemaker_tensorflow import PipeModeDataset
        ds = PipeModeDataset(channel=channel_name, record_format='TFRecord')
    else:
        # File mode (including local mode): read the TFRecord files from disk.
        filenames = get_filenames(channel)
        logging.info("Loading files {}".format(filenames))
        ds = tf.data.TFRecordDataset(filenames)

    return ds

But I cannot really use PipeMode anyway due to issue #46
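
For readers who want to try the same workaround, here is a minimal sketch of how the `data_config` argument could be obtained. It assumes the script runs inside a SageMaker training container, where the per-channel input configuration is normally available in the `SM_INPUT_DATA_CONFIG` environment variable or in `/opt/ml/input/config/inputdataconfig.json`. The helper names `read_data_config` and `input_mode` are hypothetical and not part of any SageMaker library.

```python
import json
import os

# Assumption: default location of the input data config inside a SageMaker
# training container. Adjust if your container uses a different layout.
DEFAULT_CONFIG_PATH = "/opt/ml/input/config/inputdataconfig.json"


def read_data_config(path=DEFAULT_CONFIG_PATH, environ=os.environ):
    """Return the per-channel input config as a dict (empty if unavailable).

    Hypothetical helper: prefers the SM_INPUT_DATA_CONFIG environment
    variable and falls back to the JSON file on disk.
    """
    raw = environ.get("SM_INPUT_DATA_CONFIG")
    if raw:
        return json.loads(raw)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def input_mode(data_config, channel_name, default="File"):
    """Look up TrainingInputMode for a channel, falling back to File mode."""
    return data_config.get(channel_name, {}).get("TrainingInputMode", default)
```

The result of `read_data_config()` could then be passed as `data_config` to a function like `load_data_as_dataset`, with the channel path taken from the corresponding `SM_CHANNEL_<NAME>` environment variable (for example `SM_CHANNEL_TRAINING`), if your container sets it. Falling back to `"File"` when the configuration is missing is a design choice for this sketch, since that is the mode a local instance delivers.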

athewsey commented 4 years ago

+1 for the feature request: Using pipe mode often (unless you happened to be using MXNet on a RecordIO file already...) involves a significant I/O change for a script, and can be tricky to get working if you're dealing with non-trivial data types (e.g. augmented manifests, object detection). Adding local mode support could really cut debugging time and improve the likelihood of users actually getting Pipe Mode working 😭

hwaxxer commented 3 years ago

It's fascinating that almost 2 years have passed since this issue was created and pipe mode is still not possible to use outside of a training job. This 1) makes it awkward and difficult to debug training pipelines, and 2) creates inconsistencies between training and evaluation (custom scripts to parse manifest files).

Also suffering from https://github.com/aws/sagemaker-tensorflow-extensions/issues/46 and it makes me wonder if pipe mode is actively being worked on/maintained?