MXNet Gluon recommender batch prediction?

Hi @tosolveit, thank you for your interest in using SageMaker. If you can provide more details about your use case, we can give more specific advice. Some questions that would help:

How is your recommender model implemented? Did you use a framework container provided by SageMaker (such as sagemaker-mxnet-container), or did you build your own custom container with train and serve entry points?
How many test examples (user-item pairs) would you expect to process in a single batch prediction? Do you want recommendations for just a few examples, or more than 100 at a time?
What latency or SLA do you expect for predictions? Do you need real-time recommendations with millisecond-to-second latency, or can you tolerate minutes to hours of latency while processing a large dataset?

Depending on your requirements, you can use either SageMaker Endpoints or Batch Transform for your inference use case.

For small inference requests that need low latency, deploy your model to a real-time Endpoint and send data to it by calling InvokeEndpoint. For more information, see "How it Works: Hosting" in the SageMaker documentation.
For large, dataset-scale inference handled asynchronously, run a Batch Transform job with your model by calling CreateTransformJob; the job reads input from S3 and writes predictions back to S3. For more information, see "How it Works: Batch Transform" in the SageMaker documentation.
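A minimal sketch of the real-time path, assuming the endpoint already exists and its inference code accepts a JSON list of user-item pairs. The `{"user": ..., "item": ...}` record shape, the endpoint name, and the helper names `build_payload` and `predict_realtime` are hypothetical; the client is passed in so that in practice it can be `boto3.client("sagemaker-runtime")`.

```python
import json
from typing import Any, Iterable, List, Tuple


def build_payload(pairs: Iterable[Tuple[str, str]]) -> bytes:
    """Serialize (user, item) pairs into a JSON request body.

    The {"user": ..., "item": ...} record shape is a hypothetical format;
    match whatever your model's input handler actually parses.
    """
    return json.dumps([{"user": u, "item": i} for u, i in pairs]).encode("utf-8")


def predict_realtime(runtime_client: Any, endpoint_name: str,
                     pairs: Iterable[Tuple[str, str]]) -> List[Any]:
    """Score a small batch of pairs with a single InvokeEndpoint call.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=build_payload(pairs),
    )
    # The response Body is a stream: read it, then decode the JSON predictions.
    return json.loads(response["Body"].read())
```

A call would look like `predict_realtime(boto3.client("sagemaker-runtime"), "my-recommender-endpoint", [("u1", "i42")])`, where the endpoint name is a placeholder. Passing the client in keeps the helper easy to test without AWS credentials.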
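A minimal sketch of the batch path, assuming a SageMaker model named `model_name` already exists and its inference code accepts one JSON record per line. The S3 URIs, job name, instance type, content type, record shape, and helper names are hypothetical placeholders; the request keys follow the CreateTransformJob API, and the client is passed in so that in practice it can be `boto3.client("sagemaker")`.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def to_json_lines(pairs: Iterable[Tuple[str, str]]) -> str:
    """Write one JSON record per line so Batch Transform can split on lines.

    The {"user": ..., "item": ...} record shape is hypothetical.
    """
    return "".join(json.dumps({"user": u, "item": i}) + "\n" for u, i in pairs)


def transform_job_request(job_name: str, model_name: str, input_s3: str,
                          output_s3: str, instance_type: str = "ml.m5.xlarge",
                          instance_count: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for sagemaker_client.create_transform_job."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # Send several split records to the container in each request.
        "BatchStrategy": "MultiRecord",
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            "ContentType": "application/jsonlines",
            "SplitType": "Line",  # split input files on newlines
        },
        "TransformOutput": {
            "S3OutputPath": output_s3,
            "Accept": "application/jsonlines",
            "AssembleWith": "Line",  # join per-record results with newlines
        },
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def start_batch_prediction(sagemaker_client: Any, **kwargs: Any) -> str:
    """Submit the job; it runs asynchronously and writes results under output_s3."""
    response = sagemaker_client.create_transform_job(**transform_job_request(**kwargs))
    return response["TransformJobArn"]
```

You would upload the output of `to_json_lines` under the input prefix first, then poll `describe_transform_job(TransformJobName=...)` until `TransformJobStatus` is `Completed`. Batch Transform writes the results for each input file to the output path with an `.out` suffix.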

aws/amazon-sagemaker-examples, issue #709