Following. It would be nice to have a description of how batching is handled, similar to what TensorFlow Serving provides.
@miguelvr MMS supports automatic batching of HTTP requests. You can use the management API to register a model with batch support: https://github.com/awslabs/mxnet-model-server/blob/master/docs/management_api.md#register-a-model
If you enable batch mode for your model, your custom service code will receive a batched request. It is your service code's responsibility to batch the inputs and send the batched request to the engine. The example code does not implement batching.
The MTCNN case is different from the batching feature MMS supports. MMS does not restrict what you can do in your service code; you can achieve what you need by overriding the handle function.
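For illustration, here is a minimal sketch of what such an overridden handler could look like when batching is enabled. The module-level handle(data, context) entry point and the per-request dicts carrying payloads under a "data"/"body" key follow the custom service convention, but the service class and its trivial "inference" step are hypothetical placeholders, not the shipped example code.

```python
# Hypothetical sketch: a custom service whose handle() receives a list of
# requests when the model is registered with batch_size > 1.

class BatchedEchoService(object):
    """Toy service: produce one response per request, in the same order."""

    def preprocess(self, batch):
        # `batch` has one entry per client request; pull out the raw payloads.
        return [req.get("data") or req.get("body") or b"" for req in batch]

    def inference(self, payloads):
        # Stand-in for a real batched forward pass: a real service would stack
        # the preprocessed inputs and call the engine once for the whole batch.
        return [len(p) for p in payloads]

    def postprocess(self, outputs):
        return [{"payload_bytes": n} for n in outputs]


_service = BatchedEchoService()


def handle(data, context):
    # MMS hands the worker up to batch_size requests per call; `data` is a list.
    if data is None:
        return None
    return _service.postprocess(_service.inference(_service.preprocess(data)))
```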
@frankfliu thanks for the quick response! That's what I suspected.
It would be nice, however, to have this feature highlighted somewhere, as it is common practice with most model servers.
@frankfliu If we use the model-archiver utility to package our models, how should we set batch_size and max_batch_delay? Via the management API? If so, it seems odd that they cannot be set through the model archiver.
@radao In the current implementation, the management API is the only way to set the batch size. Exposing more options in the model-archiver tool is on our roadmap; this feature will come together with the new model archive format spec.
We are working on documentation for batch support and will provide example models that support batching.
@frankfliu should we add an example showcasing how to use batched requests in your request handler code, and demonstrating the performance boost? @vdantu as an FYI.
@lupesko that would be very helpful
I think this is definitely needed. I can look into writing an example with batching.
Thanks, Vamshi
We do have a story for working on this in our team's internal backlog, will try to prioritize it for one of the upcoming sprints.
@frankfliu Hi, may I ask how to change a model's batch_size after starting MMS in Docker? When MMS starts in the Docker container, the model is registered automatically. If I post a register request like:
curl -v -X POST "http://localhost:8086/models?url=resnet.mar&model_name=image_retrieval&batch_size=30"
It returns the message: "Model image_retrieval is already registered."
If I DELETE the registered model, and register the model with this command:
curl -v -X POST "http://localhost:8086/models?initial_workers=1&synchronous=true&url=resnet.mar"
The model registers, but I still can't change the batch_size, even if I send a PUT request:
curl -v -X PUT "http://localhost:8086/models/resnet?batch_size=30&synchronous=true"
it'll return:
* Trying ::1...
* Connected to localhost (::1) port 8086 (#0)
> PUT /models/image_retrieval?batch_size=30&synchronous=true HTTP/1.1
> Host: localhost:8086
> User-Agent: curl/7.49.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: 2bf616da-bab8-40d7-86d6-a3893a55c4cd
< content-length: 33
< connection: keep-alive
<
{
"status": "Workers scaled"
}
* Connection #0 to host localhost left intact
@JustinhoCHN, could you please share the output of curl "http://localhost:8086/models/resnet" after you tried to update its batch_size with the POST request? I am trying to see what leads you to conclude that the batch size is not updated.
Also, as a side note: you can set batch_size during the register call (POST to the /models endpoint). Subsequent calls to scale workers (PUT to /models/<model_name>) only adjust the workers and do not change batch_size.
@ddavydenko Thank you for your quick reply. Here is the output after the POST request (I posted the request right after the container restarted, not after deleting the registered model).
curl -v -X POST "http://localhost:8086/models?url=resnet.mar&model_name=image_retrieval&batch_size=30"
* Trying ::1...
* Connected to localhost (::1) port 8086 (#0)
> POST /models?url=resnet.mar&model_name=image_retrieval&batch_size=30 HTTP/1.1
> Host: localhost:8086
> User-Agent: curl/7.49.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< content-type: application/json
< x-request-id: 921e4a3e-c882-47e1-907f-caeb1e0e525b
< content-length: 112
<
{
"code": 400,
"type": "BadRequestException",
"message": "Model image_retrieval is already registered."
}
* Connection #0 to host localhost left intact
Here's the output of curl "http://localhost:8086/models/image_retrieval":
{
"modelName": "image_retrieval",
"modelUrl": "resnet.mar",
"runtime": "python",
"minWorkers": 1,
"maxWorkers": 1,
"batchSize": 1,
"maxBatchDelay": 100,
"workers": [
{
"id": "9000",
"startTime": "2019-01-30T01:16:36.394Z",
"status": "READY",
"gpu": true,
"memoryUsage": 1950052352
}
]
}
If I DELETE the registered model and POST again:
curl -v -X POST "http://localhost:8086/models?initial_workers=1&batch_size=30&synchronous=true&url=resnet.mar&model_name=image_retrieval"
* Trying ::1...
* Connected to localhost (::1) port 8086 (#0)
> POST /models?initial_workers=1&batch_size=30&synchronous=true&url=resnet.mar&model_name=image_retrieval HTTP/1.1
> Host: localhost:8086
> User-Agent: curl/7.49.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< content-type: application/json
< x-request-id: 116d61f2-f362-4006-8a2d-a106e977b2ab
< content-length: 120
<
{
"code": 404,
"type": "ModelNotFoundException",
"message": "Load failed... Deregistered model image_retrieval"
}
* Connection #0 to host localhost left intact
So when I want to register again, I can only POST curl -v -X POST "http://localhost:8086/models?initial_workers=1&synchronous=true&url=resnet.mar" without batch_size or any other arguments.
{
"modelName": "resnet",
"modelUrl": "resnet.mar",
"runtime": "python",
"minWorkers": 1,
"maxWorkers": 1,
"batchSize": 1,
"maxBatchDelay": 100,
"workers": [
{
"id": "9002",
"startTime": "2019-01-30T02:22:44.951Z",
"status": "READY",
"gpu": true,
"memoryUsage": 1854517248
}
]
}
Ok, I think what's happening is that when you DELETE the model (unregister it by sending DELETE to /models/<model_name>), MMS loses its reference to where the model archive can be downloaded from, and I think that is why your register attempt right after the DELETE call fails. In order to do it cleanly, I suggest you DELETE the model after the container has bounced and then register it by referring to the URL where the model archive actually lives (maybe on S3), so that MMS has a strong link to where it can download the model from. During this same POST call you can specify the batch_size param.
I admit this is not the best experience for MMS users, but hopefully this workaround could be sufficient for now. Please let me know if this approach works for you.
After some more digging around and reproducing the issue, I can confirm this is actually a bug in MMS. The call to register a model with the batch_size param works only if no synchronous parameter is specified. So after DELETEing the model, a call like this: curl -X POST "localhost:8081/models?url=resnet50_ssd.mar&model_name=resnet50&batch_size=8&initial_workers=4"
works fine, but a call like this: curl -X POST "localhost:8081/models?url=resnet50_ssd.mar&model_name=resnet50&batch_size=8&initial_workers=4&synchronous=true"
fails with { "code": 404, "type": "ModelNotFoundException", "message": "Load failed... Deregistered model resnet50" }
We will prioritize a fix in the near future. In the meantime you can use the async approach to register the model in order to change its batch size.
@ddavydenko Great! I'll try your suggestion and will let you know my result later. You guys are so efficient! Thanks again!
So I removed the synchronous param from the POST request; the model registered successfully, but it doesn't seem to load:
{
"modelName": "image_retrieval",
"modelUrl": "resnet_triplet.mar",
"runtime": "python",
"minWorkers": 1,
"maxWorkers": 1,
"batchSize": 30,
"maxBatchDelay": 100,
"workers": [
{
"id": "9002",
"startTime": "2019-01-30T06:08:29.942Z",
"status": "UNLOADING",
"gpu": true,
"memoryUsage": 0
}
]
}
The status is UNLOADING. MMS didn't start the model service and isn't using any GPU memory. How do I start the worker and make it load the model again? @ddavydenko
Hm, this might be an even bigger issue than initially thought. Let us investigate on our side and comment on findings within a day or two. Sorry for the inconvenience :(
— Thanks, Denis
I would also like to understand the behavior of MMS batching as it is not clear from the little description provided.
When setting a specific value X for batch size, does it mean the model server will only accept batch sizes of X or is it up to X, with X being an upper bound?
My goal is to have some servers accepting a variable number of samples at the same time, depending on what is available to be processed, so having an upper bound would be useful, while a fixed batch size would not be.
@JustinhoCHN and @ddavydenko : I am unable to recreate the issue with initial workers and batch size configured. I did the following
1. Put squeezenet_v1.1.mar into the /tmp/model-store folder
2. mxnet-model-server --model-store /tmp/model-store
3. curl -X POST localhost:8081/models?url=squeezenet_v1.1.mar\&batch_size=30\&max_batch_delay=40\&initial_workers=1\&synchronous=true
It seems to load fine
curl localhost:8081/models/squeezenet_v1.1
{
"modelName": "squeezenet_v1.1",
"modelVersion": "1.0",
"modelUrl": "squeezenet_v1.1.mar",
"engine": "MXNet",
"runtime": "python",
"minWorkers": 1,
"maxWorkers": 1,
"batchSize": 30,
"maxBatchDelay": 40,
"workers": [
{
"id": "9000",
"startTime": "2019-01-30T07:45:28.133Z",
"status": "READY",
"gpu": false,
"memoryUsage": 131043328
}
]
}
@JustinhoCHN : If you are seeing that the model is in the "UNLOADING" state, that means your model wasn't loaded properly. Look at the mms_log.log file in the logs folder; it will show what the issue is. Or, share your log file so that we can also take a look and try to help out.
@miguelvr : We are in the process of writing a document. It will be out shortly. But for now, let me try to answer your questions here:
When setting a specific value X for batch size, does it mean the model server will only accept batch sizes of X or is it up to X, with X being an upper bound?
Short answer: it is up to X messages. You get to choose the batch_size and the max_batch_delay for every model you register. Currently there is no way to configure these two params at startup. The max_batch_delay timer for a particular model starts when the model server receives the first inference request for that model, and it keeps getting rescheduled as long as there are inference requests in the queue. If you receive X messages before the max_batch_delay time expires, the timer is switched off and the backend worker is given all X messages. If only X-delta messages have arrived when max_batch_delay expires, the backend worker is given just those X-delta messages.
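For reference, the same register call can be made programmatically. This is just an illustration of the parameters described above; the port, archive name, model name, and values below are placeholders rather than defaults.

```python
# Illustrative register call against the management API; all values are placeholders.
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "resnet-18.mar",      # archive available in the --model-store
        "model_name": "resnet-18",
        "batch_size": 8,             # X: upper bound on requests per worker call
        "max_batch_delay": 50,       # ms to wait for a full batch before flushing
        "initial_workers": 1,
    },
)
print(resp.status_code, resp.json())
```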
@vdantu I'm afraid it didn't work; I tried again and got the same result. BTW I'm using Docker, maybe you should try the Docker version. Here's the log: @vdantu @ddavydenko mms_log.log
I tried with the Docker container 1.0.1 CPU as well, and it works. Could you let us know which container you are using? Also, can you share your log file?
Thanks, Vamshi
@vdantu Sorry for the late reply, these days are the Chinese Lunar New Year holidays, happy Chinese New Year! I'm using the 1.0.0 GPU Docker container; here's my log: mms_log.log
Hi @ddavydenko, any progress so far? @vdantu, I tried the 1.0.1 CPU container and still hit the same problem mentioned above. Can you provide the details of your command? I ran the Docker container like this:
sudo docker run -itd --name mms -p 8087:8087 -p 8088:8088 -v \
/home/huzhihao/projects/image_retrieval_for_supply_chain/models/:/models \
awsdeeplearningteam/mxnet-model-server:1.0.1-mxnet-cpu \
mxnet-model-server --start \
--mms-config /models/config.properties \
--model-store=/models \
--models image_retrieval=resnet_triplet.mar
I'm using a shared volume that contains config.properties and the .mar file. If I register the model with:
curl -v -X POST localhost:8088/models?url=resnet_triplet.mar\&model_name=image_retrieval\&batch_size=30\&max_batch_delay=40\&initial_workers=1\&synchronous=true
"Load failed... Deregistered model image_retrieval" 404 error raised.
If I register the model with:
curl -v -X POST localhost:8088/models?url=resnet_triplet.mar\&model_name=image_retrieval\&batch_size=30\&max_batch_delay=40\&initial_workers=1
"Model image_retrieval is already registered."
If I delete the model and register again with:
curl -v -X POST localhost:8088/models?url=resnet_triplet.mar\&model_name=image_retrieval\&batch_size=30\&max_batch_delay=40\&initial_workers=1
the logs show:
AssertionError: Batch is not supported.
Load model failed: image_retrieval, error: Worker died.
W-9044-image_retrieval State change WORKER_STARTED -> WORKER_STOPPED
Retry worker: 9044 in 8 seconds.
@JustinhoCHN : I went through the logs you sent. The example models in our model zoo don't support batching by default. I am in the process of writing more tutorials on how to write batching code, but in the meantime, to unblock you:
MMS sends a list of inputs to preprocess, seen there as the parameter named batch. Your model code should preprocess this list of input requests and send the result to the inference method.
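As a rough sketch of that idea (not the linked template itself; the payload key names, the 224x224 input size, and the already-bound module are assumptions for illustration), preprocess can decode each request in the batch list and stack the results so the engine sees a single batched tensor:

```python
# Sketch only: iterate the `batch` list passed to preprocess and assemble one
# (N, 3, 224, 224) NDArray so the engine runs a single forward pass.
import mxnet as mx


def preprocess(batch):
    imgs = []
    for request in batch:
        payload = request.get("data") or request.get("body")
        img = mx.image.imdecode(payload)                       # bytes -> (H, W, 3) NDArray
        img = mx.image.imresize(img, 224, 224).astype("float32")
        imgs.append(img.transpose((2, 0, 1)))                  # HWC -> CHW
    return mx.nd.stack(*imgs)                                  # (N, 3, 224, 224)


def inference(module, batch_tensor):
    # `module` is assumed to be an MXNet Module already bound with a matching
    # batch dimension; the padding sketch further down covers the case where
    # fewer than batch_size requests arrive.
    module.forward(mx.io.DataBatch([batch_tensor]), is_train=False)
    return module.get_outputs()[0]
```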
Thank you @vdantu and the MMS team, that's what we want, it really helps! A few days ago I was stuck on how to handle a number of requests smaller than a batch, and your padding method solved this problem! Thanks again!
@JustinhoCHN : That's awesome :) Really glad that the batching work we did helped. I could think of two options for variable batch size: re-binding the module to the shape of each incoming batch, or padding the inputs with zeros up to the fixed batch size.
Since the re-bind logic sits in the inference path, i.e., a re-bind would have to be done for every batch of requests that comes in, it can quickly become very inefficient :) So I chose to go with the "padding with 0's" solution.
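A minimal sketch of that padding approach, under the assumption that the module was bound with a fixed batch dimension (bound_batch_size is a placeholder): pad the stacked input with zero rows up to the bound size, run one forward pass, and return only the outputs for the real requests.

```python
# Sketch of zero-padding a partial batch up to a fixed bound batch size.
import mxnet as mx


def pad_to_batch(batch_tensor, bound_batch_size):
    """Pad an (n, ...) input with zero rows so its first dim equals bound_batch_size."""
    n = batch_tensor.shape[0]
    if n == bound_batch_size:
        return batch_tensor
    pad_shape = (bound_batch_size - n,) + batch_tensor.shape[1:]
    padding = mx.nd.zeros(pad_shape, dtype=batch_tensor.dtype)
    return mx.nd.concat(batch_tensor, padding, dim=0)


def batched_inference(module, batch_tensor, bound_batch_size):
    n = batch_tensor.shape[0]
    padded = pad_to_batch(batch_tensor, bound_batch_size)
    module.forward(mx.io.DataBatch([padded]), is_train=False)
    # Drop outputs produced by the zero padding; keep one row per real request.
    return module.get_outputs()[0][:n]
```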
@frankfliu Is there a way to contribute to configuring the batch size through the model spec? I'd be happy to help.
@erandagan , awesome to see your activities, keep 'em coming! :)
As for this particular question - could you please clarify what exactly you are trying to achieve? Describing it from user perspective would probably be the best way.
@ddavydenko Sure.
We're deploying a (set of) MMS servers using K8s, and it would be friendlier to operate if we could configure the batch settings directly within the model, rather than POSTing the model along with its settings after deployment.
This makes things a lot easier from an operational standpoint, as POSTing to the model server is actually mutating the server's state. So, currently, if the server were to crash and K8s would spawn a new server pod, we'd have to reconfigure that pod again as it would boot into a "wrong" state. In contrast, if the batch size was set in the model archive, the server would boot directly into the desired state.
@erandagan : That's a good feature to have as well. Could you share a short writeup of the changes you are proposing?
IMO, there is a model section in the model-archive (https://github.com/awslabs/mxnet-model-server/blob/master/model-archiver/model_archiver/manifest_components/model.py). This could be a good place for "max_batch_size" and "max_batch_delay". The frontend's "Model.java" already holds a copy of the ModelArchive, whose Manifest contains this Model.
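To make the idea concrete, here is a rough sketch of how batching defaults could be carried in the manifest's model component; the field and key names are illustrative placeholders, not an agreed spec.

```python
# Illustrative only: a possible shape for the manifest's model component with
# batching defaults added. Names are placeholders pending an actual spec.


class Model(object):
    def __init__(self, model_name, handler, max_batch_size=1, max_batch_delay=100):
        self.model_name = model_name
        self.handler = handler
        # Proposed additions: defaults the frontend could pick up at load time
        # instead of requiring them on the register call.
        self.max_batch_size = max_batch_size
        self.max_batch_delay = max_batch_delay

    def to_dict(self):
        return {
            "modelName": self.model_name,
            "handler": self.handler,
            "maxBatchSize": self.max_batch_size,
            "maxBatchDelay": self.max_batch_delay,
        }
```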
@vdantu I'll follow up on #770
Hey guys,
I'm setting up a face detection service and am currently using your SSD example as a baseline.
I've noticed that you have an assert here https://github.com/awslabs/mxnet-model-server/blob/aa8f8a3473e6f06780ffd362fb8722a65affd380/examples/model_service_template/mxnet_model_service.py#L51 checking if the batch size is equal to 1
Does this mean I can not process multiple files at the same time?
More precisely, I would like to run an MTCNN service that detects all the faces in an image and then call a Face Embedding service that computes all the embeddings of said faces.
Is it possible to batch the faces detected and compute the embeddings in one go?
I would like to know if it is just a matter of overloading the correct methods, or if MMS does something clever in the background.
Thanks, Miguel