I'm trying to run the tensorflow-deeplab-v3 model on a server to segment images that I send to it. Everything works, but every time I send a request the model looks for the GPU and creates a new GPU device, and this device creation costs around 10 seconds per request. How can I prevent the model from creating the device every time and make it reuse the previously created one?
I'm running my server on an Amazon p2.xlarge EC2 instance. The OS info is:
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
python version: 3.5.2
pip version: 19.1.1
pip list output:
Output for the first request sent to server (numbers in the () are completion times in seconds):
(Cabin) ubuntu@ip-172-31-18-152:~/Cabin$ CUDA_VISIBLE_DEVICES=0 python Cabin.py --private_ip 172.31.18.152
Searching for gpus...
2019-06-23 11:17:35.611990: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-06-23 11:17:35.681815: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:17:35.682581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:17:35.685889: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:17:35.747229: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:17:35.778084: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:17:35.787495: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:17:35.856472: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:17:35.898971: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:17:36.013921: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:17:36.014076: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:17:36.014873: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:17:36.015586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
Found all gpus. (0.40801453590393066)
Generating model...
Model ready. (0.0017066001892089844)
Bottle v0.12.16 server starting up (using WSGIRefServer())...
Listening on http://172.31.18.152:8080/
Hit Ctrl-C to quit.
Request arrived.
Downloading images...
Download complete. (0.23528265953063965)
Preparing images...
Images ready. (0.013093709945678711)
Saving images...
Images saved. (0.09435057640075684)
Evaluating model...
Preparing list...
List generated (0.00017762184143066406)
Loading images...
WARNING: Logging before flag parsing goes to stderr.
W0623 11:17:57.318472 140174189094656 deprecation_wrapper.py:119] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/utils/dataset_util.py:60: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
Images loaded (0.0007865428924560547)
Inside device
Predicting...
Predictions completed. (5.245208740234375e-06)
Calling zip function...
Zip() complete. (1.6689300537109375e-06)
Zipped: <zip object at 0x7f7c70280cc8>
Writing output masks...
W0623 11:17:57.343004 140174189094656 deprecation_wrapper.py:119] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/utils/preprocessing.py:232: The name tf.read_file is deprecated. Please use tf.io.read_file instead.
W0623 11:17:57.421230 140174189094656 deprecation.py:323] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/utils/preprocessing.py:234: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0623 11:17:57.440225 140174189094656 deprecation.py:323] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/utils/preprocessing.py:261: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
W0623 11:18:02.673879 140174189094656 deprecation_wrapper.py:119] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/deeplab_model.py:35: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0623 11:18:03.029822 140174189094656 deprecation_wrapper.py:119] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/deeplab_model.py:60: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.
W0623 11:18:03.216465 140174189094656 deprecation.py:323] From /home/ubuntu/Cabin/DeepLab/tensorflow_deeplab_v3_plus/deeplab_model.py:178: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
W0623 11:18:03.848108 140174189094656 deprecation.py:323] From /home/ubuntu/Cabin/Cabin/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py:1354: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2019-06-23 11:18:04.924563: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-23 11:18:04.998014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:18:04.998832: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ab3a30 executing computations on platform CUDA. Devices:
2019-06-23 11:18:04.998864: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-06-23 11:18:05.020871: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300055000 Hz
2019-06-23 11:18:05.021623: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7b7fda0 executing computations on platform Host. Devices:
2019-06-23 11:18:05.021653: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-06-23 11:18:05.021919: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:18:05.022751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:18:05.022824: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:18:05.022866: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:18:05.022889: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:18:05.022952: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:18:05.022989: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:18:05.023012: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:18:05.023040: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:18:05.023106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:18:05.023844: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:18:05.024511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:18:05.025461: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:18:05.028172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:18:05.028201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-23 11:18:05.028214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-06-23 11:18:05.029583: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:18:05.030301: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:18:05.031000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
W0623 11:18:05.032312 140174189094656 deprecation.py:323] From /home/ubuntu/Cabin/Cabin/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-06-23 11:18:11.533404: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-06-23 11:18:13.546536: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
Preparing paths...
Paths ready. (2.0742416381835938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (1.1920928955078125e-06)
Prediction took: 21.077475786209106
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.0765237808227539)
Preparing paths...
Paths ready. (2.09808349609375e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (4.76837158203125e-06)
Prediction took: 0.457857608795166
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06001448631286621)
Collecting trashes...
All clear! (0.0003724098205566406)
Evaluation complete. (21.77125883102417)
Measuring...
Measuring complete. (1.4657764434814453)
78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Output for requests after the first one:
78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
Downloading images...
Download complete. (0.24880599975585938)
Preparing images...
Images ready. (0.00023603439331054688)
Saving images...
Images saved. (0.0910639762878418)
Evaluating model...
Preparing list...
List generated (0.00019860267639160156)
Loading images...
Images loaded (0.0002944469451904297)
Inside device
Predicting...
Predictions completed. (6.67572021484375e-06)
Calling zip function...
Zip() complete. (3.0994415283203125e-06)
Zipped: <zip object at 0x7f7c709369c8>
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0
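A note on the timings above: "Predictions completed" is reported after a few microseconds, while the first pass through the loop takes about 11 seconds and the second about 0.5 seconds. That pattern is what one would expect if the predict call only returns a lazy generator and the expensive setup runs on the first iteration. The sketch below illustrates this with plain Python only. `fake_predict` and its `setup_seconds` parameter are hypothetical stand-ins, not TensorFlow APIs, and the sleep merely imitates setup work.

```python
import time


def fake_predict(inputs, setup_seconds=0.2):
    """Hypothetical stand-in for a predict call that returns a generator."""
    # Nothing in this body runs until the first next() on the generator.
    time.sleep(setup_seconds)  # imitates graph building and device creation
    for item in inputs:
        yield {"decoded_labels": item.upper()}


t0 = time.time()
predictions = fake_predict(["front", "side"])  # returns immediately
call_seconds = time.time() - t0

t0 = time.time()
first = next(predictions)  # the setup cost is paid here
first_seconds = time.time() - t0

t0 = time.time()
second = next(predictions)  # later items skip the setup
second_seconds = time.time() - t0

print("call:", call_seconds)
print("first item:", first_seconds)
print("second item:", second_seconds)
```

If this matches what happens in the real script, the cost is tied to each new predict call rather than to the loop itself.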
I embedded the inference script in my own script that runs the server; it is shown below (here I download the images from a fixed source for testing purposes, and the script is not yet complete). It creates the GPU device at line 161, on entering the 'for pred_dict, image_path in zipped:' loop:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import argparse
import os
import glob
from io import BytesIO
import tensorflow as tf
import cv2
import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util
from PIL import Image
#import matplotlib.pyplot as plt
from tensorflow.python import debug as tf_debug
from bottle import run, post, request, response, route
import requests
import Cropper
import Measure
parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', type=str, default='/home/ubuntu/Cabin/Data/',
                    help='The directory containing the image data.')
parser.add_argument('--output_dir', type=str, default='/home/ubuntu/Cabin/ModelOutput/',
                    help='Path to the directory to generate the inference results')
parser.add_argument('--infer_data_list', type=str, default='/home/ubuntu/Cabin/images_list.txt',
                    help='Path to the file listing the inferring images.')
parser.add_argument('--model_dir', type=str, default='/home/ubuntu/Cabin/DeepLab/model/',
                    help="Base directory for the model. "
                         "Make sure 'model_checkpoint_path' given in 'checkpoint' file matches "
                         "with checkpoint name.")
parser.add_argument('--base_architecture', type=str, default='resnet_v2_101',
                    choices=['resnet_v2_50', 'resnet_v2_101'],
                    help='The architecture of base Resnet building block.')
parser.add_argument('--output_stride', type=int, default=16,
                    choices=[8, 16],
                    help='Output stride for DeepLab v3. Currently 8 or 16 is supported.')
parser.add_argument('--debug', action='store_true',
                    help='Whether to use debugger to track down bad values during training.')
parser.add_argument('--private_ip', type=str, default='localhost',
                    help='The IP you want to run your server on.')
parser.add_argument('--port', type=int, default=8080,
                    help='The Port you want to run your server on.')
_NUM_CLASSES = 21
FLAGS, unparsed = parser.parse_known_args()
# This part sets all the needed directories
current_path = os.getcwd()
data_path = current_path + "/Data/"
output_path = current_path + "/Output/"
model_path = current_path + "/DeepLab/model/"
inference_path = current_path + "/DeepLab/tensorflow_deeplab_v3_plus/inference.py"
image_list_dir = current_path + "/images_list.txt"
model_output_path = current_path + "/ModelOutput/"
measure_path = current_path + "/Measure.py"
# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]
print("Searching for gpus...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all gpus. ("+ str(end-start) + ")")
print("Generating model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
        'output_stride': FLAGS.output_stride,
        'batch_size': 1,  # Batch size must be 1 because the images' size may differ
        'base_architecture': FLAGS.base_architecture,
        'pre_trained_model': None,
        'batch_norm_decay': None,
        'num_classes': _NUM_CLASSES,
    })
end = time.time()
print("Model ready. ("+ str(end-start) + ")")
#print("Generating tensorflow session...")
#start = time.time()
#config = tf.ConfigProto()
#sess = tf.Session(config=config)
#end = time.time()
#print("Session created. ("+ str(end-start) + ")")
def evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path):
    print("Preparing list...")
    start = time.time()
    # Look at the Data folder and write the name of every file in it to the image list file
    with open(image_list_dir, "w") as imageList:
        for file in os.listdir(data_path):
            imageList.write(str(file)+"\n")
    end = time.time()
    print("List generated ("+ str(end-start) + ")")
    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded ("+ str(end-start) + ")")
    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        # predict() returns a generator; nothing is evaluated until it is iterated
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Predictions completed. ("+ str(end-start) + ")")
        output_dir = FLAGS.output_dir
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        print("Calling zip function...")
        start = time.time()
        zipped = zip(predictions, image_files)
        end = time.time()
        print("Zip() complete. (" + str(end-start) + ")")
        print("Zipped: " + str(zipped))
        print("Writing output masks...")
        predictionTimeStart = time.time()
        for pred_dict, image_path in zipped:
            # print("pred_dict is: " + str(pred_dict))
            print("Preparing paths...")
            start = time.time()
            image_basename = os.path.splitext(os.path.basename(image_path))[0]
            output_filename = image_basename + '_mask.png'
            path_to_output = os.path.join(output_dir, output_filename)
            end = time.time()
            print("Paths ready. (" + str(end-start) + ")")
            print("generating:", path_to_output)
            start = time.time()
            mask = pred_dict['decoded_labels']
            end = time.time()
            print("Generated. ("+ str(end-start) + ")")
            # Use this part to also save mask
            # tmp = Image.fromarray(mask)
            # plt.axis('off')
            # plt.imshow(tmp)
            # plt.savefig(path_to_output, bbox_inches='tight')
            predictionTimeEnd = time.time()
            print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))
            print("Cropping " + path_to_output)
            start = time.time()
            Cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
            end = time.time()
            print("Cropped and wrote to file. ("+ str(end-start) + ")")
            predictionTimeStart = time.time()
    print("Collecting trashes...")
    start = time.time()
    # Remove the processed input images from the Data folder
    for file in glob.glob(data_path + "*"):
        os.remove(file)
    end = time.time()
    print("All clear! ("+ str(end-start) + ")")
@route('/')#@post('/')
def measure():
    print("Request arrived.")
    try:
        # parse input data
        # try:
        #     data = request.json()
        # except:
        #     raise ValueError
        #
        # if data is None:
        #     raise ValueError
        # extract and validate name
        try:
            id = "test"#data['id']
            front_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['front_image_url']
            side_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['side_image_url']
            height = 173#data['height']
            angle = 0#data['angle']
        except (TypeError, KeyError):
            raise ValueError
    except KeyError:
        # if name already exists, return 409 Conflict
        response.status = 409
        return
    try:
        print("Downloading images...")
        start = time.time()
        downloaded_front_image = requests.get(front_image_url)
        downloaded_side_image = requests.get(side_image_url)
        end = time.time()
        print("Download complete. ("+ str(end-start) + ")")
    except(FileNotFoundError, PermissionError, TimeoutError):
        raise ValueError
    print("Preparing images...")
    start = time.time()
    front_image = Image.open(BytesIO(downloaded_front_image.content))
    side_image = Image.open(BytesIO(downloaded_side_image.content))
    end = time.time()
    print("Images ready. ("+ str(end-start) + ")")
    print("Saving images...")
    start = time.time()
    front_image_name = data_path + str(id) + '_front.jpg'
    side_image_name = data_path + str(id) + '_side.jpg'
    front_image.save(front_image_name)
    side_image.save(side_image_name)
    end = time.time()
    print("Images saved. ("+ str(end-start) + ")")
    print("Evaluating model...")
    modelstart = time.time()
    evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path)
    modelend = time.time()
    print("Evaluation complete. ("+ str(modelend-modelstart) + ")")
    print("Measuring...")
    start = time.time()
    Measure.evaluate(model_output_path + str(id) + "_front_mask_cropped.png", model_output_path + str(id) + "_side_mask_cropped.png", height, angle, id)
    end = time.time()
    print("Measuring complete. (" + str(end-start) + ")")
    pass
run(host=FLAGS.private_ip, port=FLAGS.port)
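One direction that might avoid the repeated setup, assuming the per-request cost really comes from calling `model.predict` again on every request, is to call predict once and keep its generator alive, feeding new inputs through a queue. The following is only a minimal pure-Python sketch of that pattern. `PersistentPredictor` and `fake_predict` are hypothetical names, and `fake_predict` stands in for the real predict call. With the real Estimator the queue-backed generator would have to be wrapped in an `input_fn` (for example via `tf.data.Dataset.from_generator`), which is not shown or tested here.

```python
import queue
import threading


class PersistentPredictor:
    """Keeps one prediction generator open and feeds it through a queue."""

    def __init__(self, predict_fn):
        self._inputs = queue.Queue()
        self._lock = threading.Lock()
        # predict_fn is called exactly once; its generator is reused afterwards.
        self._predictions = predict_fn(self._input_stream())

    def _input_stream(self):
        # Yields queued items until the None sentinel arrives.
        while True:
            item = self._inputs.get()
            if item is None:
                return
            yield item

    def predict(self, item):
        # One request at a time: enqueue the input, then pull one result.
        with self._lock:
            self._inputs.put(item)
            return next(self._predictions)

    def close(self):
        self._inputs.put(None)


setup_calls = []


def fake_predict(input_stream):
    """Hypothetical stand-in for a predict call with expensive setup."""
    setup_calls.append("setup")  # imitates graph and device creation
    for item in input_stream:
        yield item * 2


predictor = PersistentPredictor(fake_predict)
results = [predictor.predict(x) for x in (1, 2, 3)]
predictor.close()
print(results, "setup ran", len(setup_calls), "time(s)")
```

In this sketch the setup step runs once, on the first `predict`, and later calls reuse the same generator. Whether the real input pipeline cooperates (for instance with respect to prefetching or varying image sizes) is an open assumption.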