RSKothari opened this issue 4 years ago (status: Open)
Hi there,
This might be a duplicate of #282. I haven't used DensePose myself, but I suppose the usage should be the same as described in that issue: you just need to call the model with a batch of inputs.
There is also #1986, which explains how to sort images before running inference. You should be able to work it out with this info; let us know if anything is unclear.
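To illustrate the sorting idea (assuming, as in #1986, that the goal is to group similarly shaped images so each batch needs less padding), one could order files by aspect ratio before chunking. This is a hypothetical, Detectron2-independent sketch; `sort_by_aspect_ratio` is not a library function:

```python
def sort_by_aspect_ratio(shapes):
    """shapes: list of (height, width) tuples; returns indices sorted by h/w.

    Grouping images with similar aspect ratios into the same batch reduces
    the padding needed when they are collated into a single tensor.
    """
    return sorted(range(len(shapes)), key=lambda i: shapes[i][0] / shapes[i][1])

shapes = [(480, 640), (1080, 1920), (640, 480), (720, 1280)]
order = sort_by_aspect_ratio(shapes)
# Wide images (h/w = 0.5625) come first, the tall 640x480 image last.
```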
@MathijsNL Thanks, but my question is specific to the DensePose module within Detectron2: it seems to read images one after another to perform inference.
Yes, currently DensePose doesn't provide an efficient reader that would batch video inputs. I've got a pending PR to torchvision that addresses this issue.
For batched input inference, you can make the following change to `apply_net.py` (`InferenceAction` class):

```python
@classmethod
def execute(cls: type, args: argparse.Namespace):
    batch_size = 16
    logger.info(f"Loading config from {args.cfg}")
    opts = []
    cfg = cls.setup_config(args.cfg, args.model, args, opts)
    logger.info(f"Loading model from {args.model}")
    predictor = DefaultPredictor(cfg)
    logger.info(f"Loading data from {args.input}")
    file_list = cls._get_input_file_list(args.input)
    if len(file_list) == 0:
        logger.warning(f"No input images for {args.input}")
        return
    context = cls.create_context(args, cfg)
    for file_batch_ind in range(math.ceil(len(file_list) / batch_size)):
        img_list = []
        for batch_ind in range(batch_size):
            file_ind = batch_size * file_batch_ind + batch_ind
            if file_ind >= len(file_list):
                break
            # The predictor expects a BGR image.
            img = read_image(file_list[file_ind], format="BGR")
            img_list.append(img)
        with torch.no_grad():
            outputs = predictor(img_list)
        for batch_ind in range(len(img_list)):
            file_ind = batch_size * file_batch_ind + batch_ind
            cls.execute_on_outputs(
                context,
                {"file_name": file_list[file_ind], "image": img_list[batch_ind]},
                outputs[batch_ind]["instances"],
            )
    cls.postexecute(context)
```
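The batching arithmetic in the loop above can be isolated into a small helper (a generic sketch, independent of Detectron2; `chunk` is not part of any library here), which makes it easy to check that every file ends up in exactly one batch, including a ragged final batch:

```python
import math

def chunk(items, batch_size):
    """Split ``items`` into consecutive batches of at most ``batch_size``."""
    n_batches = math.ceil(len(items) / batch_size)
    return [items[i * batch_size:(i + 1) * batch_size] for i in range(n_batches)]

batches = chunk(list(range(10)), 4)
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]: two full batches and a tail of two.
```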
You would also need to change the `__call__` method of the `DefaultPredictor` class in `detectron2/engine/defaults.py`:

```python
def __call__(self, original_image_list):
    """
    Args:
        original_image_list (list[np.ndarray]): a list of images,
            each of shape (H, W, C) (in BGR order).

    Returns:
        predictions (list[dict]):
            the outputs of the model, one dict per input image.
            See :doc:`/tutorials/models` for details about the format.
    """
    with torch.no_grad():  # https://github.com/sphinx-doc/sphinx/issues/4258
        inputList = []
        for original_image in original_image_list:
            # Apply pre-processing to each image.
            if self.input_format == "RGB":
                # whether the model expects BGR inputs or RGB
                original_image = original_image[:, :, ::-1]
            height, width = original_image.shape[:2]
            image = self.aug.get_transform(original_image).apply_image(original_image)
            image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
            # ``Tensor.to`` returns a new tensor; the result must be assigned.
            image = image.to(self.cfg.MODEL.DEVICE)
            inputList.append({"image": image, "height": height, "width": width})
        predictions = self.model(inputList)
        return predictions
```
I have modified the predictor to take in a list of images and dump the results in the same format as the sequential image processing.
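Since the modified predictor returns one output dict per input image, in input order, results can be paired back with their filenames with a plain `zip`. A minimal stand-in sketch (the fake string outputs and the `pair_outputs` helper are illustrative, not real model predictions):

```python
def pair_outputs(file_batch, outputs):
    """Pair each filename in a batch with its corresponding model output.

    Relies on the batched predictor preserving input order, so the i-th
    output belongs to the i-th file of the batch.
    """
    assert len(file_batch) == len(outputs)
    return [{"file_name": f, "result": o} for f, o in zip(file_batch, outputs)]

records = pair_outputs(["a.jpg", "b.jpg"], ["out_a", "out_b"])
# → [{'file_name': 'a.jpg', 'result': 'out_a'},
#    {'file_name': 'b.jpg', 'result': 'out_b'}]
```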
❓ How to do something using detectron2
Currently, DensePose reads in images one at a time to infer dense annotations. This is very slow and quite wasteful. Does DensePose have the ability to read in batches of images to perform inference?
Describe what you want to do, including:
what inputs you will provide, if any: a video's frames as images
what outputs you are expecting: a pickle file with DensePose annotations, inferred much faster