akashsengupta1997 / HuManiFlow

[CVPR 2023] Code repository for HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
MIT License

Using model with multiple images or video #1

Closed personal-coding closed 1 year ago

personal-coding commented 1 year ago

What is the best way to update the code to work on multiple images or a video? I attempted to use VideoCapture on a gif file to read each frame. However, I am having difficulty appending each image and heatmap together to be fed into the model.

This is in the predict_humaniflow.py script:

for image_fname in tqdm(sorted([f for f in os.listdir(image_dir)])):
        with torch.no_grad():
            # Capture video from file
            cap = cv2.VideoCapture(os.path.join(image_dir, image_fname))
            # Capture frame-by-frame
            ret, frame = cap.read()
            frames = []
            while ret:
                # ------------------------- INPUT LOADING AND PROXY REPRESENTATION GENERATION -------------------------
                image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

                ...............

                frames.append(torch.cat([proxy_rep_img, proxy_rep_heatmaps], dim=1))
                ret, frame = cap.read()
                if not ret:
                    break

            cap.release()
            cv2.destroyAllWindows()
            proxy_rep_input = torch.cat([x.float() for x in frames], dim=1).float()  # (1, 18, img_wh, img_wh)

akashsengupta1997 commented 1 year ago

The easiest way is to pass each frame as input sequentially (as is already done for each image in predict_humaniflow.py) - all you have to do is change the for loop in that script to iterate over the video/gif frames instead of over image files.
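A minimal sketch of that sequential approach, assuming the frames come from cv2.VideoCapture as in the question. The helper names iter_frames and frames_from_file are hypothetical and not part of the repository, and whether OpenCV can decode a gif depends on how it was built.

```python
def iter_frames(cap):
    """Yield frames from any object exposing a cv2.VideoCapture-style read()."""
    while True:
        ret, frame = cap.read()
        if not ret:  # read() returns ret == False once no more frames are available
            break
        yield frame


def frames_from_file(path):
    """Open a video/gif with OpenCV and yield its frames, releasing the capture afterwards."""
    import cv2  # imported lazily so iter_frames can be used without OpenCV installed

    cap = cv2.VideoCapture(path)
    try:
        yield from iter_frames(cap)
    finally:
        cap.release()


# Hypothetical use inside predict_humaniflow.py, replacing the loop over image files:
# for frame in frames_from_file(os.path.join(image_dir, image_fname)):
#     image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
#     ...  # existing per-image code, run once per frame
```

Each frame is then handled exactly like a single image, so the existing visualisation code should not need to change.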

If you want to predict in parallel, which I think is what you have attempted in your code, you need to concatenate the proxy representations along the batch dimension (dim=0), not dim=1. You would also need to change the visualisation code to work in a batched manner. This would probably be a hassle, so I would recommend the sequential method unless speed is a big concern.
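To illustrate the difference between the two dimensions, here is a small sketch using stand-in tensors. The 18-channel shape is taken from the comment in the question's code; the spatial size of 64, the frame count of 3, and the helper name stack_frames are placeholders chosen for illustration, not values from the repository.

```python
import torch


def stack_frames(frames):
    """Concatenate per-frame proxy representations along the batch dimension."""
    return torch.cat([f.float() for f in frames], dim=0)


# Stand-in data: 3 frames, each shaped (1, 18, 64, 64). Sizes are illustrative only.
frames = [torch.zeros(1, 18, 64, 64) for _ in range(3)]

batch = stack_frames(frames)      # one batch entry per frame
wrong = torch.cat(frames, dim=1)  # what the question's code does: channels get merged

print(batch.shape)  # torch.Size([3, 18, 64, 64])
print(wrong.shape)  # torch.Size([1, 54, 64, 64])
```

With dim=1 the model would receive a single sample with 54 channels rather than three samples with 18 channels each, which is presumably why the concatenated input was rejected.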