NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

number of dimensions in the tensor input does not match the length of the desired ordering of dimensions #117

Closed TikZSZ closed 1 year ago

TikZSZ commented 1 year ago

Hey, I have been trying to run nvdiffrec on my own dataset and I get an error. For demonstration I have prepared another dataset (from a video) using this colmap2poses script. I'm using this command to do so:

python .\colmap2poses.py  --video_in=razor.mp4 --video_fps=10 --mask --colmap_path=Colmap/COLMAP.bat  images

I have added the --mask flag to automate masking, so the script generates masks through rembg. I have uploaded the output of the script here.

This dataset has far fewer images in view_images.txt, but I got the same error on a dataset with 45 images, so I don't think that is the issue.

The issue occurs when I copy all the files into data/nerd/razor and run

python scale_images.py

The normal images are scaled, but when the script starts working on the masks folder I get this error:

razor\images\0134.jpg
razor\images\0135.jpg
razor\images\0136.jpg
razor\images\0137.jpg
razor\images\0138.jpg
razor\masks\0001.jpg
Traceback (most recent call last):
  File "C:\Users\repac\Desktop\nvdiffrec\custom data\nvdiffrec\data\nerd\scale_images.py", line 34, in <module>
    img = img[None, ...].permute(0, 3, 1, 2)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 3 is not equal to len(dims) = 4
(dmodel) PS C:\Users\repac\Desktop\nvdiffrec\custom data\nvdiffrec\data\nerd>

Also, I have noticed the contents of the images folder are printed twice in the console before it starts working on the masks folder.

I have been able to run the standard spot and bob examples on a 4090. Sorry for the long post, thanks!

IvanGarcia7 commented 1 year ago

Any solution??

jmunkberg commented 1 year ago

Hello @IvanGarcia7 ,

Sorry, but we cannot support colmap2poses.py, which is not part of our code base.

Our script scale_images.py is designed specifically for the NeRD dataset, as described here: https://github.com/NVlabs/nvdiffrec/tree/main/data/nerd. If you want to use it for other datasets, you may have to adapt it.

Looking at the error above, it seems to be a problem with the dimensions of the image tensor. We call torch.nn.functional.interpolate with 4D tensors (https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html), so please reshape accordingly.
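For reference, a minimal sketch of the reshape described above (the image sizes here are illustrative, not taken from the repo):

```python
import torch

# torch.nn.functional.interpolate expects a 4D (N, C, H, W) tensor for 2D images.
img = torch.rand(512, 910, 3)                # H x W x C image as a 3D tensor
img = img[None, ...].permute(0, 3, 1, 2)     # -> (1, 3, 512, 910)
out = torch.nn.functional.interpolate(img, size=(256, 455), mode='area')
out = out.permute(0, 2, 3, 1)[0, ...]        # back to H x W x C: (256, 455, 3)
```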

TikZSZ commented 1 year ago

OK, so I found the issue: it was caused by the mask images being only 2D (height and width). I got it to work, somewhat, by adding a black-and-white color channel. That mostly worked, but the originally generated masks are high quality and I would like to preserve that somehow.

This is the script I'm using to add a third dimension to the images:

import os
import cv2
import numpy as np

def convert_masks_to_3d_and_save(input_folder, output_folder):
    # Create the output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Get a list of all the 2D mask images in the input folder
    mask_files = [f for f in os.listdir(input_folder) if f.endswith(".jpg")]

    # Loop through each 2D mask image
    for mask_file in mask_files:
        # Load the 2D mask image as a numpy array
        mask = cv2.imread(os.path.join(input_folder, mask_file), cv2.IMREAD_GRAYSCALE)

        # Threshold the 2D mask into a binary 3-channel black-and-white mask
        mask_3d = np.zeros((mask.shape[0], mask.shape[1], 3), dtype=np.uint8)
        mask_3d[mask == 0] = [0, 0, 0]
        mask_3d[mask > 0] = [255, 255, 255]

        # Save the 3D mask to the output folder
        cv2.imwrite(os.path.join(output_folder, mask_file), mask_3d)

I also tried unsqueeze(0) and squeeze(0) to add another dimension, but that gave an RGB color-channel error:

Traceback (most recent call last):
  File "C:\Users\repac\Desktop\nvdiffrec\custom data\nvdiffrec\data\nerd\scale_images.py", line 41, in <module>
    imageio.imwrite(out_file, np.clip(np.rint(rescaled_img.numpy() * 255.0), 0, 255).astype(np.uint8))
  File "D:\anaconda3\envs\dmodel\lib\site-packages\imageio\v2.py", line 259, in imwrite
    raise ValueError("Image must be 2D (grayscale, RGB, or RGBA).")
ValueError: Image must be 2D (grayscale, RGB, or RGBA).

This is what I had added to scale_images.py:

if folder == "masks":
    img = img.unsqueeze(0)
    img = img[None, ...].permute(0, 3, 1, 2)
    rescaled_img = torch.nn.functional.interpolate(img, res, mode='area')
    rescaled_img = rescaled_img.permute(0, 2, 3, 1)[0, ...]
    out_file = os.path.join(dataset_rescaled, folder, os.path.basename(file))
    rescaled_img = rescaled_img.squeeze(0)
    imageio.imwrite(out_file, np.clip(np.rint(rescaled_img.numpy() * 255.0), 0, 255).astype(np.uint8))

Any help would be appreciated, either with getting the original 2D masks accepted by scale_images.py, or with accurately adding color-channel data, since the conversion loses precision.
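One alternative, as a hedged sketch rather than the repo's actual code: a 2D grayscale mask can be fed to interpolate by adding both a batch and a channel dimension (rather than calling both unsqueeze(0) and [None, ...], which produces a 5D tensor), then dropping them again before writing, since imageio accepts a 2D array as grayscale. Sizes here are illustrative:

```python
import numpy as np
import torch

mask = torch.rand(512, 910)                  # 2D mask tensor, values in [0, 1]
x = mask[None, None, ...]                    # -> (1, 1, H, W) for interpolate
rescaled = torch.nn.functional.interpolate(x, size=(256, 455), mode='area')
rescaled = rescaled[0, 0, ...]               # back to 2D (H, W)
out = np.clip(np.rint(rescaled.numpy() * 255.0), 0, 255).astype(np.uint8)
# imageio.imwrite(out_file, out)             # 2D uint8 arrays are written as grayscale
```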

TikZSZ commented 1 year ago

For anyone else who stumbles upon the above issue, try this; it preserves the precision of the original mask:

import os
import cv2
import numpy as np

def convert_masks_to_3d_and_save(input_folder, output_folder):
    # Create the output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Get a list of all the 2D mask images in the input folder
    mask_files = [f for f in os.listdir(input_folder) if f.endswith(".jpg")]

    # Loop through each 2D mask image
    for mask_file in mask_files:
        # Load the 2D mask image as a numpy array
        mask = cv2.imread(os.path.join(input_folder, mask_file), cv2.IMREAD_GRAYSCALE)

        # Copy the 2D grayscale mask into all three channels of a 3D mask
        mask_3d = np.zeros((mask.shape[0], mask.shape[1], 3), dtype=np.uint8)
        mask_3d[:,:,0] = mask
        mask_3d[:,:,1] = mask
        mask_3d[:,:,2] = mask

        # Save the 3D mask to the output folder
        cv2.imwrite(os.path.join(output_folder, mask_file), mask_3d)

The result I'm getting is not great, but my dataset might be the problem; for now this seems to work well. Thanks to ChatGPT for the help; to quote it here:

Yes, that is correct. Converting a 2D grayscale image to a 3D image with three identical channels can be achieved by simply duplicating the same intensity values across all three channels. The result is a 3D image where each pixel is represented by a triplet of identical values, representing the same grayscale intensity as in the original 2D image.

This can be useful, for example, if the image needs to be processed using a library or algorithm that requires a 3D image format, even though the image itself does not have any color information.
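As a side note, the per-channel assignment above can be written as a one-liner with np.repeat (a sketch with a tiny illustrative mask; cv2.cvtColor with COLOR_GRAY2BGR achieves the same thing):

```python
import numpy as np

mask = np.random.randint(0, 256, (4, 5), dtype=np.uint8)   # hypothetical 2D mask
mask_3d = np.repeat(mask[:, :, None], 3, axis=2)           # duplicate into 3 channels
```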

luojin commented 1 year ago

@TikZSZ You have 92 images with shape [512, 910], but COLMAP only recognises 2 of them, which are listed in the view_imgs.txt file. So if you feed all 92 images to training, it results in an error like:

assert len(all_img) == self.imvs.shape[0] and len(all_mask) == self.imvs.shape[0]
AssertionError

How did you solve this error?

TikZSZ commented 1 year ago

@luojin You have to delete the images that are not listed in view_images.txt.

luojin commented 1 year ago

@TikZSZ But can only 2 images reconstruct the 3D model correctly? Can you please show me your nerd_*.json?

TikZSZ commented 1 year ago

@luojin I have since deleted the files, but yes, you are right that 2 images won't work. This was just an example to reproduce the original error of the post, which occurred on every dataset. We later used some museum datasets with many more images, but the results were lackluster.

luojin commented 1 year ago

@TikZSZ Yes, thanks for your reply. I could not get a desirable result with my custom dataset either. If you have more information about that, please let me know.