JiaxiongQ / DeepLiDAR

Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image (CVPR 2019)
MIT License

How to prepare the data for `trainN.py`? #22

Open AnaRhisT94 opened 4 years ago

AnaRhisT94 commented 4 years ago

Hi, thanks for this amazing repo. @JiaxiongQ

I'm trying to get trainN.py and nomalLoader.py to work in order to train the first NN. Here's what I understand so far about what I need in order to train:

  1. Download data_depth_velodyne, which is the sparse LiDAR dataset.
  2. Download data_depth_annotated, which is the ground-truth (dense) LiDAR dataset.
  3. Use the second repo to generate the ground-truth normals from the dense ground-truth LiDAR dataset.
  4. Download ALL the RGB KITTI images from all the categories (City | Residential | Road | Campus | Person | Calibration). Is there a link to download them all at once instead of downloading one by one?

Question 1: Do I need to extract all the RGB images one by one into data_depth_velodyne/train/..*sync/, i.e. add image_02 and image_03 folders to each of the sync folders? (This is what your code implies.)

Question 2: Is there a way to download all the RGB images in one shot instead of clicking and extracting them one by one into all the folders?

In nomalLoader.py the function dataloader(filepath) returns 3 variables, left_train, normalS_train, normal_gts, which are:

a. left_train - the RGB KITTI image folders, data_depth_velodyne/train/..*sync/image_02 & 03/data.
b. normalS_train - the sparse LiDAR folders, data_depth_velodyne/train/..*sync/proj_depth/velodyne_raw/image_02 & 03/.
c. normal_gts - the folder with all the normals I generated from the dense GT: data_depth_annotated/*_sync/proj_depth/groundtruth/image_02 & image_03 -> gt/out/train/*_sync/image_02 & image_03, or should it all be in gt/out/train/*_sync/? Because in the code there isn't anything about concatenating image_02 & image_03.
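
To make c. concrete, here is a minimal sketch of how I am currently collecting the three path lists (the folder names and layout are my assumption from reading nomalLoader.py, not necessarily the intended ones):

    import glob
    import os

    root = 'data_depth_velodyne/train'   # sparse LiDAR, plus (my assumption) the copied RGB images
    gt_root = 'gt/out/train'             # where I put the normals generated from the dense GT

    left_train, normalS_train, normal_gts = [], [], []
    for seq in sorted(os.listdir(root)):
        for cam in ('image_02', 'image_03'):
            left_train += sorted(glob.glob(os.path.join(root, seq, cam, 'data', '*.png')))
            normalS_train += sorted(glob.glob(os.path.join(root, seq, 'proj_depth', 'velodyne_raw', cam, '*.png')))
            normal_gts += sorted(glob.glob(os.path.join(gt_root, seq, cam, '*.png')))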

Question 3: please see c. above; that's where I ask about the ground-truth normals.

Question 4: When and where is the synthetic data used? Do we also use it in trainN.py? Do we use it in all 3 NNs?

Question 5: How many epochs are recommended for training? Other than that, thank you. It took me many hours just to get to the point where I understand how to get the data ready (and I'm still trying). After this post I'll definitely add a guide on how to prepare the data for training, so others can save many hours understanding the process.

JiaxiongQ commented 4 years ago

1: Yes, we used images from the left camera and the right camera.

  2. Sorry, I don't know where such a link exists.
  3. I think it doesn't matter; you just need to confirm that all the file names from the different folders can be matched (see the sketch after this list).
  4. We first trained our surface normal model on the synthetic data and then fine-tuned it on KITTI to get better surface normals.
  5. All 3 NNs are trained for 15 epochs, but the last one uses a lower learning rate. Thanks for your good questions! I think they can also help others.
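
For example, a quick way to confirm that the file names from different folders can be matched (just a sketch with hypothetical paths, not part of the repo):

    import os

    def stems(folder):
        # file names without extension, e.g. '0000000005'
        return {os.path.splitext(f)[0] for f in os.listdir(folder)}

    rgb = stems('path/to/rgb')
    sparse = stems('path/to/sparse_lidar')
    normals = stems('path/to/normal_gt')

    print('missing sparse LiDAR for:', sorted(rgb - sparse))
    print('missing normal GT for:', sorted(rgb - normals))
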
AnaRhisT94 commented 4 years ago

I see, thank you for the answers! @JiaxiongQ I'll update later with my progress and write a full step-by-step guide on how to do it, for people who are confused at the beginning like I was.

AnaRhisT94 commented 4 years ago

@JiaxiongQ
For training with the synthetic data: I use the RGBRight and RGBLeft folders, the sparse LiDAR dataset in the lidar folder, and finally the ground-truth normals from the dense depth LiDAR, which I take from the Normal_m folder, right? Question 1: Is it true that the folders above are the ones used for training?

Question 2: If yes, there is only sparse LiDAR for RGBLeft and Normals_m for RGBLeft, so why do we use RGBRight?

JiaxiongQ commented 4 years ago

Yes, because we only generated surface normals from the depth of the left camera.

AnaRhisT94 commented 4 years ago

> Yes, because we only generated surface normals from the depth of the left camera.

Thank you!

AnaRhisT94 commented 4 years ago

Hi @JiaxiongQ , after I prepared the 3 folders (RGBLeft, lidar and Normals_m) from /Town11/SEQ (to test that the training works), I'm getting the following error:

  File "/home/unknown/depth_est/DeepLiDAR/submodels/depthCompleNew.py", line 155, in forward
    inputS = torch.cat((sparse,mask),1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 512 and 256 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71

sparse.shape: (4, 1, 256, 512), mask.shape: (4, 256, 512, 1)

I probably need to move the 1 in mask so it comes right after the 4, and then it will be fixed; I'll try that out and update. But why doesn't it work out of the box? I didn't see any posts about this error when training the first NN, so did I do something wrong in the process?

EDIT: When changing the shape with np.transpose so that mask has shape (4, 1, 256, 512), I get new errors, and other errors happen if I change sparse instead. Any ideas how to solve this? I'm out of ideas, and I also didn't see anyone here saying they got this error when training. I double and triple checked my paths, and the images and the number of images (495) are the same for each of the 3 folders, so the data itself should be fine.

JiaxiongQ commented 4 years ago

In our 'dataloader/trainLoaderN.py', there is: [screenshot from 2020-01-07 09:22:30 of the relevant lines] So you should not need to do 'np.transpose'. sparse goes through the same operation, so their shapes should match.
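
Since the screenshot is not reproduced here, a sketch of the kind of operation being referred to (my guess from the shapes discussed, not the actual repo code):

    import numpy as np

    # hypothetical H x W x C arrays standing in for what __getitem__ loads
    mask = np.zeros((256, 512, 1), dtype=np.float32)
    sparse = np.zeros((256, 512, 1), dtype=np.float32)

    mask = np.transpose(mask, (2, 0, 1))      # (H, W, C) -> (C, H, W)
    sparse = np.transpose(sparse, (2, 0, 1))  # same operation on the sparse depth
    print(mask.shape, sparse.shape)           # (1, 256, 512) each; the DataLoader stacks to (B, 1, 256, 512)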

AnaRhisT94 commented 4 years ago

I see, but that still doesn't work. I attached an image of the variables before exiting the __getitem__ function in trainLoaderN.py: [screenshot of the variables] Same error (I haven't changed anything in the code except the image loading in nomalLoader.py):

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 512 and 256 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71

Also, while searching for a solution to this problem, people suggested setting batch_size = 1, which didn't help either:

TrainImgLoader = torch.utils.data.DataLoader(
        DA.myImageFloder(all_left_img,all_normal,all_gts ,True, args.model),
        batch_size = 1, shuffle=True, num_workers=1, drop_last=True)

Also, in trainN.py I printed the shapes before loss = train(...):

        for batch_idx, (imgL_crop,sparse_n,mask,mask1,data_in1) in enumerate(TrainImgLoader):
            start_time = time.time()
            print(imgL_crop.shape)
            print(sparse_n.shape)
            print(mask1.shape)
            print(mask1.shape)
            print(data_in1.shape)

Output:

(1, 3, 256, 512)
(1, 1, 256, 512)
(1, 256, 512, 3)
(1, 256, 512, 3)
(1, 256, 512, 3)

I really want to get this to work and I have no idea why it doesn't.

JiaxiongQ commented 4 years ago

Sorry, I don't know why this happens, but you can use 'torch.permute()' to change the dimensions and make all the input dimensions like (b, c, 256, 512).
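
For example, a minimal sketch with dummy tensors matching the shapes reported above (just an illustration, not repo code):

    import torch

    mask = torch.zeros(4, 256, 512, 1)     # (B, H, W, C), as reported
    sparse = torch.zeros(4, 1, 256, 512)   # already (B, C, H, W)

    mask = mask.permute(0, 3, 1, 2)        # -> (4, 1, 256, 512)
    inputS = torch.cat((sparse, mask), 1)  # channel concatenation now works
    print(inputS.shape)                    # torch.Size([4, 2, 256, 512])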

AnaRhisT94 commented 4 years ago

> Sorry, I don't know why this happens, but you can use 'torch.permute()' to change the dimensions and make all the input dimensions like (b, c, 256, 512).

Hi @JiaxiongQ , I'll try torch.permute() soon. Other than that, I'm out of ideas; any chance you could help me with this further? Here's the code that prepares the 3 folders from Town11/SEQ0:

import os
import numpy as np

def dataloader_synthetic(filepath):
    imagesl = []
    normalS = []
    normal_gts = []
    temp = filepath

    filepathl = temp + 'Town11/SEQ0' #RGB dataset folder, Left and Right
    filepathgt = filepathl + '/Normal_m'
    #seqs = [seq for seq in os.listdir(filepathl) if seq.find('sync') > -1]
    left_fold = '/RGBLeft'
    right_fold = '/RGBright'
    lidar_foldl ='/lidar'
    #lidar_foldr = '/proj_depth/velodyne_raw/image_03'

    #for seq in seqs:
    left_path = filepathl + left_fold
    right_path= filepathl + right_fold
    lc= [os.path.join(left_path, img) for img in os.listdir(left_path)]
    lc.sort()

    #lc=lc[5:-5]
    rc= [os.path.join(right_path, img) for img in os.listdir(right_path)]
    rc.sort()
    #rc=rc[5:-5]
    imagesl = np.append(imagesl, lc)
    #imagesl = np.append(imagesl, rc)

    gt_path = filepathgt
    lids2l = filepathl
    lidar2l = [os.path.join(lids2l + lidar_foldl,lid) for lid in os.listdir(lids2l + lidar_foldl)]
    lidar2l.sort()
    normalS = np.append(normalS, lidar2l)
    #lids2r = os.path.join(filepathl, seq) + lidar_foldr
    #lidar2r = [os.path.join(lids2r, lid) for lid in os.listdir(temp)]
    #lidar2r.sort()
    #normalS = np.append(normalS, lidar2r)

    gt_imgs = [os.path.join(gt_path, norm) for norm in os.listdir(gt_path)]
    gt_imgs.sort()
    normal_gts= np.append(normal_gts, gt_imgs)
    #normal_gts= np.append(normal_gts, gt_imgs)

    left_train = imagesl
    normalS_train = normalS
    return left_train,normalS_train,normal_gts

I didn't change anything else.
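
For what it's worth, a quick sanity check on the returned lists (hypothetical root path, with the function above in scope):

    left, sparse, normals = dataloader_synthetic('/path/to/synthetic/')
    assert len(left) == len(sparse) == len(normals) == 495
    for l, s, n in zip(left, sparse, normals):
        # eyeball that the frame names match across the three folders
        print(os.path.basename(l), os.path.basename(s), os.path.basename(n))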

After using torch.permute(), it no longer throws that error, but there's a new error in the function nomal_loss (pred is a tuple with a tensor inside), so I guess it needs to be converted to a tensor, or not permuted at all. I'm not sure why I'm getting all these errors when no one else has posted any of them here.

    pred_n = pred.permute(0,2,3,1)
AttributeError: 'tuple' object has no attribute 'permute'

Full code of that function:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

def nomal_loss(pred, targetN, mask1):
    valid_mask = (mask1 > 0.0).detach()
    print(type(pred))
    print(pred)
    pred_n = pred.permute(0,2,3,1)
    pred_n = pred_n[valid_mask]
    target_n = targetN[valid_mask]

    pred_n = pred_n.contiguous().view(-1,3)
    pred_n = F.normalize(pred_n)
    target_n = target_n.contiguous().view(-1, 3)

    loss_function = nn.CosineEmbeddingLoss()
    loss = loss_function(pred_n, target_n, Variable(torch.Tensor(pred_n.size(0)).cuda().fill_(1.0)))
    return loss
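
For reference, a minimal call that runs the function as written, with dummy tensors in the shapes it seems to expect (my assumption; it needs a CUDA device because of the .cuda() call inside, and nomal_loss plus its imports defined as above):

    pred = torch.randn(1, 3, 256, 512).cuda()         # network output, (B, 3, H, W)
    targetN = torch.randn(1, 256, 512, 3).cuda()      # ground-truth normals, (B, H, W, 3)
    pix = (torch.rand(1, 256, 512, 1) > 0.5).float()  # per-pixel validity
    mask1 = pix.repeat(1, 1, 1, 3).cuda()             # same mask on all 3 channels, (B, H, W, 3)

    loss = nomal_loss(pred, targetN, mask1)
    print(loss.item())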

Now I changed it to pred_n = pred[0]

and new error:

    pred_n = pred_n[valid_mask]
IndexError: The shape of the mask [1, 3, 256, 512] at index 1 does not match the shape of the indexed tensor [1, 2, 256, 512] at index 1

JiaxiongQ commented 4 years ago

This code is mainly for KITTI; you should modify it and just ensure the file names can be matched.

AnaRhisT94 commented 4 years ago

> This code is mainly for KITTI; you should modify it and just ensure the file names can be matched.

Hi @JiaxiongQ , yes, I did: I modified it to work with the 3 synthetic-data folders, and it still doesn't work (you can see most of it is commented out, and I renamed the function).

valgur commented 4 years ago

Regarding Q2, the raw KITTI data overview page provides a raw_data_downloader.sh script to download and extract all of the raw data zip files. A slightly modified version with cleaner status info output can be found here: https://gist.github.com/valgur/cb9da4d1370ccc13c7c6b7c8c632d3e2

graycrown commented 4 years ago

> Hi @JiaxiongQ , after I prepared the 3 folders (RGBLeft, lidar and Normals_m) from /Town11/SEQ (to test that the training works), I'm getting the following error:
>
>   File "/home/unknown/depth_est/DeepLiDAR/submodels/depthCompleNew.py", line 155, in forward
>     inputS = torch.cat((sparse,mask),1)
> RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 512 and 256 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71
>
> sparse.shape: (4, 1, 256, 512), mask.shape: (4, 256, 512, 1)
>
> I probably need to move the 1 in mask so it comes right after the 4, and then it will be fixed; I'll try that out and update. But why doesn't it work out of the box? I didn't see any posts about this error when training the first NN, so did I do something wrong in the process?
>
> EDIT: When changing the shape with np.transpose so that mask has shape (4, 1, 256, 512), I get new errors, and other errors happen if I change sparse instead. Any ideas how to solve this? I'm out of ideas, and I also didn't see anyone here saying they got this error when training. I double and triple checked my paths, and the images and the number of images (495) are the same for each of the 3 folders, so the data itself should be fine.

I met the same problem with the dimension mismatch; I had to change a lot to make it fit the synthetic dataset. You can debug the program step by step and change the dimension order to fix it; the recommended dimension order in PyTorch is (B, C, H, W). I changed the order like this:

    inputl = inputl.cuda()  # .permute(0,2,3,1)
    sparse = sparse.cuda()  # .permute(0,2,3,1)
    gt1 = gt1.cuda().permute(0,3,1,2)
    mask1 = mask1.cuda().permute(0,3,1,2)
    mask = mask.cuda().permute(0,3,1,2)

Hope it helps you.

JiaxiongQ commented 4 years ago

Q1: Yes, we use images from the left camera and the right camera.
Q2: We didn't find such a link; we just downloaded the dataset one by one.
Q3: It is flexible how you organize the files; you just need to make sure that all the images correspond.
Q4: No, you just need to use the one DCU to train surface normals. The synthetic data is used to improve the quality of the surface normals; the download link is in README.md.
Q5: The whole training process takes 15 epochs on 3 GPUs (1080 Ti).

On Thu, Apr 9, 2020 at 9:24 AM graycrown notifications@github.com wrote:

> [quotes the original issue questions above]
>
> About Q2, you can find it in the tool sets on the KITTI homepage; someone provides a script to download them all.
