brade31919 / radar_depth

Source code of the IROS 2020 paper "Depth Estimation from Monocular Images and Sparse Radar Data"
MIT License

Questions about hyperparameters and processed dataset #5

Closed pjckoch closed 3 years ago

pjckoch commented 3 years ago

Hi @brade31919 ,

first of all, thanks again for sharing your code! I have a few questions about it:

1. Learning rate and batch size:

I noticed that you wrote in your paper:

Unless stated otherwise, all the models are trained using a batch size of 16 and the SGD optimizer with a learning rate of 0.001 and a momentum of 0.9 for 20 epochs.

However, in your code, the default learning rate is 0.01: https://github.com/brade31919/radar_depth/blob/6dc235433510eb7e24195a55565ea9d906e7c0d3/utils.py#L49

And the batch size indicated in the shell script is 8 https://github.com/brade31919/radar_depth/blob/5e6e75772ff379aac65379a50d4042a7c64c869d/train_model.sh#L9

Are these the hyperparameters used to achieve the results from your paper? Or did you train your recent models with different parameters?

2. Weight decay:

I think you did not mention weight decay in your paper. Did you train the model in your paper with weight decay of 1e-4 as indicated in your code? https://github.com/brade31919/radar_depth/blob/6dc235433510eb7e24195a55565ea9d906e7c0d3/utils.py#L53

3. Processed dataset:

Unfortunately, I cannot download your processed dataset at the moment. I know you plan to add some documentation for it, but perhaps you could answer this question beforehand: did you make any alterations to the actual data, or did you merely change the structure of the dataset?

4. Transform points:

What is the purpose of the following function? https://github.com/brade31919/radar_depth/blob/6dc235433510eb7e24195a55565ea9d906e7c0d3/dataset/nuscenes_dataset_torch_new.py#L587 Is it incorporated in the generation of your processed dataset? If yes, what is the intention behind it?

Thanks a lot in advance!

Best, Patrick

brade31919 commented 3 years ago

Hi @pjckoch,

  1. Regarding the learning rate, I think the number in the paper is wrong (sorry for that). I haven't modified the default learning rate argument since the submission, so it should be 0.01. As for the batch size, as mentioned in the README.md, I used batch size = 8 to train the currently released model because the original checkpoint (trained with batch size 16) was deleted. I won't list all the detailed reasons since most users probably aren't interested in them, but in short, I can no longer use the cluster I had access to during the paper submission (so no V100). On the machine I can currently access, batch size = 8 is the largest I can fit, and it reached similar performance.

  2. Yes, I used weight decay during training and didn't mention it in the paper. I don't remember whether the reason was the page limit or that I simply forgot, but it's hard to include every detail in a paper. I didn't spend much time tuning the hyper-parameters; you can probably gain something by tuning them, but that wasn't the main objective of the project at the time. (The resulting optimizer settings are sketched after this list.)

  3. I just tried the command I provided and it worked. Can you describe the situation you encountered? As for the processing, it's hard to explain briefly, but I didn't merely change the data structure. The depth maps (both LiDAR and Radar) are not provided in the nuScenes dataset, so I did the projection, discarded some unrelated info, and saved the results. The processed dataset also made the code release easier, because I don't have the storage to verify all the code against the raw nuScenes dataset.

  4. That's just an old code fragment that I failed to remove completely during the release. I experimented with a multi-task model before: in that setting, we have (1) RGB images, (2) projected Radar depth maps, and (3) Radar points in vector format, and we perform depth estimation and point-cloud classification simultaneously to see whether we can remove noisy measurements and improve the depth estimation.
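
For reference, here is a minimal PyTorch sketch of the optimizer setup described in points 1 and 2 (SGD with learning rate 0.01, momentum 0.9, weight decay 1e-4, batch size 8). The model and dataset below are dummy placeholders for illustration only, not the actual classes from this repository:

```python
import torch

# Placeholder model and data; the real model/dataset classes live in this repo.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
train_set = torch.utils.data.TensorDataset(torch.randn(32, 3, 64, 64),
                                           torch.randn(32, 1, 64, 64))

# Hyperparameters as discussed above: lr=0.01, momentum=0.9, weight decay=1e-4,
# batch size 8 (16 in the original paper setup).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
loader = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)

for images, targets in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.l1_loss(model(images), targets)
    loss.backward()
    optimizer.step()
```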

Sincerely, Juan-Ting Lin

pjckoch commented 3 years ago

Hi @brade31919 ,

thanks a lot for getting back to me so quickly.

I understand that space for a paper publication is limited and not every detail can be included. Thanks for all the clarifications!

Regarding 3, the problem is simply that I already have the regular nuScenes dataset downloaded and currently don't have enough space left to download yours. Sorry for the misunderstanding; I didn't mean to imply that there was something wrong with your command.

Best, Patrick

pjckoch commented 3 years ago

One more question: are your pretrained models trained with sensor samples from all directions (i.e. front, front_right, back_right, back, back_left, and front_left)? I've trained your model myself and the results look a tad blurrier, but perhaps that's because I only loaded the front and back views.

brade31919 commented 3 years ago

No, I only used the front and back views. There are not many Radar points in the other directions. What do you mean by "a tad blurrier"?

  1. Can you show the metric results (e.g. RMSE, MAE, Delta1)?
  2. Did you use the processed data?

isht7 commented 3 years ago

@brade31919 thanks for replying to the queries above. I had a question: in the .h5 files in the folder ver2_lidar1_radar3_radar_only, what is the unit of the lidar_depth maps? The values in these maps are very large, and I see that you divide them by 256. during data processing here.

brade31919 commented 3 years ago

Hi @isht7. The projected depth values in the original depth maps are float32 and the unit is meters (m). However, we don't want to save float32 because it takes too much storage. A common technique is to convert to uint16 via int(depth * 256), which keeps a reasonable degree of accuracy while using less storage. That's why we need to divide the values by 256. after reading the depth maps from the h5 files. I remember the same convention is used in the KITTI dataset.
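
For illustration, a minimal NumPy sketch of the encode/decode round trip described above (the variable names are placeholders, not the repository's actual code):

```python
import numpy as np

# Depth in meters as float32, e.g. from projecting LiDAR points into the image.
depth_m = np.array([[0.0, 1.5, 73.25]], dtype=np.float32)

# Encode: scale by 256 and store as uint16 (what gets written to the .h5 file).
depth_u16 = (depth_m * 256.0).astype(np.uint16)

# Decode: divide by 256. after reading to recover meters (to ~1/256 m precision).
depth_decoded = depth_u16.astype(np.float32) / 256.0

print(depth_decoded)  # recovers 0.0, 1.5, 73.25 exactly in this example
```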

isht7 commented 3 years ago

Thank you very much @brade31919 for the prompt reply. I noticed that you set a NumPy seed to split into train and val. As noted here, the behavior of np.random.choice may change across different Python / NumPy versions. Could you share the train / val scenes you used?

This script, which I borrowed from your code, should find the splits. If you could share the variables train_scenes and val_scenes, that would be great! Alternatively, we could check that the last line prints the same value on both your computer and mine. On my computer, the output of

print (np.sum(train_scenes), np.sum(val_scenes))

is

322700 38125

Could you please check whether you also get the same output for this print statement? If the outputs are not the same, it would be great if you could share the train_scenes and val_scenes variables from the code snippet.
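
For context, a minimal sketch of the kind of seeded scene split being discussed, assuming the split is done with np.random.choice over scene indices; the seed, split ratio, and variable names here are illustrative, not taken from the repository:

```python
import numpy as np

# Illustrative values only: the real scene list comes from the nuScenes dataset.
num_scenes = 850          # nuScenes has 850 trainval scenes
val_fraction = 0.1        # assumed split ratio
seed = 0                  # assumed seed

np.random.seed(seed)
all_scenes = np.arange(num_scenes)
val_scenes = np.random.choice(all_scenes,
                              size=int(num_scenes * val_fraction),
                              replace=False)
train_scenes = np.setdiff1d(all_scenes, val_scenes)

# As noted above, np.random.choice results may differ across Python / NumPy
# versions, which is why comparing these sums across machines is a useful check.
print(np.sum(train_scenes), np.sum(val_scenes))
```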