WSTao opened this issue 3 years ago
You just need to make sure the calib matrix is formatted correctly; the parameters can vary from camera to camera. We have verified this on the nuScenes dataset to prove that it works: we used the Model Zoo DLA34 model (trained only on the KITTI dataset) and got the results below without changing any parameters.
We formatted the nuScenes calib as:
P0: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
P1: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
P2: 1.252813102119e+03 0.000000000000e+00 8.265881147814e+02 0.000000000000e+00 0.000000000000e+00 1.252813102119e+03 4.699846626225e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P3: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
R0_rect: 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00
Tr_velo_to_cam: 1.122025939680e-02 -9.998986137987e-01 -8.767434198194e-03 -7.022340992421e-03 5.464515701519e-02 9.368031550067e-03 -9.984618905094e-01 -3.515059821513e-01 9.984427938514e-01 1.072390359095e-02 5.474472849433e-02 -7.332408994883e-01
Tr_imu_to_velo: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
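In case it helps others reproduce this, here is a minimal sketch of generating such a calib file from another camera's 3x3 intrinsic matrix K. The function and argument names (`write_kitti_calib`, `velo_to_cam`) are my own, not from this repo; only P2 carries real values, and the unused entries are zero-filled to match the format above:

```python
import numpy as np

def write_kitti_calib(path, K, velo_to_cam=None):
    # Put the real intrinsics into P2; zero-fill the unused entries and
    # use identity for R0_rect, matching the format shown above.
    fmt = lambda vals: ' '.join('%.12e' % v for v in np.asarray(vals).flatten())
    P2 = np.zeros((3, 4))
    P2[:3, :3] = K                       # intrinsics go into P2
    zeros12 = fmt(np.zeros(12))
    tr = fmt(velo_to_cam) if velo_to_cam is not None else zeros12
    lines = ['P0: ' + zeros12, 'P1: ' + zeros12, 'P2: ' + fmt(P2),
             'P3: ' + zeros12, 'R0_rect: ' + fmt(np.eye(3)),
             'Tr_velo_to_cam: ' + tr, 'Tr_imu_to_velo: ' + zeros12]
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
```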
Thank you very much. In other words, if you use a different camera or a different installation location (so that the camera's intrinsic and extrinsic parameters change), do you need to remake the training dataset?
Yes. For now, you can only align your data to KITTI's format.
Ok, thanks!
In KM3D, you reformulate the geometric constraints as a differentiable version and use them for training. I wonder whether KM3D easily overfits to the camera parameters of the training data, although it does seem to work well on nuScenes. Compared with RTM3D, I wonder whether KM3D's generalization to other datasets is poorer. Did you make a comparison?
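For readers unfamiliar with that formulation, here is a minimal sketch of the idea, assuming PyTorch: the 2D keypoint projections of the 3D box corners give an over-determined linear system in the box center, and solving it by least squares is differentiable, so the geometric constraint can be trained through. This is only an illustration of the technique, not the repo's exact code, and all names here are made up:

```python
import torch

def solve_center_from_keypoints(kps, offsets, K):
    """Recover the 3D box center (x, y, z) from predicted 2D keypoints.

    kps:     (N, 2) predicted pixel keypoints (u, v)
    offsets: (N, 3) corner offsets (dx, dy, dz) from the center, already
             rotated by the predicted orientation (camera frame)
    K:       (3, 3) camera intrinsic matrix as a tensor
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = kps[:, 0], kps[:, 1]
    dx, dy, dz = offsets[:, 0], offsets[:, 1], offsets[:, 2]

    # Pinhole projection u = fx*(x+dx)/(z+dz) + cx rearranges into the
    # linear equations (one pair per keypoint):
    #   fx*x + (cx - u)*z = (u - cx)*dz - fx*dx
    #   fy*y + (cy - v)*z = (v - cy)*dz - fy*dy
    zero = torch.zeros_like(u)
    A = torch.cat([
        torch.stack([fx.expand_as(u), zero, cx - u], dim=1),
        torch.stack([zero, fy.expand_as(v), cy - v], dim=1),
    ])                                              # (2N, 3)
    b = torch.cat([(u - cx) * dz - fx * dx,
                   (v - cy) * dz - fy * dy])        # (2N,)

    # torch.linalg.lstsq is differentiable (for full-rank A), so a loss
    # on the recovered center back-propagates into the keypoint,
    # dimension, and rotation heads.
    return torch.linalg.lstsq(A, b.unsqueeze(-1)).solution.squeeze(-1)
```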
@cch2016 I have tried to run the pretrained model on my own camera images:
The cars can only be detected at close range (10-20 m), because the image is cropped internally:
I have used the calibration data (projection matrix) from the KITTI dataset (calib/000000.txt).
@Banconxuan Did you use the calibration data (projection matrix) from KITTI or from NuScenes when doing inference on this image?
@walzimmer You should use the projection matrix from your own dataset. Its generalization is pretty good.
Hi @walzimmer, did you successfully get the intended result on your custom dataset? I am currently working on my own custom dataset, with cameras mounted at a higher angle. And @cch2016 @Banconxuan, do I need to crop my images to the same size as the KITTI dataset images? The calib parameters I need to change are P2, R0_rect, and Tr_velo_to_cam, and I should set the other parameters to zero. Is that correct?
So, if I understand correctly, all you need to change is P2. If you look at the test code, only P2 is being read.
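Right. A minimal sketch of what that parsing amounts to (the repo's own loader may differ in details; `read_p2` is my name for it):

```python
import numpy as np

def read_p2(calib_path):
    # Only the P2 line (3x4 projection matrix of the left color camera)
    # is needed at test time; all other calib entries can stay zero.
    with open(calib_path) as f:
        for line in f:
            if line.startswith('P2:'):
                return np.array(line.split()[1:], dtype=np.float32).reshape(3, 4)
    raise ValueError('no P2 entry in ' + calib_path)
```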
Hello, guys! I am also doing some work on the generalization of mono3D methods, and I wonder how the network can learn robustness to camera intrinsics. The depth of instances varies across cameras and datasets because of the camera intrinsics, so I think the method will fail at depth estimation (the other 3D box attributes may be fine).
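To make that concern concrete, here is a back-of-the-envelope example with the pinhole model; the focal lengths are just KITTI's ~721 px and the nuScenes value from the calib posted above, and the object sizes are illustrative:

```python
# An object of real height H (m) at depth z (m) spans h = f * H / z
# pixels, so the depth implied by apparent size is z = f * H / h.
f_kitti, f_nusc = 721.5, 1252.8   # focal lengths in pixels
H, h = 1.5, 60.0                  # car height (m), observed pixel height

z_kitti = f_kitti * H / h         # ~18.0 m under KITTI intrinsics
z_nusc  = f_nusc  * H / h         # ~31.3 m under the nuScenes intrinsics
# A size-based depth learned at one focal length is off by a factor of
# f_nusc / f_kitti (~1.74x here) on the other camera unless rescaled:
z_rescaled = z_kitti * (f_nusc / f_kitti)
```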
Brothers, I used some pictures from nuScenes for inference and modified P2 (the camera's intrinsic parameters). The results show that the model can identify objects well, but their positions have a large deviation. Is it impossible to infer object positions with a different camera?
Looking at the code, I can see that the network directly outputs the location of each target. Doesn't the model need the intrinsic parameters to get an object's position from a picture? If so, how can the model obtain object positions from pictures taken by different cameras?
If I use a camera with a different focal length, can the model infer accurate object positions?
If you know, I hope you can give me some advice. Thank you very much!
Could you please offer some details on how to train KM3D on the nuScenes dataset to obtain the result in the paper (AP = 15.3)? Thank you.
Monocular 3D detection depends on the camera parameters. If you change to a different camera or installation method, a model trained on the original dataset will not work. So how can this difference be resolved?