EnVision-Research / Generalizable-BEV


Reproducibility of "Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View" (https://arxiv.org/pdf/2303.01686.pdf) #3

Closed. riteshkhrn closed this issue 3 months ago.

riteshkhrn commented 5 months ago

In your paper you reference the methodologies of "Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View". I see in your code that you implement Dynamic Perspective (DP) domain generalization in https://github.com/EnVision-Research/Generalizable-BEV/blame/d26553485fa315cb8ef383827f34a78edea755dd/mmdet3d/datasets/pipelines/loading_UDA.py#L1952. When I tried to reproduce DP-DG I did not see any improvement; rather, the results were much worse. Any thoughts on this? My baseline is BEVDepth. To reproduce this configuration I mostly used the augmentations from https://github.com/EnVision-Research/Generalizable-BEV/blame/d26553485fa315cb8ef383827f34a78edea755dd/mmdet3d/datasets/pipelines/loading_UDA.py#L2306 and the homography from https://github.com/EnVision-Research/Generalizable-BEV/blame/d26553485fa315cb8ef383827f34a78edea755dd/mmdet3d/datasets/pipelines/loading_UDA.py#L2540. The key change in my code is that I directly assign sensor2keyego = sensor2keyego_aug, as is done in https://github.com/EnVision-Research/Generalizable-BEV/blame/d26553485fa315cb8ef383827f34a78edea755dd/mmdet3d/datasets/pipelines/loading_UDA.py#L2554, so the change is reflected in the underlying pipeline without modifying it explicitly. A minimal sketch of the change is below.
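For concreteness, here is a minimal sketch of the extrinsic-perturbation step as I understand it. The function name `dynamic_perspective_aug` and the rotation range are my own illustrative choices, not the repo's API; the linked lines in `loading_UDA.py` are the authoritative implementation.

```python
import numpy as np
import cv2

def dynamic_perspective_aug(img, intrinsic, sensor2keyego, max_angle_deg=3.0):
    """Illustrative sketch: perturb the camera extrinsic by a small rotation
    and warp the image with the induced homography H = K @ R_delta @ K^-1
    (exact for a pure rotation about the camera center)."""
    # Small random rotation about the camera axes.
    rvec = np.deg2rad(np.random.uniform(-max_angle_deg, max_angle_deg, 3))
    R_delta, _ = cv2.Rodrigues(rvec)

    K = intrinsic[:3, :3]
    # Homography induced by rotating the camera; warp the image to match.
    H = K @ R_delta @ np.linalg.inv(K)
    img_aug = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))

    # Fold the inverse rotation into camera-to-key-ego so the LSS projection
    # stays consistent with the warped image; this augmented matrix is what
    # gets assigned to sensor2keyego downstream.
    sensor2keyego_aug = sensor2keyego.copy()
    sensor2keyego_aug[:3, :3] = sensor2keyego[:3, :3] @ R_delta.T
    return img_aug, sensor2keyego_aug
```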

Is this your observation as well? Or can you point me in the right direction?

LuPaoPao commented 5 months ago

Thank you for your attention. I really appreciate how carefully you have looked into this.

I also tried Dynamic Perspective in DG-BEV, and it really did not improve results significantly. The ablation experiments in the DG-BEV paper also show that the effect of this method is not obvious. The core of DG-BEV seems to be its intrinsic-decoupled depth prediction.
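My loose sketch of that core idea, for readers landing here: depth is predicted for a virtual reference focal length and rescaled by the true one, since apparent depth scales with focal length for an object of fixed image size. The function name and reference focal length are illustrative assumptions, not DG-BEV's actual code.

```python
def decouple_depth(pred_depth_ref, focal, focal_ref=800.0):
    # Hedged sketch of intrinsic-decoupled depth: the network predicts
    # depth as if every camera had focal length focal_ref; metric depth
    # is recovered by rescaling with the actual focal length.
    return pred_depth_ref * focal / focal_ref
```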

For Dynamic Perspective in DG-BEV, the parameters I considered changing are the augmentation applied to the original image and the extrinsic parameters used in the LSS projection. I tried a lot of things. One strategy is at least safe: it is not necessarily effective, but it does not degrade performance. Only the image augmentation is changed; the 2D center positions of the objects are not updated after the augmentation, and the camera parameters are left unchanged. A sketch of this variant is below.
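A minimal sketch of that image-only variant, under my own assumptions (the corner-jitter magnitude and function name are illustrative): the image is warped, while intrinsics, sensor2keyego, 2D centers, and 3D GT are all left untouched.

```python
import numpy as np
import cv2

def image_only_perspective_aug(img, max_jitter=0.02):
    """Warp only the image; leave camera parameters and GT untouched."""
    h, w = img.shape[:2]
    # Jitter the four image corners by up to max_jitter of the image size,
    # then fit and apply the resulting perspective transform.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = np.random.uniform(-max_jitter, max_jitter, (4, 2)).astype(np.float32)
    dst = src + jitter * np.float32([w, h])
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, H, (w, h))
```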

In short, the Dynamic Perspective described in the DG-BEV paper does not work very well; my experiments also show that its gains are not significant. My advice is to abandon both this strategy (Dynamic Perspective) and the latter strategy (Domain-Invariant Feature Learning), which do not reliably improve results and can make them unstable from run to run.

Thank you again for your interest in my work. I think cross-domain BEV performance is still very poor and deserves further study.

riteshkhrn commented 5 months ago

Thanks for the prompt response. About the effective strategy you mention, just to confirm: you transform the image using the homography and feed that image to the model with the original GT, intrinsics, and extrinsics (sensor2keyego)? I also assumed that the GT never has to change in DP-DG, since sensor2keyego is the only thing modified and LSS will obtain the lidar coordinates w.r.t. the new projection. Is my understanding correct?

LuPaoPao commented 5 months ago

Yes, you are right. The ultimate goal is to change the image and to change the sensor2keyego used by LSS, while the positions of the objects in 3D space do not change.
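To make that invariant explicit with a sketch (function name and shapes are illustrative assumptions, not the repo's API): LSS lifts pixels into the key-ego frame through sensor2keyego, so swapping in the augmented matrix re-aims the frustum while the GT boxes, already expressed in ego coordinates, stay fixed.

```python
import numpy as np

def lift_pixel_to_keyego(u, v, depth, intrinsic, sensor2keyego_aug):
    """Back-project one pixel at a given depth into the key-ego frame.

    Only the extrinsic changes under DP-DG; the GT boxes live in this
    same ego frame and therefore need no modification.
    """
    # Unproject through the (unchanged) intrinsics to camera coordinates.
    x_cam = np.linalg.inv(intrinsic[:3, :3]) @ np.array([u, v, 1.0]) * depth
    # Transform to key-ego with the augmented extrinsic.
    return sensor2keyego_aug[:3, :3] @ x_cam + sensor2keyego_aug[:3, 3]
```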

riteshkhrn commented 5 months ago

Can you also give a bit more detail about the effective strategy for using DP-DG that you mentioned in the comment above?

LuPaoPao commented 5 months ago

I highly recommend you take a look at the paper's supplementary material.

riteshkhrn commented 5 months ago

I tried to find it online but unfortunately could not. Can you share a link to the supplement?

LuPaoPao commented 5 months ago

Can you give me your email, or drop me an email at hlu585@connect.hkust-gz.edu.cn, and I will send you the PDF.

riteshkhrn commented 5 months ago

I have emailed you. If you cannot find it, you can share it at ritesh.khrn@gmail.com.

riteshkhrn commented 5 months ago

Thanks for the key insights and for guiding me. Kudos on the great work!