Closed ABaldrati closed 5 months ago
Hi Alberto,
Since GenAD is a diffusion denoising network that takes a diffusion timestep as input, could you specify which diffusion timestep is used during the feature extraction process?
We use timestep 0, which corresponds to the lowest diffusion level, where no noise is added. Note that GenAD's UNet takes $\sigma$ as the conditioning signal that indicates the timestep. The conversion from timestep to $\sigma$ follows this implementation.
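For readers who want to see what a timestep-to-$\sigma$ conversion can look like, here is a minimal sketch. It assumes a DDPM-style scaled-linear beta schedule with the commonly used defaults (1000 steps, `linear_start=0.00085`, `linear_end=0.012`) and the relation $\sigma_t = \sqrt{(1 - \bar{\alpha}_t) / \bar{\alpha}_t}$. The function name and these values are assumptions for illustration, not taken from the GenAD code, so the linked implementation remains the reference.

```python
import math


def ddpm_sigmas(num_timesteps=1000, linear_start=0.00085, linear_end=0.012):
    """Return one sigma per discrete timestep (assumed scaled-linear beta schedule)."""
    sqrt_start = math.sqrt(linear_start)
    sqrt_end = math.sqrt(linear_end)
    sigmas = []
    alpha_bar = 1.0  # running product of (1 - beta_i)
    for i in range(num_timesteps):
        frac = i / (num_timesteps - 1)
        # Scaled-linear schedule: interpolate in sqrt(beta) space, then square.
        beta = (sqrt_start + frac * (sqrt_end - sqrt_start)) ** 2
        alpha_bar *= 1.0 - beta
        # Noise-to-signal ratio expressed as a standard deviation.
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas


sigmas = ddpm_sigmas()
sigma_t0 = sigmas[0]  # value the UNet would be conditioned on for timestep 0
print(f"sigma at timestep 0: {sigma_t0:.4f}")
```

Under this assumed schedule, timestep 0 maps to the smallest $\sigma$ in the table, and $\sigma$ grows monotonically with the timestep.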
Can you confirm that clean frames are used as input during this feature extraction process?
Yes, the original frames are used as input, without noise augmentation.
From which GenAD encoder layer (or layers) are the features extracted?
Our feature extraction process ends at the middle block of GenAD's UNet (see the illustration below); none of the upsampling blocks are used for this task.
Could you please provide more details on how the features are processed in the MLP?
We simply flatten the feature map extracted by the UNet. The resulting feature sequence is then fed to an MLP that regresses the planning waypoints.
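A rough sketch of the flatten-then-MLP step described above, written in NumPy so that the shapes are easy to follow. Everything specific here is an assumption for illustration: the feature-map size, the hidden width, the number of waypoints, the two-layer ReLU MLP, and the choice to flatten channels and spatial positions into a single vector. The random feature map is only a stand-in for the UNet middle-block output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
C, H, W = 8, 4, 6        # channels, height, width of the extracted feature map
HIDDEN = 32              # hidden width of the MLP
NUM_WAYPOINTS = 6        # number of future (x, y) waypoints to regress


def init_mlp(in_dim, hidden_dim, out_dim):
    """Create randomly initialised weights for a two-layer MLP."""
    return {
        "w1": rng.standard_normal((in_dim, hidden_dim)) * 0.01,
        "b1": np.zeros(hidden_dim),
        "w2": rng.standard_normal((hidden_dim, out_dim)) * 0.01,
        "b2": np.zeros(out_dim),
    }


def plan_waypoints(feature_map, params, num_waypoints):
    """Flatten the feature map and regress (x, y) waypoints with the MLP."""
    x = feature_map.reshape(-1)                               # (C*H*W,)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)      # ReLU hidden layer
    out = h @ params["w2"] + params["b2"]                     # (num_waypoints*2,)
    return out.reshape(num_waypoints, 2)


feature_map = rng.standard_normal((C, H, W))  # stand-in for middle-block features
params = init_mlp(C * H * W, HIDDEN, NUM_WAYPOINTS * 2)
waypoints = plan_waypoints(feature_map, params, NUM_WAYPOINTS)
print(waypoints.shape)
```

In a real setup the MLP weights would be trained with a regression loss against ground-truth waypoints; the sketch only shows the forward pass and the shapes involved.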
Great! Thank you so much for your detailed explanation!
Dear Authors,
First of all, thank you for your excellent work on the paper. I have studied your approach and have some questions about the feature extraction process when the GenAD encoder is used for planning, as described in the following paragraph:
I am particularly interested in understanding the following points:
Thank you very much for your time and help.
Best regards,