lubin202209 closed this issue 1 year ago.
It is mainly used to shrink the spatial resolution, which reduces GPU computation and training time.
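As a rough illustration of what shrinking the spatial resolution buys, the sketch below applies the standard convolution output-size formula to a single strided conv. The 3x3 kernel, padding 1, and the 96 x 352 input map are assumptions chosen for illustration, not values taken from the repository.

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one spatial axis of a standard Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def shrink(h: int, w: int, kernel: int = 3, stride: int = 2, padding: int = 1):
    """(H, W) of a feature map after one strided convolution."""
    return conv2d_out(h, kernel, stride, padding), conv2d_out(w, kernel, stride, padding)


if __name__ == "__main__":
    h, w = 96, 352                 # hypothetical BEV feature map size
    h2, w2 = shrink(h, w)          # 3x3 kernel, stride 2, padding 1
    print((h2, w2))                # (48, 176)
    print((h * w) / (h2 * w2))     # 4.0
```

Halving both axes leaves a quarter of the spatial positions for every later layer, the ViT included, to process. With stride 1 the same formula returns the input size unchanged.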
Since I don't want to use shrink_head or compression, I set both flags to false. With this setup I want to train the pointpillar_v2xvit model, so I set feature_stride under postprocess in the yaml file to 2, as shown below. How should I set the parameters in the transformer section of the yaml file so that the tensor shapes match?
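A minimal sketch of the shape bookkeeping behind feature_stride. It assumes, as a simplification, that the post-processor lays anchors out on the pillar grid divided by feature_stride, and that the backbone output sits at half the pillar grid; both are assumptions, not confirmed in this thread. The function and argument names are hypothetical, and the range and voxel values are illustrative, so substitute the ones from your own yaml file.

```python
def bev_grid(lidar_range, voxel_size):
    """Pillar grid size (H, W): H along y, W along x."""
    x_min, y_min, _, x_max, y_max, _ = lidar_range
    w = round((x_max - x_min) / voxel_size[0])
    h = round((y_max - y_min) / voxel_size[1])
    return h, w


def anchor_map_size(lidar_range, voxel_size, feature_stride):
    """Assumed anchor map size: the pillar grid divided by feature_stride."""
    h, w = bev_grid(lidar_range, voxel_size)
    return h // feature_stride, w // feature_stride


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    lidar_range = [-140.8, -38.4, -3.0, 140.8, 38.4, 1.0]
    voxel_size = [0.4, 0.4, 4.0]
    backbone_hw = (96, 352)  # assumed backbone output at 1/2 the pillar grid
    for stride in (2, 4):
        hw = anchor_map_size(lidar_range, voxel_size, stride)
        print(stride, hw, hw == backbone_hw)
    # 2 (96, 352) True
    # 4 (48, 176) False
```

Under these assumptions, feature_stride 2 matches a feature map that skips any extra downsampling, while a stride-2 shrink header would halve the map again and call for feature_stride 4.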
You can still use the shrink header but change its stride from 2 to 1. It will then only adjust the channel number to what the ViT can use directly, without downsampling, so it won't have a big impact on performance. If you insist on removing the whole header, then you need to adjust the channel number in the ViT.
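To compare the two options in this reply, here is a shape-tracking sketch. It assumes, hypothetically, a backbone output of 384 x 96 x 352 (C, H, W) at stride 2 and a shrink header made of one 3x3 conv with padding 1 that outputs 256 channels; plan() is an illustrative helper, not part of the codebase.

```python
def conv2d_out(size, kernel, stride, padding):
    """Standard Conv2d output length along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def plan(backbone_chw, backbone_stride, use_shrink_header,
         shrink_stride=1, shrink_dim=256):
    """Channels, spatial size and total stride of the tensor fed to the ViT.

    Assumes a hypothetical shrink header of one 3x3 conv with padding 1.
    """
    c, h, w = backbone_chw
    stride = backbone_stride
    if use_shrink_header:
        c = shrink_dim
        h = conv2d_out(h, 3, shrink_stride, 1)
        w = conv2d_out(w, 3, shrink_stride, 1)
        stride *= shrink_stride
    return {"vit_dim": c, "feature_hw": (h, w), "feature_stride": stride}


if __name__ == "__main__":
    backbone = (384, 96, 352)  # hypothetical (C, H, W) at stride 2
    print(plan(backbone, 2, True, shrink_stride=2))  # header with stride 2
    print(plan(backbone, 2, True, shrink_stride=1))  # header with stride 1
    print(plan(backbone, 2, False))                  # no header at all
```

With stride 1 the header only remaps channels (384 to 256), so the ViT keeps dim 256 and feature_stride stays 2. Dropping the header instead keeps the same spatial size but hands the ViT 384 channels, which is why its channel settings must change.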
Yes, I do want to remove the whole header. I changed "dim" in cav_att_config, "dim" in pwindow_att_config, and "mlp_dim" in feed_forward from the original 256 to 384, as shown below. However, tensor size mismatch errors are still reported during training. Could you explain in more detail how to adjust the channel number?
Your heads*dim_head should be 384 as well
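The reply above can be turned into a quick sanity check to run on the transformer config before training. This is a sketch: the dictionary layout mirrors the key names mentioned in this thread (cav_att_config, pwindow_att_config, dim, heads, dim_head), but the per-level lists in pwindow_att_config and all of the numbers are assumptions for illustration. It only checks heads * dim_head against dim and does not look at mlp_dim.

```python
def check_attention_dims(cfg):
    """Return mismatch messages; an empty list means heads * dim_head == dim everywhere."""
    problems = []

    cav = cfg["cav_att_config"]
    if cav["heads"] * cav["dim_head"] != cav["dim"]:
        problems.append(
            f"cav_att_config: {cav['heads']} * {cav['dim_head']} != {cav['dim']}")

    # Assumed layout: one (heads, dim_head) pair per window level.
    pwin = cfg["pwindow_att_config"]
    if len(pwin["heads"]) != len(pwin["dim_head"]):
        problems.append("pwindow_att_config: heads and dim_head lengths differ")
    for i, (h, dh) in enumerate(zip(pwin["heads"], pwin["dim_head"])):
        if h * dh != pwin["dim"]:
            problems.append(
                f"pwindow_att_config level {i}: {h} * {dh} != {pwin['dim']}")
    return problems


# Hypothetical: only "dim" raised to 384, heads/dim_head left at 256-sized values.
broken = {
    "cav_att_config": {"dim": 384, "heads": 8, "dim_head": 32},
    "pwindow_att_config": {"dim": 384, "heads": [16, 8, 4], "dim_head": [16, 32, 64]},
}

# Hypothetical fix: keep the head counts, scale dim_head so each product is 384.
fixed = {
    "cav_att_config": {"dim": 384, "heads": 8, "dim_head": 48},
    "pwindow_att_config": {"dim": 384, "heads": [16, 8, 4], "dim_head": [24, 48, 96]},
}

if __name__ == "__main__":
    for msg in check_attention_dims(broken):
        print(msg)
    print(check_attention_dims(fixed))  # []
```

Scaling dim_head while keeping the head counts is only one way to reach 384; changing the number of heads works just as well as long as each product comes out to 384.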
Hello, I have a question about the shrink_header. Could you explain what the shrink_header is for here, given that I can already use compression to compress the features?