Open Abyss-J opened 1 year ago
Hey, thanks! This is very useful. I will try this later and update the repo. I don't know about that distributed issue -- I haven't encountered that myself...
When I tried to train bevformer2 on two 3090 GPUs, training reported an error:
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 26301)
This error does not occur every time, but the probability of occurrence is high. I noticed the code already has a comment saying that using multi-scale features will not work. After checking the code, I found an issue with the parameters of VanillaSelfAttention and SpatialCrossAttention: when using multi-scale features, n_levels needs to be set to the number of multi-scale feature levels (3) to solve the problem.
Could you please check this issue?
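To illustrate the point above, here is a minimal NumPy sketch (not code from this repo; `flatten_multiscale` and the feature shapes are hypothetical) of why `n_levels` must equal the number of multi-scale feature maps: deformable attention flattens all levels into one token sequence, and a mismatched `n_levels` makes the per-level bookkeeping inconsistent.

```python
import numpy as np

def flatten_multiscale(feats, n_levels):
    # Guard corresponding to the fix described above: n_levels must equal
    # the number of multi-scale feature maps, or downstream deformable
    # attention will index past the last level.
    if len(feats) != n_levels:
        raise ValueError(
            f"n_levels={n_levels} but got {len(feats)} feature levels; "
            "set n_levels to the number of multi-scale features")
    spatial_shapes = [(f.shape[2], f.shape[3]) for f in feats]
    # Flatten each (B, C, H, W) map to (B, H*W, C) and concatenate levels.
    flat = np.concatenate(
        [f.reshape(f.shape[0], f.shape[1], -1).transpose(0, 2, 1)
         for f in feats],
        axis=1,
    )
    return flat, spatial_shapes

# Three FPN-style levels (hypothetical sizes) for a batch of 2, 256 channels.
feats = [np.zeros((2, 256, h, w), dtype=np.float32)
         for h, w in [(28, 50), (14, 25), (7, 13)]]
flat, shapes = flatten_multiscale(feats, n_levels=3)
print(flat.shape)  # (2, 1841, 256): 28*50 + 14*25 + 7*13 tokens
```

With `n_levels=1` the same call raises a `ValueError` instead of silently producing level offsets that disagree with the flattened sequence.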
Segnet is used in train_nuscenes.py; I changed the import to
from nets.bevformernet2 import Bevformernet
and I also changed the model in train_nuscenes.py.
However, it shows me an error when it reaches bevformernet2.py#L497.
Could you please share how you run or modified the code?
Best regards and thank you very much!
@Abyss-J could you please help me to do this?
Well, I just checked the code of bevformernet2 for some reuse; I didn't reproduce the author's experiment. If the error was reported in the encoder section, maybe you need to check the inputs. Hope that helps.
@Abyss-J Thank you for your reply. Could you please share how you tried to run the bevformernet2 code, and your Python config? I want to check it. I can only run the bevformernet experiment.