Closed Kin-Zhang closed 2 years ago
Also, I would stick to the default number of epochs. I did not try that many epochs, so I do not know what the performance would look like. Although there is some data augmentation, the model might start to overfit with 100 epochs.
Since the bundled compressed files had problems (my extraction process always got stuck), I downloaded the raw dataset from the folder instead.
I just use the online leaderboard to evaluate; the route file is private. I only tried the first five routes and then killed the run to check the effect, since that is enough and the whole process takes a long time.
The batch size and epoch table is below. Batch sizes are given as batch_size * gpu_num, since I rewrote DP as DDP to speed up training.
| phase | batch size | epoch | default epoch* |
|---|---|---|---|
| train_bev | 512 | 160 | 160 |
| train_bra | 64*2 | 100 | 10 |
| train_seg | 158*4 | 100 | 1 |
| train_full perceive_only | 12*8 | 100 | 15 |
| train_full | 20*8 | 100 | 15 |
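Since the batch sizes above are written as per-GPU batch * GPU count, the effective (global) batch size under DDP is simply their product. A minimal sketch of that bookkeeping (the helper name is mine, not from the repo):

```python
# Hypothetical helper: under DDP each of the gpu_num processes sees
# per_gpu_batch samples per step, so the effective batch size is the product.
def effective_batch_size(per_gpu_batch: int, gpu_num: int) -> int:
    return per_gpu_batch * gpu_num

# The multi-GPU rows of the table above, as (per-GPU batch, GPU count):
for phase, (b, g) in {"train_bra": (64, 2),
                      "train_seg": (158, 4),
                      "train_full": (20, 8)}.items():
    print(phase, effective_batch_size(b, g))  # 128, 632, 160
```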
And of course, I ran the lidar painting with the trained segmentation model above.
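For context, "lidar painting" here presumably follows the PointPainting idea: project each lidar point into the camera image and append the segmentation model's per-pixel class scores to it. A minimal numpy sketch under that assumption (the function and shapes are mine, not LAV's actual code):

```python
import numpy as np

def paint_lidar(points_uv, seg_scores):
    """points_uv: (N, 2) integer pixel coords of projected lidar points.
    seg_scores: (H, W, C) softmax scores from the segmentation model.
    Returns points with class scores appended; off-image points are dropped."""
    h, w, _ = seg_scores.shape
    u, v = points_uv[:, 0], points_uv[:, 1]
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    painted = seg_scores[v[valid], u[valid]]  # (M, C) class scores per point
    return np.concatenate([points_uv[valid].astype(float), painted], axis=1)

scores = np.zeros((4, 4, 3)); scores[..., 0] = 1.0  # every pixel is "road"
pts = np.array([[1, 2], [5, 5]])                    # second point falls off-image
out = paint_lidar(pts, scores)
print(out.shape)  # (1, 5): 2 coords + 3 class scores, off-image point dropped
```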
Is the default epoch* you mentioned the one stated in the repo, as I attached in the table? Since I saw the segmentation loss still decreasing after 1 epoch, 1 epoch does not seem enough for train_seg.
The default weights perform even better than the model trained on my whole dataset, but they also have serious collisions and do not reproduce LAV's reported results on the online leaderboard... that's why I'm curious about it.
Here is more information about the loss curves I mentioned for the seg and bra training:
Thanks for your help.
> I just use the online leaderboard to evaluate, the route file is private. I just tried the first five routes and killed the run to check the effect, since it's enough and the whole process takes a long time
The first 5 runs would not be indicative because it is 5 repetitions of the first route, and routes have varying difficulties. I would be able to help more if you evaluate on public routes and have a visualization.
> Is the default epoch* you mentioned the one stated in the repo, as I attached in the table? Since I saw the segmentation loss still decreasing after 1 epoch, 1 epoch does not seem enough for train_seg.

Yes. Our online leaderboard entry also uses these epoch settings.
I would not insist on 100 epochs, since a decreasing loss does not mean better performance at test time. The loss plotted here is the training loss, and even if it were the validation loss, it would not mean the model will drive better, due to distribution mismatch.
> The first 5 runs would not be indicative because it is 5 repetitions of the first route

> Our online leaderboard entry also uses these epoch settings.
Oh, I see now. Thanks for the reminder. I will try the default epoch settings to see whether it performs better.
> if you evaluate on public routes and have a visualization.
What information needs to be visualized? The BEV map, the segmentation output, or the detections?
> and even if it is validation loss it would not mean the model will drive better due to distribution mismatch.
Thanks for guiding me on this, I will try again. But how do you evaluate that epoch 1 is the best choice? Or how did you select those default epochs?
Really appreciate it.
> Thanks for guiding me on this, I will try again. But how do you evaluate that epoch 1 is the best choice? Or how did you select those default epochs?
Note that the only model trained for 1 epoch is the semantic segmentation model, which is not an end-task model. All that is needed are the semantic scores that get painted onto the LiDAR. Sure, you will get a higher IoU if you train longer, but as long as the most useful classes, i.e. roads, vehicles, pedestrians, get reasonably segmented, it suffices as a sensor-fusion input. It also likely does not need refined segmentation masks due to the low resolution of the 64-ray lidar.
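To make the IoU point concrete, per-class IoU on integer label maps is just intersection over union of the two binary masks for that class. A quick sketch:

```python
import numpy as np

def class_iou(pred, target, cls):
    """Intersection-over-union for one class id on integer label maps."""
    p, t = (pred == cls), (target == cls)
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    return inter / union if union else float("nan")

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(round(class_iou(pred, gt, 1), 3))  # 0.667 (2 pixels overlap, 3 in union)
```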
P.S. Apologies in advance: I will not be able to reply on this issue as promptly as before, since I am starting to get busy with other work.
> I will not be able to reply on this issue as promptly as before
It's okay, I will just leave messages here. And I really appreciate you replying to our questions.
> Sure you will get higher IoU if you train and wait longer but as long as the most useful stuff, i.e. roads, vehicles, pedestrians get reasonably segmented it would suffice as a sensor fusion model. It also likely does not need refined segmentation masks due to the low resolution of 64-ray lidar.
Thanks for explaining it. But for train_full, which produces lidar.th and uniplanner.th, did you run something like 20 epochs and evaluate them one by one, selecting 15 epochs as the default because it achieves the best result?
I just evaluated it on Town01 devtest.xml from the official leaderboard route files. Here is one of the scenarios that causes lots of collisions (also in other routes), such as collisions with pedestrians:
The model I used is trained on the dataset described above, with the earlier epoch selected:

| phase | batch size | epoch | default epoch* |
|---|---|---|---|
| train_bev | 512 | 160 | 160 |
| train_bra | 64*2 | 10 | 10 |
| train_seg | 158*4 | 10 | 1 |
| train_full perceive_only | 12*8 | 20 | 15 |
| train_full | 20*8 | 20 | 15 |
As I saw in the video, is the detection too slow to catch the cyclist? But the segmentation result is good, so I have no idea how to improve it so that it performs like the original LAV does.
Aha, so you have encountered one of the most hilarious collisions in the current scenario runner setup. Note how the cyclist does not move until the ego vehicle has already passed it. This is because the trigger box location is not properly set up for this particular scenario. It would also be inappropriate if the ego car stopped before the cyclist has entered the road. I do not have a neat solution for this, and even our leaderboard entry suffers from it. But if you really want to avoid this, you can make the car go slower or faster, so the cyclist is triggered while the car is approaching or has already passed it.
Thanks for letting me know about it. Really appreciate.
> you can make the car go slower or faster, so the cyclist is triggered while the car is approaching or already passed it.
So the default speed shown in config.yaml is not the default used by the online LAV, since the speed may be too fast?
https://github.com/dotchen/LAV/blob/04d23bfd68692ed5f9f57ce77decc5f0eb821d40/config.yaml#L75-L76
I also noticed that with the fully trained LAV there are a lot of red light infractions, but the online LAV did really well: in the first five routes (repeating just the first route) no red light infractions occurred. Could it be related to bra.th?
> But for train_full, which produces lidar.th and uniplanner.th, did you run something like 20 epochs and evaluate them one by one, selecting 15 epochs as the default because it achieves the best result?
I think those are all the problems I have. Really, thanks for all these replies.
Our online leaderboard entry also uses 35 km/h as the speed cap. What I said in my reply was that even when I evaluate our online leaderboard entry on local routes, I have seen such failure modes. But if you really want to avoid these errors on this particular route you are evaluating, you can lower the speed cap. However, this will lead to performance differences on other metrics, like timeouts. Does this make sense?
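For reference, a 35 km/h cap is about 9.7 m/s; lowering the cap simply clamps the planner's desired speed harder. A tiny illustrative helper (mine, not the repo's code):

```python
def clamp_target_speed(desired_mps: float, cap_kmh: float = 35.0) -> float:
    """Clamp a desired speed (m/s) to a speed cap given in km/h."""
    cap_mps = cap_kmh / 3.6  # km/h -> m/s
    return min(desired_mps, cap_mps)

print(round(clamp_target_speed(12.0), 2))               # 9.72 under the 35 km/h cap
print(round(clamp_target_speed(12.0, cap_kmh=25.0), 2)) # 6.94 with a lower cap
```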
Also, let me know how you know what our online leaderboard entry does in the first 5 routes, because even I don't know that :P
> Also let me know how you know what our online leaderboard performs in the first 5 routes cuz even I don't that :P
The public entry can be viewed on the leaderboard by metric, so you can see the infractions for each metric.
I see. I will keep these settings the same for the later comparison of methods.
Thanks again! ❤️
Hi, dotchen and Kin-Zhang!
I also want to figure out how to determine the best uniplanner model. Since you provide a model named uniplanner_v2_7.th in the weights folder, does that mean the model was trained for 7 epochs? However, the model I trained myself, also for 7 epochs, can't reach metrics as high as the model you provide.
Question about the training phase:
The whole dataset is missing some data.mdb files in certain folders; the provided dataset structure is shown here:
At the End-to-end Training phase, I downloaded the whole dataset and it's weird that some of these files are missing. I'd like to ask: is this expected or normal?
The training details are the same as the default config.yaml, with all phases trained for 100 epochs and bev for 160 epochs with suitable batch sizes. The evaluation on the online leaderboard is really terrible: I just finished the first five routes, and the result file shows that the collisions are serious. I'm wondering whether there is any step I missed that prevents reproducing the results, since losing only 1% of the data should not affect the trained model this much. Thanks again for your work!