autonomousvision / transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
MIT License

Segmentation failed #200

Closed Oliverwang11 closed 8 months ago

Oliverwang11 commented 8 months ago

Hi, I am trying to evaluate the TransFuser-based agent using ./leaderboard/scripts/local_evaluation.sh /home/<username>/Desktop/transfuser/carla /home/<username>/Desktop/transfuser on Ubuntu 22.04.3.

But some errors pop up:

/home/<username>/Desktop/transfuser/leaderboard/leaderboard/leaderboard_evaluator_local.py:89: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(dist.version) < LooseVersion('0.9.10'):
./leaderboard/scripts/local_evaluation.sh: line 32: 20110 Segmentation fault (core dumped) python3 ${LEADERBOARD_ROOT}/leaderboard/leaderboard_evaluator_local.py --scenarios=${SCENARIOS} --routes=${ROUTES} --repetitions=${REPETITIONS} --track=${CHALLENGE_TRACK_CODENAME} --checkpoint=${CHECKPOINT_ENDPOINT} --agent=${TEAM_AGENT} --agent-config=${TEAM_CONFIG} --debug=${DEBUG_CHALLENGE} --resume=${RESUME}

Does anyone have an idea?

Thanks

Kait0 commented 8 months ago

Segmentation faults are hard to analyse. I would suggest you use a debugger or print statements to find the line of code that crashes, then we can help you better.
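As one low-effort way to find the crashing line, here is a minimal sketch that assumes the crash happens inside the Python process. Python's built-in faulthandler module dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. The helper name enable_crash_traceback and the log path are hypothetical and not part of this repository.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump the Python traceback of all threads to log_path on a fatal signal.

    Returns the open file object. It must stay open for as long as
    faulthandler is enabled, because the handler writes to its descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical usage near the top of the evaluator script:
# crash_log = enable_crash_traceback("/tmp/evaluator_crash.log")
```

The same handler can be switched on without editing any file by starting the interpreter with python3 -X faulthandler or by setting the environment variable PYTHONFAULTHANDLER=1.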

Oliverwang11 commented 8 months ago

Thanks I will try!

Oliverwang11 commented 8 months ago

It seems like the crash is in self.module_agent = importlib.import_module(module_name), when importing module_name, which is submission_agent. BTW, I run the evaluation on my own computer with a GTX3060 GPU (6 GB) and 16 GB RAM.

Kait0 commented 8 months ago

Hm, that is strange; that line is just trying to import the agent .py file. Are you using the conda environment from this repository? Is the WORK_DIR variable in the script correct (e.g. does module_name point to the correct file)? Maybe it is some second-order import problem. If the submission_agent file is executed directly, can you check how far it gets?
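A second-order import problem can be narrowed down by importing each suspect module in its own interpreter and looking at the exit code. The following is only a sketch: the helper probe_import is hypothetical, and the module names in the usage comment are examples taken from this thread.

```python
import subprocess
import sys


def probe_import(module_name, cwd=None):
    """Import module_name in a fresh interpreter and return its exit code.

    0 means the import succeeded, 1 usually means a Python exception,
    and on POSIX a negative value is the number of the signal that
    killed the process (for example -11 for SIGSEGV on Linux).
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        cwd=cwd,              # run from the directory that contains the module
        capture_output=True,  # keep the child's output out of the console
        text=True,
    )
    return result.returncode


# Hypothetical usage, run from the directory that holds the agent code:
# for name in ["torch", "model", "submission_agent"]:
#     print(name, probe_import(name))
```

Because every import runs in a separate process, a crash in one module does not take down the script that is doing the probing.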

Oliverwang11 commented 8 months ago

Hi, thanks for your reply. The work dir seems fine. I dug into the submission_agent.py file, and it seems the code crashes at from model import LidarCenterNet, when trying to load LidarCenterNet.

Kait0 commented 8 months ago

Failing somewhere specific within LidarCenterNet?

You can try commenting out this line. It sometimes causes problems since it depends on an external CUDA library. It is only needed for an optional ablation, so it is fine to turn it off. https://github.com/autonomousvision/transfuser/blob/22b3ccdb7e806b42c9be2aa476b71546e8ec3620/team_code_transfuser/model.py#L11
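Instead of commenting the import out by hand, the optional dependency could be gated behind a switch. This is only a sketch of that idea: the helper load_optional and the environment variable USE_POINT_PILLAR are hypothetical and do not exist in the repository.

```python
import importlib
import os


def load_optional(module_name, attr_name, env_flag):
    """Return module_name.attr_name only if env_flag is set to "1".

    Skipping the import entirely matters here: a segmentation fault
    inside a compiled extension kills the process and cannot be caught
    with try/except, so the import must not run at all.
    """
    if os.environ.get(env_flag, "0") != "1":
        return None
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Hypothetical usage in model.py:
# PointPillarNet = load_optional("point_pillar", "PointPillarNet", "USE_POINT_PILLAR")
```

Code that uses the optional class would then need to check for None before constructing it.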

Oliverwang11 commented 8 months ago

Aha, after I commented out the line from point_pillar import PointPillarNet, the crash disappeared. But another problem popped up: it seems the town map is loaded, but the ego car does not show up.

./leaderboard/scripts/local_evaluation.sh /home/oliverwang/Desktop/transfuser/carla /home/oliverwang/Desktop/transfuser
/home/oliverwang/Desktop/transfuser/leaderboard/leaderboard/leaderboard_evaluator_local.py:89: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(dist.version) < LooseVersion('0.9.10'):
/home/oliverwang/Desktop/transfuser
-----submission_agen line 209
-----submission_agen line 16
-----submission_agen line 18
-----submission_agen line 20
-----submission_agen line 21
-----submission_agen line 209
-----submission_agen line 209
Registering the global statistics

Do you have any clue? Thanks!

Kait0 commented 8 months ago

"-----submission_agen line": I suppose these are your debug prints. I don't see an error here; these are just warnings and prints. It might be that you need to delete the results.json file, because the code thinks it has already finished all routes.
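A cautious alternative to deleting the results file is to rename it, so the old results survive. The sketch below assumes the file in question is the one the script passes as CHECKPOINT_ENDPOINT; the helper name set_aside_checkpoint and the path in the usage comment are hypothetical.

```python
import os
import time


def set_aside_checkpoint(checkpoint_path):
    """Rename an existing checkpoint file so the next run starts fresh.

    Returns the backup path, or None if there was nothing to move.
    """
    if not os.path.isfile(checkpoint_path):
        return None
    # A timestamp in the name keeps earlier backups from being overwritten.
    backup_path = f"{checkpoint_path}.{int(time.time())}.bak"
    os.replace(checkpoint_path, backup_path)
    return backup_path


# Hypothetical usage before starting local_evaluation.sh:
# set_aside_checkpoint("results/results.json")
```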

SY-LG commented 8 months ago

I came across this segmentation fault a few days ago. Check whether your CUDA and PyTorch versions match.

"... the code crashed at from model import LidarCenterNet when trying to load the LidarCenterNet"

To check whether you are facing the same situation as I did, you can trace even further. Eventually it would turn out that the segmentation fault takes place somewhere unrelated to TransFuser but related to PyTorch.
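To make the version check concrete, here is a small sketch that reports which CUDA version the installed PyTorch build was compiled against and whether a GPU is usable. The function name describe_torch_runtime is hypothetical; the torch attributes it reads are standard ones.

```python
def describe_torch_runtime():
    """Collect the versions that must agree for CUDA extensions to load."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "cuda_built_with": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_runtime().items():
        print(f"{key}: {value}")
```

The cuda_built_with value can then be compared with the CUDA toolkit that the external extension was compiled against.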

SY-LG commented 8 months ago

BTW, the version compatibility between PyTorch and the mmcv packages is somewhat ambiguous; sometimes you even need to test combinations yourself. You can try this setup, it works fine for me.
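When comparing a working setup with a broken one, it helps to list the installed versions without importing the packages, since the import itself may be what crashes. This is a sketch that needs Python 3.8 or newer for importlib.metadata; the distribution names in the usage comment are assumptions and may differ in your environment.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in distributions:
        try:
            # Reads package metadata only; the package is not imported.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical usage; adjust the names to what your environment installs:
# print(installed_versions(["torch", "torchvision", "mmcv-full", "mmdet"]))
```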