SEM errors while running ScenarioFuzz

Kim-mins commented 2 months ago

Hi, @AtongWang!

I read your paper (thank you for the nice work!) and I'm trying to run ScenarioFuzz on my local machine. However, I'm having trouble running the code myself. Could you please help me?

Details

Here are some details: when I ran the code with the seeds provided in the repository on Town01 (of CARLA), I encountered the error below, which seems to come from the torch_geometric library. Error message (from the seed ==========USING SCENARIO SEED:Town01 - t_intersection - 0==========):

File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 272, in _lift
    return src.index_select(self.node_dim, index)
IndexError: index out of range in self

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ssd1tb/ADSTesting/thirdparty/ScenarioFuzz/src/fuzzer_eval.py", line 772, in <module>
    main()
  File "/ssd1tb/ADSTesting/thirdparty/ScenarioFuzz/src/fuzzer_eval.py", line 520, in main
    out = eval_model(batch.x, batch.edge_attr, batch.weather_attr, batch.edge_index, batch.batch, batch.edge_batch)
  File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd1tb/ADSTesting/thirdparty/ScenarioFuzz/src/scenario_eval_model/eval_model.py", line 146, in forward
    edge_h = F.relu(self.edge_gat1(edge_attr, edge_index))
  File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch_geometric/nn/conv/gat_conv.py", line 252, in forward
    alpha = self.edge_updater(edge_index, alpha=alpha, edge_attr=edge_attr)
  File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 528, in edge_updater
    kwargs)
  File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 336, in _collect
    data = self._lift(data, edge_index, dim)
  File "/home/mins/anaconda3/envs/python37/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 276, in _lift
    f"Encountered an index error. Please ensure that all "
IndexError: Encountered an index error. Please ensure that all indices in 'edge_index' point to valid indices in the interval [0, 699] (got interval [0, 1093])
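The final IndexError says that `edge_index` refers to positions up to 1093 while the tensor being indexed only has 700 rows. Per line 146 of eval_model.py, that tensor is `edge_attr`, which is passed to `edge_gat1` as its input features. A minimal pre-flight check can flag such a seed before the forward pass. This is a sketch, not ScenarioFuzz code: the function name `find_out_of_range_edges` is hypothetical, and it assumes `edge_index` is a 2 x E integer tensor or nested list.

```python
def find_out_of_range_edges(edge_index, num_rows):
    """Return (column, src, dst) for every edge whose endpoint is outside [0, num_rows - 1].

    edge_index: 2 x E nested sequence, or anything with .tolist() (e.g. a torch.Tensor).
    num_rows:   size of dimension 0 of the tensor the conv layer indexes into.
    """
    if hasattr(edge_index, "tolist"):
        edge_index = edge_index.tolist()
    src, dst = edge_index
    bad = []
    for col, (s, d) in enumerate(zip(src, dst)):
        # Both endpoints must be valid row positions in the indexed tensor.
        if not (0 <= s < num_rows and 0 <= d < num_rows):
            bad.append((col, s, d))
    return bad


# Toy example: 3 rows, but the last edge points at index 5.
print(find_out_of_range_edges([[0, 1, 2], [1, 2, 5]], num_rows=3))  # [(2, 2, 5)]
```

Assuming the batch attributes shown in the traceback, calling it with `batch.edge_index` and `batch.edge_attr.size(0)` just before `eval_model(...)` would report the mismatch per seed instead of as a deep traceback.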

Also, SEM sometimes returns nan, which causes the current cycle to be skipped, so I cannot run the fuzzer either way. (The nan comes from the seed ==========USING SCENARIO SEED:Town01 - t_intersection - 2==========)

Environments

I've been running the code with Python 3.7 and CUDA 11.6, and installed every library (e.g., torch, torch_geometric, the CARLA Python API, ...) following requirements.txt, but I could not resolve the issue.

Arguments

Here are my arguments for running ScenarioFuzz:

-o: somewhere I want
--scenario-lib-dir: scenario_lib # which means I'm using the seeds from the repositories.
-c: 3
-m: 3
--eval-model: improve-v3
--town: Town01
--device: cpu # SEM seems to be forced to run on CPU regardless of this option.
--no-use-seed
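For reproducibility, the argument list above can be collapsed into a single invocation. The sketch below assumes the script is launched as `python src/fuzzer_eval.py` from the repository root (the path is taken from the traceback) and uses a hypothetical output directory, `out/town01_run`, in place of "somewhere I want". It only builds and prints the command; it does not run anything.

```python
import shlex

# Arguments as listed in the issue; OUT_DIR is a hypothetical placeholder.
OUT_DIR = "out/town01_run"

argv = [
    "python", "src/fuzzer_eval.py",
    "-o", OUT_DIR,
    "--scenario-lib-dir", "scenario_lib",  # seeds shipped with the repository
    "-c", "3",
    "-m", "3",
    "--eval-model", "improve-v3",
    "--town", "Town01",
    "--device", "cpu",  # reporter observed SEM runs on CPU regardless of this flag
    "--no-use-seed",
]

# shlex.join needs Python 3.8+, so quote each argument manually to stay 3.7-compatible.
command = " ".join(shlex.quote(arg) for arg in argv)
print(command)
```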

I've also tried every option for --eval-model and --device, but the error still remains.

Thank you in advance!

AtongWang commented 1 month ago

Hi, @Kim-mins!

Thank you for pointing out this issue. During the execution, some initial seeds may indeed trigger the error or result in nan values, which could be related to a calculation issue in the torch_geometric library. We are actively working on identifying and fixing the problem.

In the meantime, I suggest skipping the seeds that trigger these errors. Please feel free to reach out if you make any further discoveries or need additional assistance.
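Skipping the failing seeds can be automated rather than done by hand. The sketch below assumes a hypothetical `evaluate_seed(seed)` callable that runs SEM on one seed and returns a float score. It treats both an `IndexError` (the `edge_index` failure above) and a nan score as reasons to skip, and records which seeds were usable. It is not code from fuzzer_eval.py.

```python
import math


def partition_seeds(seeds, evaluate_seed):
    """Split seeds into (usable, skipped) dicts based on whether SEM evaluation succeeds.

    evaluate_seed is a hypothetical callable returning a float score for one seed.
    A seed is skipped if evaluation raises IndexError or returns nan.
    """
    usable, skipped = {}, {}
    for seed in seeds:
        try:
            score = evaluate_seed(seed)
        except IndexError as exc:  # e.g. edge_index points past the last row
            skipped[seed] = f"IndexError: {exc}"
            continue
        if math.isnan(score):
            skipped[seed] = "nan score"
        else:
            usable[seed] = score
    return usable, skipped


# Toy stand-in for SEM: seed 0 raises, seed 2 returns nan, seed 1 works.
def fake_eval(seed):
    if seed == 0:
        raise IndexError("index out of range in self")
    return float("nan") if seed == 2 else 0.5


usable, skipped = partition_seeds([0, 1, 2], fake_eval)
print(usable)           # {1: 0.5}
print(sorted(skipped))  # [0, 2]
```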

Thanks again!

Kim-mins commented 1 month ago

Thank you for the response @AtongWang!

Good to hear that. I hope those errors will be resolved soon!

I also tried your suggestion, and I have the following two questions:

Q1. When using SEM improve-v3 on Town01, it seems that every one of the 5 seeds either raises the dimension error or outputs nan as above. So I tried improve-v2, and the only working seed on Town01 is "2". Is it OK to run ScenarioFuzz with this single seed? (Every seed is from the repository's scenario_lib.)

Q2. By the way, does the current implementation not support online learning? From your paper, I understood that the algorithm uses online learning (for SEM), but I could not find any code for online learning in fuzzer_eval.py. Is my understanding correct?

Thank you!

AtongWang commented 1 month ago

Hi @Kim-mins!

Thank you for your feedback!

For Q1, you're correct. This issue does exist, and it raises another point: during my testing, errors were more biased towards Town02-05, so it's possible that there are indeed fewer usable seeds for Town01. In my SEM training data, Town01 accounted for only a small share, which could cause certain computational errors when my SEM is invoked. This is one possibility. I recommend focusing on other maps for now. Alternatively, you could generate your own data for Town01 using fuzzer.py and train your own SEM.
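Whether Town01 is underrepresented in a given training set is easy to check before training a new SEM. The sketch below assumes each training sample can be reduced to a dict with a `town` key; that record layout is hypothetical, since the thread does not describe the SEM dataset format.

```python
from collections import Counter


def town_shares(records):
    """Return {town: fraction of samples}, most frequent town first."""
    counts = Counter(r["town"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {town: n / total for town, n in counts.most_common()}


# Toy data only, not the real SEM training set.
toy = [{"town": "Town02"}] * 6 + [{"town": "Town03"}] * 3 + [{"town": "Town01"}] * 1
print(town_shares(toy))  # {'Town02': 0.6, 'Town03': 0.3, 'Town01': 0.1}
```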

As for Q2, we haven't written explicit code for online training; it's incorporated into the method flow. Every time the accumulated test data reaches a multiple of 1K, we package that data and run an update training of SEM. The SEM is updated accordingly, which is why you'll see versions like improve-v0, v1, v2, etc.
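The update schedule described here (run an update training of SEM each time the accumulated test data reaches another multiple of 1K, then bump the model version) can be sketched as a small buffer. The names `SemUpdateSchedule` and `retrain` are hypothetical. In the authors' workflow the update is done manually, so the synchronous callback stands in for that pause. Two further assumptions: the update trains on all data collected so far rather than only the latest 1K, and improve-v0 is the initial model.

```python
class SemUpdateSchedule:
    """Collect test results and trigger a SEM update every `interval` samples.

    retrain is a hypothetical callback taking (all_data, new_model_name); fuzzing
    is effectively paused while it runs because the call is synchronous.
    """

    def __init__(self, retrain, interval=1000, prefix="improve-v"):
        self.retrain = retrain
        self.interval = interval
        self.prefix = prefix
        self.data = []
        self.version = 0  # assumption: improve-v0 is the initial model

    @property
    def model_name(self):
        return f"{self.prefix}{self.version}"

    def add(self, sample):
        self.data.append(sample)
        if len(self.data) % self.interval == 0:
            self.version += 1
            # Assumption: train on everything collected so far, then continue with the new model.
            self.retrain(list(self.data), self.model_name)


updates = []
schedule = SemUpdateSchedule(lambda data, name: updates.append((len(data), name)))
for i in range(2500):
    schedule.add(i)
print(updates)              # [(1000, 'improve-v1'), (2000, 'improve-v2')]
print(schedule.model_name)  # improve-v2
```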

Thank you for your patience!

Kim-mins commented 1 month ago

Thank you for the kind and detailed response @AtongWang!

So you mean that the testing process pauses when 1K data points have been collected, you manually train SEM with the collected data, and the updated SEM is then used for testing and for collecting the next 1K data points. Is my understanding correct?

Also, I wonder whether the error could be resolved if I train SEM myself. Perhaps you are also running into the error, so I can't be sure for now.

Thanks a lot!

AtongWang commented 1 month ago

Hi @Kim-mins,

Yes, your understanding is correct. The testing process pauses when 1K data points have been accumulated; SEM is then trained manually with the collected data, and the updated SEM is used for testing the next batch.

As for the error, it's worth trying to train SEM yourself; it might help resolve the issue. We're indeed working on addressing this problem, and I appreciate your understanding and your efforts to help improve it. Let's work together and keep each other updated on any progress!

Thanks again for your support!
