Closed Ruthvik9 closed 8 months ago
Hi @Ruthvik9, Thank you for your interest in our work. Regarding your questions:
Per-frame labels may indicate that the pedestrian started to cross, i.e., stepped onto the road. But the crossing label for the entire track is set to 1 only if the pedestrian actually crossed the road in front of the vehicle (traversed its width or crossed in its path if the vehicle is turning). In other words, per-track labels for crossing reflect the final outcome of the pedestrian's actions relative to the ego-vehicle. Even if the pedestrian eventually crossed after the ego-vehicle went past them, we assign the "non-crossing" label to the track because they did not cross in front of the ego-vehicle.
Yes, we used the per-track labels as the ground truth for the benchmark, and all samples were collected before the crossing began. Otherwise, if a person is mid-crossing, it is more of a detection problem than a prediction problem. In that paper, we did not analyze the post-crossing activity.
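To make the distinction between the two label types concrete, here is a minimal sketch of how a benchmark sample could be assembled from one track. It is an illustration only, not the dataset's actual API: the `Track` class, its field names, and the `benchmark_sample` helper are hypothetical. It assumes per-frame labels are 1 once the pedestrian has stepped onto the road, and that tracks labelled -1 (irrelevant) are skipped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Track:
    # Hypothetical container; field names are assumptions, not the dataset's API.
    frame_cross: List[int]  # per-frame label: assumed 1 once the pedestrian is on the road
    crossing: int           # per-track label: 1 (crossing), 0 (not crossing), -1 (irrelevant)


def benchmark_sample(track: Track) -> Optional[Tuple[List[int], int]]:
    """Return (indices of pre-crossing frames, ground-truth label), or None if skipped."""
    if track.crossing == -1:
        # Assumption: irrelevant pedestrians are excluded from C/NC samples.
        return None
    # Index of the first frame in which the pedestrian is on the road,
    # or the track length if that never happens.
    first_on_road = next(
        (i for i, c in enumerate(track.frame_cross) if c == 1),
        len(track.frame_cross),
    )
    # Keep only frames observed before the crossing starts; the ground truth
    # is the per-track label, not the per-frame one.
    return list(range(first_on_road)), track.crossing


# A pedestrian who stepped onto the road only after the ego-vehicle had passed:
# per-frame labels contain 1s, but the track label is still 0.
late_crosser = Track(frame_cross=[0, 0, 1, 1], crossing=0)
frames, label = benchmark_sample(late_crosser)
```

In this example the per-frame labels and the per-track label disagree, which is exactly the case described above: the sample keeps the two pre-crossing frames and is labelled non-crossing.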
Thank you so much for the prompt response!
After going through your "Benchmark for evaluating Pedestrian Action Prediction" work, I had another question regarding these annotations. In that paper, you state: "The objective is to predict whether the pedestrian will start crossing the street at some time t given the observation of length m. We define the event at the time the pedestrian starts to cross or the last frame the pedestrian is observable in case no crossing takes place."
i) Is the objective "whether the pedestrian will start crossing the street", as stated in the paper, or something like "whether the pedestrian will cross the street in front of the ego-vehicle, predicted t frames before the event" (with the event being defined as the time the pedestrian starts to cross, not necessarily in front of the ego-vehicle)? Based on your response, I believe it should be the latter.
ii) In case no crossing takes place, is the event defined as the last observable frame, or as the last observed frame - 3, as you've defined in this work?
Many thanks!
Yes, when predicting the crossing action, we care whether the pedestrian will eventually end up crossing in front of the car. When the pedestrian is not crossing, we consider the event as the last observed frame - 3 (in the last few frames of the track, only a tiny portion of the pedestrian may still be visible, e.g. a foot).
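The event definition discussed here can be sketched as follows. This is a hedged illustration rather than the benchmark's actual code: the function names, the 0-based indexing, and the way the observation window is placed (`obs_len` frames ending `time_to_event` frames before the event) are assumptions. It also assumes that a per-frame label of 1 marks the pedestrian being on the road.

```python
from typing import List, Optional, Tuple


def event_index(frame_cross: List[int], track_crossing: int, tail_offset: int = 3) -> int:
    """Return the 0-based index of the event frame within a track (hypothetical helper)."""
    if not frame_cross:
        raise ValueError("track has no frames")
    if track_crossing == 1:
        # Crossing track: the event is the frame where the pedestrian starts to cross.
        for i, c in enumerate(frame_cross):
            if c == 1:
                return i
        raise ValueError("track is labelled crossing but no frame marks the start")
    # Non-crossing track: last observed frame minus 3, because only a small part
    # of the pedestrian may be visible in the final frames. Clamp for short tracks.
    return max(0, len(frame_cross) - 1 - tail_offset)


def observation_window(event: int, obs_len: int, time_to_event: int) -> Optional[Tuple[int, int]]:
    """Return the half-open range [start, end) of observed frames, or None if the track is too short."""
    end = event - time_to_event
    start = end - obs_len
    if start < 0:
        return None
    return start, end


# Crossing track: the pedestrian starts to cross at index 4.
crossing_event = event_index([0, 0, 0, 0, 1, 1], track_crossing=1)

# Non-crossing track with 10 frames: last index is 9, so the event is at 9 - 3 = 6.
non_crossing_event = event_index([0] * 10, track_crossing=0)
window = observation_window(non_crossing_event, obs_len=3, time_to_event=2)
```

Note that for a track labelled non-crossing the sketch ignores the per-frame labels entirely, so a pedestrian who steps onto the road only after the ego-vehicle has passed still gets the "last observed frame - 3" event.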
Got it, thank you very much!!
Hello, First of all, thanks for this amazing work!
I had a couple of questions regarding your work and I was hoping you could shed some light on them.
1) The annotation of the "crossing" label.
You've defined the "crossing" label as: "crossing: 1 (crossing), 0 (not crossing), -1 (irrelevant). This indicates whether the pedestrian was observed crossing the road in front of the ego-vehicle"
Does "crossing the road in front of the ego-vehicle" mean that the pedestrian was literally seen crossing in front of the ego-vehicle (traversing the width of the ego-vehicle), or does it simply mean that the pedestrian was crossing the road within the field of view in front of the ego-vehicle? (In the latter case, the pedestrian might have started crossing the road, but the ego-vehicle went past them before they could cross in front of it.)
2) The C/NC problem
i) In one of your other papers, "Benchmark for evaluating Pedestrian Action Prediction", you solve the C/NC problem. In this case, the per-track "crossing" label is the ground truth label, and not the per-frame "cross" label, right?
ii) In the same paper, are all the crossing sequences in every observation period, for both training and testing, pre-crossing? Is there no analysis of what happens once the crossing activity begins (apart from trajectory prediction, as is done in the PIE paper)?
Thanks in advance for your time!!