Doubt regarding picking of the best ADE/FDE

I was looking at the code that picks the best error per sequence (scripts/evaluate_model.py):

def evaluate_helper(error, seq_start_end):
    sum_ = 0
    error = torch.stack(error, dim=1)

    for (start, end) in seq_start_end:
        start = start.item()
        end = end.item()
        _error = error[start:end]
        _error = torch.sum(_error, dim=0)
        _error = torch.min(_error)
        sum_ += _error
    return sum_

and I was wondering: why do we pick, for each sequence, the single sample that minimizes the summed error over all pedestrians in that sequence, instead of picking the sample with the smallest error for each trajectory individually? Something along the lines of:

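def evaluate_helper(error, seq_start_end):
    # seq_start_end is unused here; kept so existing call sites do not change
    error = torch.stack(error, dim=1)  # shape: (num_peds, num_samples)
    # best sample for each trajectory independently
    _error_min, _ = torch.min(error, dim=1)
    return torch.sum(_error_min)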

Example results with the per-trajectory version, using the pretrained SGAN-20V-20 models provided by the authors (compare with this):

Model	ADE12	FDE12
ETH	0.62	1.10
Hotel	0.37	0.79
Univ	0.30	0.55
Zara1	0.21	0.39
Zara2	0.19	0.36

While the errors become smaller, I suppose this would make the comparison with deterministic methods even more unfair, as other issues have pointed out (#8).
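To make the "errors become smaller" point concrete, here is a toy sketch of the two selection rules. It uses plain Python lists in place of tensors, and the function names and numbers are made up for illustration (they are not from the repo). Since the per-trajectory rule may choose a different sample for every pedestrian, its total can never exceed the per-sequence total.

```python
def best_per_sequence(error, seq_start_end):
    """error[k][i] is the error of sample k for pedestrian i.

    For each sequence, pick the single sample whose error summed over
    all pedestrians in that sequence is smallest.
    """
    total = 0.0
    for start, end in seq_start_end:
        total += min(sum(sample[start:end]) for sample in error)
    return total


def best_per_trajectory(error):
    """Pick the best sample independently for every pedestrian."""
    num_peds = len(error[0])
    return sum(min(sample[i] for sample in error) for i in range(num_peds))


# Hypothetical data: 2 samples, 3 pedestrians.
# Pedestrians 0-1 form one sequence, pedestrian 2 forms another.
error = [
    [1.0, 4.0, 2.0],  # sample 0
    [3.0, 1.0, 5.0],  # sample 1
]
seq_start_end = [(0, 2), (2, 3)]

per_sequence = best_per_sequence(error, seq_start_end)  # 4.0 + 2.0 = 6.0
per_trajectory = best_per_trajectory(error)             # 1.0 + 1.0 + 2.0 = 4.0
print(per_sequence, per_trajectory)
```

In the first sequence, no single sample is best for both pedestrians, so the per-sequence rule pays 4.0 while the per-trajectory rule pays 2.0. The two rules only agree when one sample happens to be the best for every pedestrian in a sequence.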

Lemme know if you've thought about this as well, or if you spot any issues.

agrimgupta92 / sgan, issue #95