evanmiltenburg opened 4 years ago
Maybe it would be nice if people could also send in outputs they systematically manipulated themselves (introducing specific kinds of errors). Then we can see if the evaluation metrics proposed by others can actually pick up on those manipulations.
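A minimal sketch of how such a check could work, assuming plain-text outputs paired with references. Everything in it is hypothetical and only meant for illustration. `unigram_f1` is a toy stand-in for whatever metric someone proposes. The two manipulations (dropping a word, swapping two neighbouring words) are just examples of "specific kinds of errors". `sensitivity_report` scores each output before and after manipulation and reports how often the score actually goes down.

```python
import random
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Hypothetical toy metric: F1 over bag-of-words token overlap."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def drop_random_word(text: str, rng: random.Random) -> str:
    """Error type 'omission' (hypothetical): delete one random token."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    del tokens[rng.randrange(len(tokens))]
    return " ".join(tokens)


def swap_adjacent_words(text: str, rng: random.Random) -> str:
    """Error type 'word order' (hypothetical): swap one pair of neighbouring tokens."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i = rng.randrange(len(tokens) - 1)
    tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)


MANIPULATIONS = {"omission": drop_random_word, "word_order": swap_adjacent_words}


def sensitivity_report(metric, outputs, references, seed=0):
    """For each manipulation, the fraction of outputs whose metric score dropped."""
    if not outputs:
        raise ValueError("need at least one output")
    rng = random.Random(seed)  # fixed seed so manipulations are reproducible
    report = {}
    for name, manipulate in MANIPULATIONS.items():
        detected = 0
        for out, ref in zip(outputs, references):
            if metric(manipulate(out, rng), ref) < metric(out, ref):
                detected += 1
        report[name] = detected / len(outputs)
    return report


outputs = ["the cat sat on the mat", "a dog barked at the mailman"]
print(sensitivity_report(unigram_f1, outputs, outputs))
# {'omission': 1.0, 'word_order': 0.0}
```

With outputs identical to their references, the toy metric catches every omission but none of the word-order swaps, because bag-of-words overlap ignores order. That is the kind of blind spot systematically manipulated submissions could expose in a real metric.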