Metric for DocVQA - Githubissues

boostcampaitech4lv23nlp1 / final-project-level3-nlp-03

Multi-Modal Model for DocVQA (Document Visual Question Answering)

Metric for DocVQA #3

Closed hundredeuk2 closed 1 year ago

hundredeuk2 commented 1 year ago

Goal

Research the evaluation metric used for the DocVQA competition.

Details

Use the existing code

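    import editdistance  # third-party package: pip install editdistance


    def ANLS(pred, answers):
        """Return [score] for one question, or [] when no ground truth is available."""
        if answers[0] is None:
            return []
        scores = []
        for ans in answers:
            # Case-insensitive edit distance between the GT answer and the prediction.
            ed = editdistance.eval(ans.lower(), pred.lower())
            # Normalize by the longer string; the extra 1 avoids dividing by zero
            # when both strings are empty.
            NL = ed / max(len(ans), len(pred), 1)
            # Keep the similarity only while NL is below the 0.5 threshold.
            scores.append(1 - NL if NL < 0.5 else 0)
        return [max(scores)]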

Reference

https://github.com/herobd/layoutlmv2/blob/main/eval_docvqa.py

Ssunbell commented 1 year ago

+ Explanation and interpretation of the metric

Average Normalized Levenshtein Similarity (ANLS).

The ANLS smoothly captures the OCR mistakes applying a slight penalization in case of correct intended responses, but badly recognized. It also makes use of a threshold of value 0.5 that dictates whether the output of the metric will be the ANLS if its value is equal or bigger than 0.5 or 0 otherwise. The key point of this threshold is to determine if the answer has been correctly selected but not properly recognized, or on the contrary, the output is a wrong text selected from the options and given as an answer.

More formally, we define ANLS as follows:

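$$
\mathrm{ANLS} = \frac{1}{N} \sum_{i=0}^{N} \left( \max_{j} \, s(a_{ij}, o_{q_i}) \right)
$$

$$
s(a_{ij}, o_{q_i}) =
\begin{cases}
1 - NL(a_{ij}, o_{q_i}) & \text{if } NL(a_{ij}, o_{q_i}) < \tau \\
0 & \text{if } NL(a_{ij}, o_{q_i}) \geq \tau
\end{cases}
$$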

where $N$ is the total number of questions in the dataset, $M$ is the total number of GT answers per question, $a_{ij}$ are the ground truth answers where $i = \{0, ..., N\}$ and $j = \{0, ..., M\}$, and $o_{q_i}$ is the network's answer for the $i$-th question $q_i$. $NL(a_{ij}, o_{q_i})$ is the normalized Levenshtein distance between the strings $a_{ij}$ and $o_{q_i}$ (notice that the normalized Levenshtein distance is a value between 0 and 1). We define a threshold $\tau = 0.5$ that penalizes metrics larger than this value, thus the final score will be 0 if the NL is larger than $\tau$.

The intuition behind the threshold is that if an output has an edit distance of more than 0.5 to an answer, meaning getting half of the answer wrong, we reason that the output is the wrong text selected from the options as an answer. Otherwise, the metric has a smooth response that can gracefully capture errors in text recognition.
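The definition above can be sketched at dataset level as follows. This is only an illustrative sketch, not the official evaluation script. The names `levenshtein`, `question_score`, and `anls`, and the toy data, are assumptions made for this example. A plain-Python edit distance stands in for the `editdistance` package so that the block has no dependencies. Treating two empty strings as a perfect match is also an assumption.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def question_score(prediction: str, answers: Sequence[str], tau: float = 0.5) -> float:
    """Best thresholded similarity of one prediction against its GT answers."""
    best = 0.0
    for answer in answers:
        a, p = answer.lower(), prediction.lower()
        longest = max(len(a), len(p))
        # Assumption: two empty strings count as identical (NL = 0).
        nl = levenshtein(a, p) / longest if longest else 0.0
        best = max(best, 1.0 - nl if nl < tau else 0.0)
    return best


def anls(predictions: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """Mean of the per-question scores over the whole dataset."""
    scores = [question_score(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for illustration only.
predictions = ["2028", "seoul", "1999"]
references = [["2023"], ["Seoul", "Seoul, Korea"], ["2023"]]
print(anls(predictions, references))
```

With the toy data, the first prediction is one character off and keeps a partial score, the second matches one of its GT answers exactly after lowercasing, and the third shares no characters with its GT answer and scores 0.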

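Interpreting the formula

1. The normalized Levenshtein distance measures how similar two strings are.

2. The Levenshtein distance counts how many character operations (insertions, deletions, substitutions) are needed to make two strings identical. A higher value means more operations are needed, so the strings are less similar. Conversely, a lower value means fewer operations are needed, so the strings are more similar.

3. If the normalized Levenshtein distance is below the threshold τ (the predicted answer and the GT answer are similar): 1 - NL
If it reaches or exceeds the threshold τ (so many character operations are needed that the predicted answer and the GT answer are not similar): 0
The scores computed this way are averaged over all questions, taking the best score among the GT answers for each question. In other words, it matters both to get every character of the answer right and to get at least half of the answer right.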
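A small worked example may make the threshold concrete. The strings below are invented for illustration, with the GT answer "2023" and three hypothetical predictions. The boundary case assumes the strict `NL < 0.5` comparison used in the code from this issue.

$$
\begin{aligned}
\text{"2028" vs. "2023"}:&\quad NL = \tfrac{1}{4} = 0.25 < \tau \;\Rightarrow\; s = 1 - 0.25 = 0.75 \\
\text{"2099" vs. "2023"}:&\quad NL = \tfrac{2}{4} = 0.5 \geq \tau \;\Rightarrow\; s = 0 \\
\text{"1999" vs. "2023"}:&\quad NL = \tfrac{4}{4} = 1 \geq \tau \;\Rightarrow\; s = 0
\end{aligned}
$$

The first prediction looks like an OCR slip and keeps most of its score, while the other two are treated as wrong answers.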