Question about word-level detection metrics

google-research-datasets / hiertext

The HierText dataset contains ~12k images from the Open Images dataset v6 with a large number of text entities. We provide word-, line- and paragraph-level annotations.

Creative Commons Attribution Share Alike 4.0 International


Question about word-level detection metrics #7

Closed: HumanZhong closed this issue 1 year ago

HumanZhong commented 1 year ago

Hi, thank you for your great work. Is there any chance you could release the word-level detection metrics (e.g., F-score and PQ) of your Unified Detector on the HierText validation set?

Jyouhou commented 1 year ago

Hi,

This section in the code repo explains how the word-level output is obtained.

The evaluation script in this repo computes word-level metrics automatically. Results of the Unified Detector at all levels are also released here: https://rrc.cvc.uab.es/?ch=18&com=evaluation&task=1.

HumanZhong commented 1 year ago

Thanks for your reply. May I ask one more question?

In your paper, there seem to be two different models: one for grouping words into paragraphs and the other for grouping lines into paragraphs (Tab. 3). Does this mean the results you provided come from two different models (e.g., word-level PQ from the word->paragraph model, and line-level and paragraph-level PQ from the line->paragraph model)?

Thanks again for your awesome work. I look forward to your reply.

Jyouhou commented 1 year ago

No. As detailed in the previous link, we only released the line-based model. The word-level outputs are obtained with some "tricks" for splitting the masks.

We noticed that, even though the training target of the detection branch is the line-level mask, the model is still able to distinguish adjacent words to some extent, as long as the words are not too close together. Note that this splitting trick is not perfect: it only achieves ~48 PQ on the word task, whereas the word-based model can reach ~64 PQ.
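The thread does not say what the splitting trick actually is. To show why a split of a line mask only works "if words are not too close", here is a hypothetical sketch that uses a different, generic technique: 4-connected component labeling on a binary line mask. The function name `split_line_mask` and the list-of-lists mask format are assumptions made for illustration. This is not the code from the hiertext repo.

```python
from collections import deque


def split_line_mask(mask):
    """Split a binary line mask (list of lists of 0/1) into word instances
    via 4-connected component labeling.

    Hypothetical illustration only, NOT the trick used in the hiertext repo.
    Words whose foreground pixels touch end up in one component, which is
    why this kind of split breaks down when words are too close together.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    words = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                component = []
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    component.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                words.append(component)
    return words


# Two "words" separated by a one-pixel gap are recovered as two instances.
gap_mask = [
    [1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 1],
]
print(len(split_line_mask(gap_mask)))  # 2

# With no gap, the two words merge into a single instance.
touching_mask = [[1, 1, 1, 1, 1, 1]]
print(len(split_line_mask(touching_mask)))  # 1
```

A real model's mask would be a probability map that needs thresholding first. The point here is only that any gap-based split cannot separate words whose masks touch.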

HumanZhong commented 1 year ago

What about the F-score of the word-based model? It would be great if you could release it.

Thank you so much.

Jyouhou commented 1 year ago

The paragraph PQ for the word-based models is already in the paper, so I assume you are looking for the word-level scores:

Model | Precision | Recall | F1 | Tightness | PQ
--- | --- | --- | --- | --- | ---
word-based, 128 queries | 0.750084691 | 0.5849323951 | 0.6572932128 | 0.7874197285 | 0.5175656431
word-based, 256 queries | 0.7518331545 | 0.7153329811 | 0.733129042 | 0.781384374 | 0.5728555776
word-based, 384 queries | 0.793973056 | 0.8067676937 | 0.8003192414 | 0.7983057275 | 0.6388994342
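The columns of this table relate to each other through the standard Panoptic Quality decomposition: PQ = RQ × SQ. Here RQ = TP / (TP + ½FP + ½FN), which is the F1 of the instance matching, and SQ is the mean IoU of matched pairs. The short check below recomputes F1 from precision and recall, and PQ as F1 × Tightness. It assumes that "Tightness" plays the role of SQ. The helper names `f1_score` and `panoptic_quality` are hypothetical and are not functions from the hiertext evaluation script.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def panoptic_quality(f1, tightness):
    """PQ = RQ * SQ, with RQ equal to the matching F1 and SQ assumed to be
    the 'Tightness' column (mean IoU of matched pairs)."""
    return f1 * tightness


# (precision, recall, reported F1, tightness, reported PQ), copied from the table.
word_based_results = {
    "word-based, 128 queries": (0.750084691, 0.5849323951, 0.6572932128, 0.7874197285, 0.5175656431),
    "word-based, 256 queries": (0.7518331545, 0.7153329811, 0.733129042, 0.781384374, 0.5728555776),
    "word-based, 384 queries": (0.793973056, 0.8067676937, 0.8003192414, 0.7983057275, 0.6388994342),
}

for name, (p, r, f1, t, pq) in word_based_results.items():
    print(f"{name}: F1 {f1_score(p, r):.4f} (reported {f1:.4f}), "
          f"PQ {panoptic_quality(f1, t):.4f} (reported {pq:.4f})")
```

All three rows agree with the reported values to well within rounding. This is consistent with Tightness being the SQ term, although the thread itself does not state that definition.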

HumanZhong commented 1 year ago

Thank you :D