best_span highlights the first string match, not the actual span

allenai / allennlp-demo

Code for the AllenNLP demo.

https://demo.allennlp.org

Apache License 2.0


best_span highlights the first string match, not the actual span #684

Closed jonborchardt closed 3 years ago

jonborchardt commented 3 years ago

On reading comp, the best_span is always wrong:

"best_span": [
  12,
  13
],
"best_span_str": "Robbie Gould",

The UI just ignores best_span and highlights the first occurrence of best_span_str. It would be nice to remove that hack if the API worked.
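To illustrate why highlighting the first occurrence of best_span_str can mark the wrong place, here is a minimal sketch. The passage is made up for illustration and is not taken from the demo; the real UI is not written in Python, so this only mirrors the idea of a first-string-match lookup.

```python
# Hypothetical passage in which the answer string appears twice.
passage = "Gould missed early. Later Gould made the kick."
best_span_str = "Gould"

# A first-match lookup always returns the earliest occurrence,
# even if the model actually selected the later one.
first_match = passage.find(best_span_str)

# Every position where the answer string occurs in the passage.
all_matches = [
    i for i in range(len(passage)) if passage.startswith(best_span_str, i)
]

print(first_match)
print(all_matches)
```

When the string occurs more than once, the string alone cannot say which occurrence the model meant; only an offset can.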

dirkgr commented 3 years ago

Out of the three issues you gave me, this one is least important. Is this a backend issue? It reads like the frontend needs to respect "best_span" instead of "best_span_str", and that's it?

schmmd commented 3 years ago

@dirkgr I think the problem is that the best_span returned is simply wrong... so this is a bug in the predictions API.

jonborchardt commented 3 years ago

@schmmd is correct. The FE is working around this, but we would like to have a correct span returned.

This one IS less important.

dirkgr commented 3 years ago

I don't see anything wrong with the spans returned. They are offsets into "passage_tokens", and they are inclusive, i.e., the second number is the index of the last token in the span.
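A minimal sketch of that reading of the response, assuming best_span is a pair of inclusive indices into passage_tokens as described above. The token list and the helper name span_tokens are hypothetical, chosen only for illustration.

```python
def span_tokens(passage_tokens, best_span):
    """Return the tokens covered by an inclusive [start, end] token span."""
    start, end = best_span
    # The end index is inclusive, so add 1 for Python's exclusive slice end.
    return passage_tokens[start:end + 1]


# Hypothetical tokenized passage, not taken from the demo.
tokens = ["The", "kick", "by", "Robbie", "Gould", "was", "good", "."]
best_span = [3, 4]

answer = " ".join(span_tokens(tokens, best_span))
print(answer)
```

Slicing with tokens[start:end] instead would drop the last token of the answer, which is an easy way to conclude that the span is wrong.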

jonborchardt commented 3 years ago

Thanks, I'll take a look.

On Thu, Jan 7, 2021 at 3:48 PM Dirk Groeneveld notifications@github.com wrote:

Assigned #684 https://github.com/allenai/allennlp-demo/issues/684 to @jonborchardt https://github.com/jonborchardt.


Cheers, Jon Borchardt

jonborchardt commented 3 years ago

I am updating the UI to not use best_span, since we don't render out the tokens.

However, I want to call out an inconsistency: the best_span values returned by BiDAF are offsets into tokens, while the spans returned by NAQANet are character positions in the passage or question.

No action item here, but it's odd and may be confusing to end users.
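One way a client could put both conventions on the same footing is to convert a token span into character positions. This is only a sketch under stated assumptions: it assumes every token appears verbatim in the passage, in order, and it returns an exclusive end position. The helper names are hypothetical and nothing here is part of the AllenNLP API.

```python
def token_char_offsets(passage, tokens):
    """Find (start, end) character offsets for each token, scanning left to right."""
    offsets = []
    cursor = 0
    for token in tokens:
        start = passage.find(token, cursor)
        if start == -1:
            # Assumption violated: the token text is not in the passage verbatim.
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


def token_span_to_char_span(passage, tokens, best_span):
    """Convert an inclusive token span into a (start, end) character span, end exclusive."""
    offsets = token_char_offsets(passage, tokens)
    first, last = best_span
    return offsets[first][0], offsets[last][1]


# Hypothetical passage and tokenization, not taken from the demo.
passage = "Gould missed early. Later Robbie Gould made the kick."
tokens = ["Gould", "missed", "early", ".", "Later",
          "Robbie", "Gould", "made", "the", "kick", "."]

start, end = token_span_to_char_span(passage, tokens, [5, 6])
print(passage[start:end])
```

Tokenizers that normalize text (for example, rewriting quotes) would break the verbatim-match assumption, so a real implementation would need the tokenizer's own character offsets if they are available.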