dmis-lab / bern

A neural named entity recognition and multi-type normalization tool for biomedical text mining
https://bern.korea.ac.kr
BSD 2-Clause "Simplified" License
173 stars 44 forks

What does the start and end tag in entities represent? #5

Closed: shivanik96 closed this issue 4 years ago

shivanik96 commented 4 years ago

In the annotated PubMed data that you have shared, what do the 'start' and 'end' tags in 'entities' represent? Initially, I thought they were character positions, but now I am not so sure. Can you please confirm?

shivanik96 commented 4 years ago

After some trial and error, I found that the 'start' and 'end' tags in 'entities' are indeed character offsets, but not offsets into the abstract alone, as I had first thought. Instead, the offsets are computed over the title and abstract joined together, separated by a tab.

donghyeonk commented 4 years ago

Hi @shivanik96

Sorry for the late reply.

Character-based indexes are provided for a string that concatenates the title and the abstract.

We'll double-check that there is no problem with those character-based indexes.

Thank you.

shivanik96 commented 4 years ago

Hey @donghyeonk, in the concatenated string, I think there is a tab between the title and the abstract. Is that correct?

donghyeonk commented 4 years ago

@shivanik96 There is a space (i.e., " ") between the title and the abstract.
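
For readers who want to map the offsets back to text, here is a minimal sketch of the scheme described in this thread: the title and abstract are joined with a single space, and 'start'/'end' index into that joined string. The example title, abstract, and entity are made up, the helper names (`build_text`, `mention`, `locate`) are hypothetical, and whether 'end' is exclusive or inclusive is not stated in this thread, so the sketch exposes it as a flag and assumes exclusive by default.

```python
def build_text(title: str, abstract: str, sep: str = " ") -> str:
    """Join title and abstract the way the thread describes (single space)."""
    return title + sep + abstract


def mention(text: str, start: int, end: int, end_inclusive: bool = False) -> str:
    """Slice the entity mention out of the joined string.

    Assumption: 'end' is exclusive unless end_inclusive=True is passed.
    """
    return text[start:end + 1] if end_inclusive else text[start:end]


def locate(title: str, start: int, sep: str = " "):
    """Tell whether an offset falls in the title or the abstract.

    Returns ('title', offset) or ('abstract', offset relative to the abstract).
    """
    if start < len(title):
        return ("title", start)
    boundary = len(title) + len(sep)
    if start < boundary:
        raise ValueError("offset points into the separator")
    return ("abstract", start - boundary)


# Made-up record for illustration only; not taken from the shared data.
title = "Aspirin and headache"
abstract = "Aspirin relieves headache."
entity = {"start": 21, "end": 28}

text = build_text(title, abstract)
print(mention(text, entity["start"], entity["end"]))
print(locate(title, entity["start"]))
```

If the sliced mentions come out shifted by one character against your own copy of the data, the separator or the 'end' convention is the first thing to check.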

shivanik96 commented 4 years ago

@donghyeonk thank you so much for your prompt replies. You guys have done great work.