dmis-lab / bern

A neural named entity recognition and multi-type normalization tool for biomedical text mining
https://bern.korea.ac.kr
BSD 2-Clause "Simplified" License
173 stars 44 forks

Bug when BERN processes papers with an 'Excerpt' section, shown below. #14

Open SongbiaoZhu opened 4 years ago

SongbiaoZhu commented 4 years ago

Some articles have an 'Excerpt' section instead of an 'Abstract' section.

For example, these PMIDs: ['29787038', '30844201', '31643199', '31643392', '31643562', '31855378', '31869126'].

Error output

For this kind of PMID, BERN returned an error message containing raw XML instead of annotations:

[{"project":"BERN","sourcedb":"PubMed","sourceid":"31869126","text":"error: tmtool: <?xml version='1.0' encoding='UTF-8'?><!DOCTYPE collection SYSTEM 'BioC.dtd'>PubTatorBioC.key","denotations":[],"timestamp":"Sat Apr 11 04:43:19 +0000 2020"}]

Please take note. Also, BERN is a very nice tool. Good job!
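
Until this is fixed, a client can check each returned document before using it. The sketch below assumes only the response shape reported in this comment: a JSON list of documents where a failed one has a `text` field starting with `error:`. The helper name `failed_documents` is hypothetical and not part of BERN.

```python
import json


def failed_documents(response_body):
    """Return (sourceid, message) pairs for documents whose text is an error.

    Assumption: a failed document carries a "text" field that starts with
    "error:", as in the response reported in this issue.
    """
    docs = json.loads(response_body)
    failures = []
    for doc in docs:
        text = doc.get("text", "")
        if text.startswith("error:"):
            failures.append((doc.get("sourceid"), text))
    return failures


# Abbreviated copy of the response reported in this issue.
sample = json.dumps([{
    "project": "BERN",
    "sourcedb": "PubMed",
    "sourceid": "31869126",
    "text": "error: tmtool: <?xml version='1.0' encoding='UTF-8'?>",
    "denotations": [],
}])

for pmid, message in failed_documents(sample):
    print(pmid, "failed:", message[:14])
```

Checking the `text` prefix rather than the empty `denotations` list avoids flagging valid documents that simply contain no entities.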

amalic commented 4 years ago

Is anybody reading this?

donghyeonk commented 4 years ago

The cases you reported occur with very recent PMIDs and are caused by tmTool, an external dependency of BERN.

For very recent PMIDs, I recommend using an HTTPS POST call (i.e., the API for raw text).
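
As a rough sketch of that workaround, the request could be built with the Python standard library as follows. The endpoint path `/plain` and the form field name `sample_text` are assumptions that this thread does not confirm, so check the BERN documentation before relying on them. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and form field; neither is stated in this thread.
BERN_RAW_URL = "https://bern.korea.ac.kr/plain"
TEXT_FIELD = "sample_text"


def build_raw_request(text, url=BERN_RAW_URL):
    """Build (but do not send) an HTTPS POST request carrying raw text."""
    body = urllib.parse.urlencode({TEXT_FIELD: text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def query_raw(text, url=BERN_RAW_URL, timeout=60):
    """Send the request and decode the JSON reply."""
    request = build_raw_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With this approach, the caller fetches the title and excerpt text of the article separately and passes it to `query_raw`, so the PMID lookup through tmTool is not needed.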

amalic commented 4 years ago

The API call for raw text does not work either.

See GitHub issue https://github.com/dmis-lab/bern/issues/17