dmis-lab / BERN2

BERN2: an advanced neural biomedical named entity recognition and normalization tool
http://bern2.korea.ac.kr
BSD 2-Clause "Simplified" License

BERN2 API endpoint not working #38

Closed by Travis-Barton 2 years ago

Travis-Barton commented 2 years ago

Hey all,

I'm trying to use the BERN2 endpoint to annotate 71 PubMed papers, but it repeatedly fails after the first 5.

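from tqdm import tqdm
import pandas as pd
import requests
import time

def query_pmid(pmids, url="http://bern2.korea.ac.kr/pubmed", attempts=0):
    """Fetch BERN2 annotations for a list of PMIDs, retrying up to 3 times."""
    full_url = url + "/" + ",".join(pmids)
    try:
        # The 60-second timeout is an arbitrary choice so a hung request cannot block forever
        response = requests.get(full_url, timeout=60)
        response.raise_for_status()  # surface HTTP errors instead of a JSON decode error
        return response.json()
    except Exception as e:
        if attempts < 3:
            time.sleep(3)
            # Pass url through so a custom endpoint is kept on retries
            return query_pmid(pmids, url=url, attempts=attempts + 1)
        raise RuntimeError(f'Failed to query {pmids} after 3 retries with error: {e}'
                           f'\nurl: {full_url}') from e

if __name__ == '__main__':
    df = pd.read_csv('papers.csv')
    pubmed_ids = [str(i) for i in df['pmid'].tolist()]
    frames = []
    for i in tqdm(range(0, len(pubmed_ids), 5)):
        res = query_pmid(pubmed_ids[i:i + 5])  # slicing already clamps at the end of the list
        # Collect annotations from every paper in the batch, not only res[1]
        for paper in res:
            frames.append(pd.json_normalize(paper['annotations']))
    annotations = pd.concat(frames, axis=0) if frames else pd.DataFrame()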

This doesn't feel like a lot of papers. Is something wrong with BERN2's endpoint?

It always seems to be able to handle the first 5, but the second 5 seems to fail.

first 5: ['35652272', '35606759', '35881063', '34919325', '34183523']
URL: http://bern2.korea.ac.kr/pubmed/35652272,35606759,35881063,34919325,34183523

second 5: ['36186844', '34844098', '35673942', '35692500', '36175207']
URL: http://bern2.korea.ac.kr/pubmed/36186844,34844098,35673942,35692500,36175207

mjeensung commented 2 years ago

Hi @Travis-Barton

Thanks for reporting the issue. I figured out that the annotations for PMID 36186844 were corrupted.

I re-annotated the document, so could you try it again?
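
Since a single bad document can make a whole batch fail, it may help to narrow a failing batch down to the offending PMID before reporting it. Below is a minimal sketch of one way to do that: it splits a failing batch in half until only the PMIDs that fail on their own remain. The names `fetch_batch` and `find_failing_pmids` are hypothetical helpers, not part of BERN2, and the 60-second timeout is an assumed value. The endpoint URL is the one from the script above. Note that a transient network error would also be reported as a failure, so a flagged PMID is only a candidate.

```python
import json
import urllib.request
from typing import Callable, List, Sequence

# Endpoint taken from the script earlier in this thread
BERN2_PUBMED_URL = "http://bern2.korea.ac.kr/pubmed"


def fetch_batch(pmids: Sequence[str], timeout: float = 60.0) -> list:
    """Hypothetical helper: fetch one batch, raising on HTTP or JSON errors."""
    url = BERN2_PUBMED_URL + "/" + ",".join(pmids)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_failing_pmids(
    pmids: Sequence[str],
    fetch: Callable[[Sequence[str]], list] = fetch_batch,
) -> List[str]:
    """Return the PMIDs that still fail when requested on their own."""
    pmids = list(pmids)
    if not pmids:
        return []
    try:
        fetch(pmids)
        return []  # the whole batch succeeded, nothing to report
    except Exception:
        if len(pmids) == 1:
            return pmids  # a single PMID that fails by itself
        # Split the batch and check each half separately
        mid = len(pmids) // 2
        return (find_failing_pmids(pmids[:mid], fetch)
                + find_failing_pmids(pmids[mid:], fetch))
```

Calling `find_failing_pmids(["36186844", "34844098", "35673942", "35692500", "36175207"])` would then query the endpoint a handful of times and return whichever PMIDs cannot be fetched individually. Passing a different `fetch` callable makes it possible to try the logic without touching the network.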

Travis-Barton commented 2 years ago

Seems to work great! Thanks!