Nealcly / templateNER

Source code for template-based NER

CSV input files #3

Closed laiviet closed 2 years ago

laiviet commented 3 years ago

Can you share the format of the input CSV files? Thank you, Viet

savasy commented 2 years ago

I wrote the following script for my experiments. It may help you convert BIO format to the BART template format.

CorpusBIO.txt contains one token and label pair per line, with a blank line between sentences.

# example input
IBM B-ORG
is O
a O
...
# conversion script
def emit(tokens, labels):
    # Print one '"sentence";template' line per entity span in the sentence.
    first = " ".join(tokens)
    buffer_token = ""
    buffer_label = ""
    for l, t in zip(labels, tokens):
        prefix = l.split("-")[0]
        # Any tag other than I- closes the entity currently being built.
        if prefix != 'I' and buffer_token != "":
            print('"%s";%s is a %s entity.' % (first, buffer_token, buffer_label))
            buffer_token = ""
            buffer_label = ""
        if prefix == 'B':
            buffer_token = t
            buffer_label = l.split("-", 1)[1]
        if prefix == 'I':
            buffer_token += " " + t
    if buffer_token != "":
        print('"%s";%s is a %s entity.' % (first, buffer_token, buffer_label))

tokens = []
labels = []
with open("../CorpusBIO.txt") as f:
    for line in f:
        if len(line.strip()) > 0:
            token, label = line.split()
            # Drop ; and quotes so they cannot break the "sentence";template format.
            token = token.replace(';', '').replace('"', '').replace("'", "")
            if token:
                tokens.append(token)
                labels.append(label)
        else:
            emit(tokens, labels)
            tokens = []
            labels = []
# Flush the last sentence if the file does not end with a blank line.
emit(tokens, labels)
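
To make the output format concrete, here is a minimal sketch of the same span-grouping idea applied to a single in-memory sentence. The helper name `bio_to_templates` and the sample sentence are made up for illustration and are not part of the repository; unlike the script above, this version ignores an I- tag that does not follow a B- tag.

```python
def bio_to_templates(tokens, labels):
    """Return one '"sentence";template' line per entity span (hypothetical helper)."""
    sentence = " ".join(tokens)
    lines = []
    span, span_label = [], None
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for token, label in zip(tokens + [""], labels + ["O"]):
        prefix, _, entity_type = label.partition("-")
        # Any tag other than I- ends the span currently being collected.
        if prefix != "I" and span:
            lines.append('"%s";%s is a %s entity.' % (sentence, " ".join(span), span_label))
            span, span_label = [], None
        if prefix == "B":
            span, span_label = [token], entity_type
        elif prefix == "I" and span:
            span.append(token)
    return lines


# Made-up sample sentence, not taken from any corpus.
tokens = ["IBM", "is", "based", "in", "New", "York"]
labels = ["B-ORG", "O", "O", "O", "B-LOC", "I-LOC"]
for line in bio_to_templates(tokens, labels):
    print(line)
```

Each output line has two semicolon-separated fields: the quoted sentence, then a template of the form "<span> is a <TYPE> entity." One line is produced per entity, so a sentence with two entities appears twice.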
laiviet commented 2 years ago

Thanks!