fastnlp / TENER

Codes for "TENER: Adapting Transformer Encoder for Named Entity Recognition"

Can this be used on non-CoNLL-2003 data formats? #25

Open hetryn opened 3 years ago

hetryn commented 3 years ago

As above, can TENER's preprocessing be applied to a dataset that does not follow the CoNLL-2003 format? My dataset does not use BIO scheme tagging; each token just carries a plain entity label or 'O'. The sentences look like this:

sentence = ['Hi', 'I', 'study', 'in', 'China', 'and', 'work', 'in', 'ABC']
tag = ['O', 'O', 'O', 'O', 'Country', 'O', 'O', 'O', 'Company']

yhcc commented 3 years ago

Sorry for the late reply. You can re-use the TENER encoder, but the pre-processing and decoding may not be suitable for your input as-is. You can try converting your tags to the BIOES scheme first.
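
For reference, here is a minimal sketch of that conversion in plain Python. It is not part of the TENER code base; the function name `flat_to_bioes` is hypothetical. It assumes that a run of adjacent tokens sharing the same non-'O' label forms a single entity, so two different entities of the same type that sit directly next to each other would be merged into one span.

```python
from typing import List


def flat_to_bioes(tags: List[str], outside: str = "O") -> List[str]:
    """Convert flat per-token labels (e.g. 'Country') to BIOES tags.

    Assumption: consecutive tokens with the same label belong to one entity.
    """
    bioes = []
    n = len(tags)
    for i, tag in enumerate(tags):
        if tag == outside:
            bioes.append(outside)
            continue
        # A span starts if the previous token has a different label,
        # and ends if the next token has a different label.
        starts = i == 0 or tags[i - 1] != tag
        ends = i == n - 1 or tags[i + 1] != tag
        if starts and ends:
            prefix = "S"  # single-token entity
        elif starts:
            prefix = "B"  # beginning of a multi-token entity
        elif ends:
            prefix = "E"  # end of a multi-token entity
        else:
            prefix = "I"  # inside a multi-token entity
        bioes.append(f"{prefix}-{tag}")
    return bioes


sentence = ['Hi', 'I', 'study', 'in', 'China', 'and', 'work', 'in', 'ABC']
tag = ['O', 'O', 'O', 'O', 'Country', 'O', 'O', 'O', 'Company']

print(flat_to_bioes(tag))
# ['O', 'O', 'O', 'O', 'S-Country', 'O', 'O', 'O', 'S-Company']
```

Once the tags are converted, the tokens and BIOES tags can be written out one token per line in whatever column layout your data loader expects.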