Alibaba-NLP / ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction

Different Label Space #27

Closed AtharvanDogra closed 2 years ago

AtharvanDogra commented 2 years ago

What should I do if the data I'm using has a different label space than the data used? My dataset has these labels:

PER : Person
LOC : Location
GRP : Group
CORP : Corporation
PROD : Product
CW : Creative Work

wangxinyu0922 commented 2 years ago

Hi, if you want to train on a new dataset, you may follow this instruction and named_entity_recognition.md.

AtharvanDogra commented 2 years ago

@wangxinyu0922 I do understand this part. I just want to know: since this code was written for the CoNLL 2003 data, which has its own tags (like MISC), and the dataset I want to use has some different tags, will I have to change which tags the code can read?

wangxinyu0922 commented 2 years ago

The label space is determined by tag_dictionary. If the path given for tag_dictionary does not exist, the code reads the dataset and creates a new tag_dictionary at that path, for example at resources/taggers/your_ner_tags.pkl. Note that if the path already exists, the code follows the tags stored in that tag_dictionary, so please be careful about that. Therefore, you can simply specify the path to your dataset and a new tag_dictionary path:

targets: ner
ner:
  Corpus: ColumnCorpus-1
  ColumnCorpus-1: 
    data_folder: datasets/conll_03_new
    column_format:
      0: text
      1: pos
      2: chunk
      3: ner
    tag_to_bioes: ner
  tag_dictionary: resources/taggers/your_ner_tags.pkl
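
Because an existing tag_dictionary file takes precedence over the tags in the dataset, it can help to check the path before starting a run. The following is a minimal sketch using only the Python standard library; the helper name `tag_dictionary_status` is hypothetical and is not part of ACE, and it only tests whether the file exists, without reading it.

```python
from pathlib import Path


def tag_dictionary_status(path):
    """Return "reuse" if a tag dictionary already exists at `path`, else "create".

    Hypothetical helper, not part of ACE. It mirrors the behaviour described
    above: an existing file is reused as-is, a missing one is built from the
    dataset.
    """
    return "reuse" if Path(path).exists() else "create"


if __name__ == "__main__":
    # Path taken from the config above.
    path = "resources/taggers/your_ner_tags.pkl"
    status = tag_dictionary_status(path)
    if status == "reuse":
        print(f"{path} already exists: its tags will be used instead of the dataset's.")
    else:
        print(f"{path} not found: a new tag dictionary will be created from the dataset.")
```

If the status is "reuse" and the dataset has a different label space, pointing tag_dictionary at a path that does not exist yet avoids picking up the old tags.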

AtharvanDogra commented 2 years ago

@wangxinyu0922 OK, I'll give it a try once I am able to reproduce the results on your dataset, and then I'll let you know if it worked.

AtharvanDogra commented 2 years ago

the _ _ O
main _ _ O
contractor _ _ O
was _ _ O
ssangyong _ _ B-CORP
engineering _ _ I-CORP
and _ _ I-CORP
construction _ _ I-CORP
. _ _ O

I have a dataset with tokens and tags like this. How should I proceed to use it with this model, for both training and prediction?

(P.S. I am still not able to run the test script on the model you provided, but I've been able to train a new model on your dataset.)

wangxinyu0922 commented 2 years ago

To train and test on a new dataset like this, just follow the dataset settings above: specify your data_folder in the config and use a new tag_dictionary path.
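
Before training, it may be useful to confirm that the file really has the four columns named in column_format and to see which tags it contains. The sketch below is an assumption-laden illustration, not ACE code: the function names `collect_tags` and `entity_types` are hypothetical, the columns are assumed to be whitespace-separated, and the tag is assumed to be in the last of four columns, as in the sample from this thread.

```python
from collections import Counter


def collect_tags(lines, tag_column=3, n_columns=4):
    """Count the tags in whitespace-separated, column-format lines.

    Hypothetical helper, not part of ACE. Blank lines (sentence breaks) and
    -DOCSTART- lines are skipped; any other line must have `n_columns` fields.
    """
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            continue
        fields = line.split()
        if len(fields) != n_columns:
            raise ValueError(
                f"line {line_no}: expected {n_columns} columns, got {len(fields)}"
            )
        counts[fields[tag_column]] += 1
    return counts


def entity_types(tags):
    """Strip BIO/BIOES prefixes (B-, I-, E-, S-) and drop the O tag."""
    return {tag.split("-", 1)[1] for tag in tags if "-" in tag}


# Sample sentence from this thread.
sample = """the _ _ O
main _ _ O
contractor _ _ O
was _ _ O
ssangyong _ _ B-CORP
engineering _ _ I-CORP
and _ _ I-CORP
construction _ _ I-CORP
. _ _ O""".splitlines()

tags = collect_tags(sample)
print(sorted(tags))        # ['B-CORP', 'I-CORP', 'O']
print(entity_types(tags))  # {'CORP'}
```

To run this over a real file, pass an open file object instead of `sample`, for example `collect_tags(open(path, encoding="utf-8"))`, where `path` points at a file inside your data_folder.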

As for testing, if you have successfully trained a model on your own dataset, testing should work as well. You can post some screenshots of the problem so that I can help you further (is it the same problem as in the other issue?).