Closed AtharvanDogra closed 2 years ago
Hi, if you want to train on a new dataset, you may follow these instructions and named_entity_recognition.md.
@wangxinyu0922 I do understand this part. I just want to know: since this code was written for the CoNLL 2003 data, it expects that dataset's tags (like MISC). The dataset I want to use has some different tags. Will I have to make changes to the kinds of tags it can read?
The label space is decided by tag_dictionary. When the path to the tag_dictionary does not exist, the code will read the dataset and create a new tag_dictionary at that path, for example at resources/taggers/your_ner_tags.pkl. Note that if the path to tag_dictionary already exists, the code will follow the tags in that tag_dictionary, so please be careful about that. Therefore, you can simply specify the path to your dataset and a new tag_dictionary path:
targets: ner
ner:
  Corpus: ColumnCorpus-1
  ColumnCorpus-1:
    data_folder: datasets/conll_03_new
    column_format:
      0: text
      1: pos
      2: chunk
      3: ner
    tag_to_bioes: ner
  tag_dictionary: resources/taggers/your_ner_tags.pkl
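Because an existing tag_dictionary file is reused rather than rebuilt, it can help to check the path before training on a dataset with new labels. The following is only a minimal sketch of that check, not code from this repository; the helper name `tag_dictionary_action` is hypothetical and the path is the one from the example config.

```python
from pathlib import Path


def tag_dictionary_action(path):
    """Return which case described above applies to `path` (hypothetical helper).

    "reuse"  -> something already exists there, so its tags would be followed
    "create" -> nothing exists there, so the tags would be read from the dataset
    """
    return "reuse" if Path(path).exists() else "create"


if __name__ == "__main__":
    # Path taken from the example config; substitute your own.
    path = "resources/taggers/your_ner_tags.pkl"
    if tag_dictionary_action(path) == "reuse":
        print(f"{path} already exists: its label set will be used as-is.")
    else:
        print(f"{path} does not exist: a new label set will be built from the dataset.")
```

If the check reports "reuse" for a dataset whose labels differ from the stored ones, pointing tag_dictionary at a fresh path is the option suggested in this thread.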
@wangxinyu0922 OK, I'll give it a try once I'm able to reproduce the results on your dataset, then I'll let you know if it worked.
the _ _ O
main _ _ O
contractor _ _ O
was _ _ O
ssangyong _ _ B-CORP
engineering _ _ I-CORP
and _ _ I-CORP
construction _ _ I-CORP
. _ _ O
I have a dataset with tokens and tags like this. How should I proceed with using it in this model, for both training and predicting?
(P.S. I am still not able to run the test script on the model provided by you, but I've been able to train a new model on your dataset.)
To train and test on a new dataset like this, just follow the dataset settings above. Specify your data_folder in the config and use a new tag_dictionary path.
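Before training, it may be useful to confirm which labels the dataset actually contains, since those are what a newly created tag_dictionary would be built from. The sketch below is plain Python and independent of this repository; `collect_labels` and `entity_types` are hypothetical helper names, and it assumes the four-column, whitespace-separated layout shown in the sample above, with the NER tag in column 3.

```python
from collections import Counter


def collect_labels(lines, tag_column=3):
    """Count the tags found in `tag_column` of whitespace-separated column data.

    Blank lines (sentence boundaries) and lines with too few columns are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > tag_column:
            counts[fields[tag_column]] += 1
    return counts


def entity_types(label_counts):
    """Strip the B-/I- style prefixes to get the set of entity types."""
    return {label.split("-", 1)[1] for label in label_counts if "-" in label}


# The sample sentence from this thread, one token per line.
sample = """\
the _ _ O
main _ _ O
contractor _ _ O
was _ _ O
ssangyong _ _ B-CORP
engineering _ _ I-CORP
and _ _ I-CORP
construction _ _ I-CORP
. _ _ O
""".splitlines()

labels = collect_labels(sample)
print(sorted(labels))        # ['B-CORP', 'I-CORP', 'O']
print(entity_types(labels))  # {'CORP'}
```

To run it on a real file, pass an open file handle instead of `sample`, for example `collect_labels(open("train.txt", encoding="utf-8"))`, where the file name is a placeholder for a file in your data_folder.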
For testing, if you have successfully trained a model on your own dataset, testing should work as well. You may provide some screenshots of the problem so that I can help you further (is it the problem in another issue?).
What should I do if the data I'm using has a different label space than the data used