I don't quite understand the README directions for creating a new bootstrapped labeling for unsupervised NER in a different language and with different labels/terms.
Step 1:
I emptied the files "labels.txt" and "bootstrap_entities.txt". Then I tried two variants of the new bootstrapped labeling:
a) running with an empty seed word list
b) creating a new bootstrap_entities.txt with new seed words (all of them present in my vocab.txt)
Then I called run.sh with Option = 1 and Threshold = 0 to generate the vectors and label them according to my seed words.
When it finishes, a lot of files are written or updated, e.g. adaptive_debug_pivots.txt, inferred.txt, labels.txt, pivots.json, and pivots.txt.
In the Readme it says:
"Cluster (run.sh with option 1 followed by 0) and then examine cluster pivots to label them.
Then rerun clustering and select candidates from inferred.txt. "
It's not clear to me which file "examine cluster pivots" refers to.
At first I assumed I had to look at adaptive_debug_pivots.txt,
so I started correcting the labels in that file.
When I rerun clustering (with the same options as above: run.sh with option 1 followed by 0),
the same outputs as in Step 1 are simply regenerated identically,
so all my edits were overwritten.
inferred.txt is basically always empty.
So I must be doing something wrong.
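To pin down which output files a rerun actually regenerates (and therefore which ones are not safe to hand-edit), a small check along these lines could help. This is only a sketch: the helper names `snapshot` and `changed` are made up for illustration, and the directory the files live in is an assumption that has to be adjusted to the local checkout.

```python
import hashlib
from pathlib import Path

# File names taken from the run described above; their directory is an
# assumption, so prefix them with the right path for your checkout.
FILES = [
    "adaptive_debug_pivots.txt",
    "inferred.txt",
    "labels.txt",
    "pivots.json",
    "pivots.txt",
]


def snapshot(paths):
    """Map each path to the SHA-256 of its contents, or None if it is missing."""
    digests = {}
    for p in map(Path, paths):
        if p.is_file():
            digests[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
        else:
            digests[str(p)] = None
    return digests


def changed(before, after):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(p for p in after if before.get(p) != after[p])


# Intended use (run.sh itself is not invoked here):
#   before = snapshot(FILES)   # taken right after hand-editing a file
#   ... rerun run.sh ...
#   after = snapshot(FILES)
#   print(changed(before, after))  # files the rerun rewrote with different content
```

A hand-edited file that shows up in the `changed` list was rewritten by the rerun, which would match the overwriting described above.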
Then I checked run.sh, which calls:
python dist_v2.py pwd 0 vocab.txt bert_vectors.txt 0 results/labels.txt results/stats_dict.txt preserve_1_2_grams.txt glue_words.txt bootstrap_entities.txt
From this I figured that bootstrap_entities.txt basically contains the pivot clusters. So I'm pretty much lost now.
Could you please specify more precisely how I can iteratively improve the labeling for the generated clusters?