This repository contains the code for replicating results from
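Install the Python requirements: pip install -r requirements.txt
Run setup_all.sh.
For training, run setup_training.sh and extract_bert_features.sh.
Edit the ontonotes_path variable to match your setup.
Experiment configurations are found in experiments.conf.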
Choose an experiment to run, e.g. best.
Training: python train.py <experiment>
Results are stored in the logs directory and can be viewed via TensorBoard.
Evaluation: python evaluate.py <experiment>
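Command-line demo: python demo.py final
To run the demo with another experiment, replace final with your configuration name.

Input for prediction is a jsonlines file in which each line is a JSON object in the following format (shown here across several lines for readability):
{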
"clusters": [],
"doc_key": "nw",
"sentences": [["This", "is", "the", "first", "sentence", "."], ["This", "is", "the", "second", "."]],
"speakers": [["spk1", "spk1", "spk1", "spk1", "spk1", "spk1"], ["spk2", "spk2", "spk2", "spk2", "spk2"]]
}
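
A record in this format can also be generated programmatically. The following is a minimal sketch, not part of the repository: the helper names `make_record` and `to_jsonlines` are hypothetical, and it assumes the input is already tokenized into sentences. It fills in empty-string speakers when none are given and emits one JSON object per line.

```python
import json

# Genre codes accepted for "doc_key", as listed in this README.
GENRES = {"bc", "bn", "mz", "nw", "pt", "tc", "wb"}


def make_record(sentences, genre="nw", speakers=None):
    """Build one input document (hypothetical helper, not from the repo).

    `sentences` is a list of token lists. `speakers`, if given, must have
    the same shape; otherwise every token gets an empty-string speaker.
    """
    if genre not in GENRES:
        raise ValueError(f"unknown genre: {genre!r}")
    if speakers is None:
        speakers = [[""] * len(tokens) for tokens in sentences]
    if [len(s) for s in speakers] != [len(t) for t in sentences]:
        raise ValueError("speakers must align with sentences token by token")
    return {
        "clusters": [],  # left empty for prediction
        "doc_key": genre,
        "sentences": sentences,
        "speakers": speakers,
    }


def to_jsonlines(records):
    """Serialize records as jsonlines text: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


record = make_record(
    [["This", "is", "the", "first", "sentence", "."],
     ["This", "is", "the", "second", "."]]
)
print(to_jsonlines([record]), end="")
```

Writing the returned text to a file gives something that can be passed as the input file for prediction. `json.dumps` keeps each object on a single line by default, which is what the jsonlines format needs.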
clusters should be left empty and is only used for evaluation purposes.
doc_key indicates the genre, which can be one of the following: "bc", "bn", "mz", "nw", "pt", "tc", "wb".
speakers indicates the speaker of each word. These can be all empty strings if there is only one known speaker.
Run python predict.py <experiment> <input_file> <output_file>, which outputs the input jsonlines with predicted clusters.
To select GPUs, set the GPU environment variable, which the code treats as shorthand for CUDA_VISIBLE_DEVICES.
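
To illustrate what treating GPU as shorthand for CUDA_VISIBLE_DEVICES can look like, here is a small sketch. It is an assumption about the general pattern, not the repository's actual code: the function name `apply_gpu_shorthand` is hypothetical, and the behaviour when GPU is unset (leaving the environment untouched) is a choice made for this example only.

```python
import os


def apply_gpu_shorthand(environ=os.environ):
    """Copy GPU into CUDA_VISIBLE_DEVICES when GPU is set.

    Assumption: if GPU is unset, the environment is left unchanged.
    The repository's real fallback behaviour may differ.
    """
    gpu = environ.get("GPU")
    if gpu is not None:
        environ["CUDA_VISIBLE_DEVICES"] = gpu
    return environ.get("CUDA_VISIBLE_DEVICES")


# Demonstrate on a plain dict so the real process environment is untouched.
demo_env = {"GPU": "0"}
print(apply_gpu_shorthand(demo_env))
```

A mapping like this has to run before the deep learning framework initializes CUDA, since CUDA_VISIBLE_DEVICES is read at that point. On the command line, the variable would typically be set by prefixing a command with something like GPU=0.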