clamsproject / aapb-evaluations

Collection of evaluation codebases
Apache License 2.0

side-by-side view in NER eval script #17

Closed · keighrim closed 1 year ago

keighrim commented 1 year ago

New Feature Summary

As proposed and documented at https://github.com/clamsproject/aapb-collaboration/issues/24, we'd like to add functionality to the current version of the NER evaluation code to generate a side-by-side view for each gold-prediction pair.

Implementation-wise, we can insert something here https://github.com/clamsproject/aapb-evaluations/blob/e18576afc389ee05ec07d0ef830efd22c38144b4/ner_eval/evaluate.py#L153-L156

that does something like this:

    gold_matches, test_matches = file_match(golddirectory, testdirectory)

    generate_side_by_side(zip(gold_matches, test_matches), s_b_s_outdir)

    tokens_true = directory_to_tokens(gold_matches)
    tokens_pred = directory_to_tokens(test_matches)

with

    import pathlib

    def generate_side_by_side(pairs, outdir):
        for pair in pairs:
            guid = get_guid(pair)
            # one CSV per GUID: each row holds the token index, the gold label, and the predicted label
            with open(pathlib.Path(outdir) / f"{guid}.sbs.csv", "w") as out_f:
                gold_tokenized_labels = read_tokenized_labels_from_ann(pair[0])
                pred_tokenized_labels = read_tokenized_labels_from_mmif(pair[1])
                for i, label_pair in enumerate(zip(gold_tokenized_labels, pred_tokenized_labels)):
                    out_f.write(",".join([str(i), *label_pair]))
                    out_f.write("\n")

(written as pseudocode; not meant to run as-is)
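Given that write loop, each `{guid}.sbs.csv` would end up with one row per aligned token: the token index, the gold label, and the predicted label. A minimal sketch of the output, assuming BIO-style NER labels (the actual label strings depend on what the tokenized-label readers return):

    0,B-person,B-person
    1,I-person,O
    2,O,O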

Related

No response

Alternatives

No response

Additional context

No response