ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

evaluation metrics #5

Closed: tanikina closed this issue 2 months ago

tanikina commented 3 months ago

This adds the DialAM evaluation metrics (CASS, graph-based F1, etc.), based on the following code: https://github.com/arg-tech/AMF-Evaluation-Metrics, which was slightly adapted to work with the data from the shared task (the original version uses the AIF format).

The script can be run as follows: `python3 src/evaluation/evaluation_metrics.py predicted_nodeset_path gold_nodeset_path`
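Roughly, the graph-based part of the metrics boils down to turning both nodesets into graphs and comparing them with GMatch4py's graph edit distance. Here is a minimal sketch of that idea (not the actual code from AMF-Evaluation-Metrics; the helper `nodeset_to_graph` and the assumed JSON keys `nodes`/`edges`/`nodeID`/`fromID`/`toID` are illustrative):

```python
# Illustrative sketch only: build graphs from two nodeset JSON files and
# compare them with GMatch4py's graph edit distance. The JSON keys below
# are assumptions about the shared-task format; the real evaluation
# script may differ in details.
import json
import sys

import networkx as nx
import gmatch4py as gm


def nodeset_to_graph(path: str) -> nx.DiGraph:
    """Read a nodeset JSON file and turn it into a directed graph."""
    with open(path) as f:
        nodeset = json.load(f)
    graph = nx.DiGraph()
    for node in nodeset["nodes"]:
        graph.add_node(node["nodeID"], type=node["type"], text=node["text"])
    for edge in nodeset["edges"]:
        graph.add_edge(edge["fromID"], edge["toID"])
    return graph


if __name__ == "__main__":
    predicted_path, gold_path = sys.argv[1], sys.argv[2]
    predicted = nodeset_to_graph(predicted_path)
    gold = nodeset_to_graph(gold_path)

    # All edit costs (node/edge insertion and deletion) set to 1,
    # following the GMatch4py README.
    ged = gm.GraphEditDistance(1, 1, 1, 1)
    comparison = ged.compare([predicted, gold], None)
    similarity = ged.similarity(comparison)
    print("graph similarity (predicted vs. gold):", similarity[0][1])
```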

Unfortunately, I could not find any reasonable data for unit tests to check that it works correctly. The examples in the CASS paper look quite complex and also involve different segmentation boundaries, which are not relevant for the shared task since we are already given all the nodes (and the L-node/I-node text). At least for identical nodesets we get 1.0 for CASS, F1 and the other metrics, which looks correct :)
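As a placeholder for a proper unit test, a self-comparison check along these lines could be added later (again just a sketch, reusing the hypothetical `nodeset_to_graph` helper from above; the fixture path is made up):

```python
# Sketch of a self-comparison sanity check: a nodeset compared with itself
# should get a perfect similarity score.
import gmatch4py as gm
import pytest


def test_identical_nodesets_get_perfect_score():
    graph = nodeset_to_graph("tests/fixtures/gold_nodeset.json")  # hypothetical fixture
    ged = gm.GraphEditDistance(1, 1, 1, 1)
    similarity = ged.similarity(ged.compare([graph, graph], None))
    assert similarity[0][1] == pytest.approx(1.0)
```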

EDIT: It seems that when pre-commit tries to install GMatch4py from source (git+https://github.com/Jacobe2169/GMatch4py.git), it fails to find numpy and outputs the following message: "Getting requirements to build wheel did not run successfully." This happens even after I explicitly added numpy to requirements.txt. All pre-commit checks run without error messages in my local setup (I already have numpy & Co installed in my environment), but I don't know how to make pre-commit happy on GitHub as well...

EDIT 2: Well, this is not an "elegant solution", but I moved the instructions for installing GMatch4py into evaluation_metrics.py and removed it from requirements.txt. At least pre-commit seems to be happy now :)
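For reference, a guarded import like the one below is one way to surface the install instructions from inside evaluation_metrics.py; this is just a sketch of the pattern, not necessarily the exact wording used in the PR:

```python
# Sketch: fail with a helpful hint if GMatch4py is not installed, instead of
# listing it in requirements.txt (where pre-commit's isolated build fails on
# the missing numpy build dependency).
try:
    import gmatch4py as gm
except ImportError as e:
    raise ImportError(
        "GMatch4py is required for the evaluation metrics. Install it with:\n"
        "  pip install numpy\n"
        "  pip install git+https://github.com/Jacobe2169/GMatch4py.git"
    ) from e
```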