Create Evaluation Datasets

[x] Config from Document
[ ] Config from Dataset

We have created a config-from-document dataset of approximately 900 (document, AMR) pairs. For evaluation, mask out the parameter and initial values in each AMR, have the model predict them from the document, and compare the predictions against the ground truth using precision, recall, and F1. The dataset has been uploaded to the shared drive.
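Below is a minimal Python sketch of the masking and scoring step. It assumes the AMR stores parameters under `semantics.ode.parameters` (each with `id` and `value`) and initials under `semantics.ode.initials` (each with `target` and `expression`). The helper names `mask_amr`, `values_match`, and `score`, the `None` mask, the `(kind, name)` keys, and the 1e-3 relative tolerance are all hypothetical choices for illustration, not part of the dataset as described.

```python
import copy
import math

MASK = None  # hypothetical placeholder written over each hidden value


def mask_amr(amr):
    """Return (masked copy of the AMR, dict of hidden ground-truth values).

    Assumed layout: amr["semantics"]["ode"]["parameters"] -> [{"id", "value"}],
    amr["semantics"]["ode"]["initials"] -> [{"target", "expression"}].
    """
    masked = copy.deepcopy(amr)  # leave the original AMR untouched
    ode = masked["semantics"]["ode"]
    truth = {}
    for p in ode.get("parameters", []):
        if "value" in p:
            truth[("parameter", p["id"])] = p["value"]
            p["value"] = MASK
    for init in ode.get("initials", []):
        if "expression" in init:
            truth[("initial", init["target"])] = init["expression"]
            init["expression"] = MASK
    return masked, truth


def _as_float(x):
    """Convert x to float if possible, else return None."""
    try:
        return float(x)
    except (TypeError, ValueError):
        return None


def values_match(pred, gold, rel_tol=1e-3):
    """Numeric values match within a relative tolerance; others by exact string."""
    p, g = _as_float(pred), _as_float(gold)
    if p is not None and g is not None:
        return math.isclose(p, g, rel_tol=rel_tol)
    return str(pred).strip() == str(gold).strip()


def score(predicted, truth, rel_tol=1e-3):
    """Precision, recall, and F1 of predicted values against the masked ground truth.

    A prediction is a true positive if its key exists in the ground truth and the
    value matches. Predictions of None count as "no prediction".
    """
    preds = {k: v for k, v in predicted.items() if v is not None}
    tp = sum(1 for k, v in preds.items()
             if k in truth and values_match(v, truth[k], rel_tol))
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, with ground-truth values beta = 0.4, gamma = 0.1, and S = "990", a model that predicts beta = 0.4 and gamma = 0.2 and omits S scores precision 0.5, recall 1/3, and F1 0.4. Treating an omitted value as a recall miss rather than a precision error is one convention among several; the per-field scores could also be aggregated per document or across the whole dataset.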

The config-from-dataset strategy is still TBD. One option is to reuse this same dataset: map the values from the existing AMRs into tabular format, then evaluate the model's ability to map the values in those tables back into the AMR.
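One way the tabular variant could be generated, sketched in Python: flatten the parameter and initial values of an existing AMR into a CSV table, then read that table back into a lookup that can serve as ground truth when scoring the model's mapping into the AMR. The AMR paths (`semantics.ode.parameters`, `semantics.ode.initials`), the `kind,name,value` column layout, and the function names `amr_to_table` and `table_to_truth` are assumptions for illustration.

```python
import csv
import io


def amr_to_table(amr):
    """Flatten parameter and initial values from an AMR into a CSV string.

    Assumed AMR layout: semantics.ode.parameters -> [{"id", "value"}] and
    semantics.ode.initials -> [{"target", "expression"}].
    The kind,name,value column layout is hypothetical.
    """
    ode = amr["semantics"]["ode"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["kind", "name", "value"])
    writer.writeheader()
    for p in ode.get("parameters", []):
        writer.writerow({"kind": "parameter", "name": p["id"], "value": p.get("value", "")})
    for init in ode.get("initials", []):
        writer.writerow({"kind": "initial", "name": init["target"],
                         "value": init.get("expression", "")})
    return buf.getvalue()


def table_to_truth(table):
    """Read the CSV back into {(kind, name): value} to score the model's output against."""
    reader = csv.DictReader(io.StringIO(table))
    return {(row["kind"], row["name"]): row["value"] for row in reader}
```

The model would receive the table together with the AMR (values masked), and its filled-in AMR would be compared against `table_to_truth`. Values read back from CSV are strings, so numeric comparison against model output needs a conversion step.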

DARPA-ASKEM / GoLLM

Create Evaluation Datasets #29