DARPA-ASKEM / GoLLM

Service to run GoLLM and define its tools.
Apache License 2.0

Create Evaluation Datasets #29

j2whiting closed this issue 5 months ago

j2whiting commented 6 months ago

We have created a config-from-document dataset with approximately 900 pairs of documents and AMRs. For evaluation, mask out the values of the parameters and initials in each AMR, have the model predict them from the paired document, and compare the predictions against the ground truth. Evaluate using precision, recall, and F1. The dataset is uploaded to the shared drive.
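A minimal sketch of the mask-and-score loop described above. It is not code from this repo: it assumes each AMR keeps its values under `semantics.ode.parameters[].value` and `semantics.ode.initials[].expression`, that those values are numeric, and that a prediction counts as correct when it matches the ground truth within a small relative tolerance. The helper names `extract_values`, `mask_values`, and `score` are hypothetical.

```python
import copy
import math


def extract_values(amr):
    """Collect ground-truth values keyed by (kind, name). Layout is an assumption."""
    ode = amr.get("semantics", {}).get("ode", {})
    values = {}
    for p in ode.get("parameters", []):
        if p.get("value") is not None:
            values[("parameter", p["id"])] = float(p["value"])
    for i in ode.get("initials", []):
        try:
            values[("initial", i["target"])] = float(i["expression"])
        except (TypeError, ValueError):
            pass  # skip missing or symbolic expressions in this sketch
    return values


def mask_values(amr):
    """Return a copy of the AMR with parameter and initial values blanked out."""
    masked = copy.deepcopy(amr)
    ode = masked.get("semantics", {}).get("ode", {})
    for p in ode.get("parameters", []):
        p["value"] = None
    for i in ode.get("initials", []):
        i["expression"] = None
    return masked


def score(predicted, truth, rel_tol=1e-6):
    """Precision, recall, and F1 over (kind, name) -> value dictionaries."""
    tp = sum(
        1
        for key, value in predicted.items()
        if key in truth and math.isclose(value, truth[key], rel_tol=rel_tol)
    )
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy AMR fragment (made up for illustration, not taken from the dataset).
amr = {
    "semantics": {
        "ode": {
            "parameters": [{"id": "beta", "value": 0.3}, {"id": "gamma", "value": 0.1}],
            "initials": [
                {"target": "S", "expression": "999"},
                {"target": "I", "expression": "1"},
            ],
        }
    }
}

truth = extract_values(amr)
masked = mask_values(amr)  # this is what the model would receive

# Hypothetical model output: one wrong value (gamma) and one missing (I).
predicted = {("parameter", "beta"): 0.3, ("parameter", "gamma"): 0.2, ("initial", "S"): 999.0}
precision, recall, f1 = score(predicted, truth)
```

With this scoring, a wrong value lowers precision and a missing value lowers recall. Whether exact match, a tolerance, or something looser is the right criterion is left open here.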

The config-from-dataset strategy is TBD. Perhaps we can reuse this same dataset: map the values from the existing AMRs into tabular format, then evaluate the model's ability to map the values in the tables back into the AMR.
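Since the strategy is still TBD, the following is only one possible sketch of the idea. It flattens an AMR's values into table rows and then fills a masked AMR back in from those rows. It assumes the same layout as above (`semantics.ode.parameters[].value` and `semantics.ode.initials[].expression`). The row columns `kind`, `name`, `value` and the helpers `amr_to_rows` and `fill_from_rows` are hypothetical. `fill_from_rows` is a deterministic lookup standing in for the model, so it shows the target mapping rather than a real prediction.

```python
import copy


def amr_to_rows(amr):
    """Flatten parameter and initial values into table rows (assumed layout)."""
    ode = amr.get("semantics", {}).get("ode", {})
    rows = []
    for p in ode.get("parameters", []):
        rows.append({"kind": "parameter", "name": p["id"], "value": p.get("value")})
    for i in ode.get("initials", []):
        rows.append({"kind": "initial", "name": i["target"], "value": i.get("expression")})
    return rows


def fill_from_rows(masked_amr, rows):
    """Write table values back into a masked AMR; stand-in for the model's mapping."""
    filled = copy.deepcopy(masked_amr)
    lookup = {(r["kind"], r["name"]): r["value"] for r in rows}
    ode = filled.get("semantics", {}).get("ode", {})
    for p in ode.get("parameters", []):
        p["value"] = lookup.get(("parameter", p["id"]))
    for i in ode.get("initials", []):
        i["expression"] = lookup.get(("initial", i["target"]))
    return filled


# Toy AMR fragment (made up for illustration, not taken from the dataset).
original = {
    "semantics": {
        "ode": {
            "parameters": [{"id": "beta", "value": 0.3}],
            "initials": [{"target": "S", "expression": "999"}],
        }
    }
}

table = amr_to_rows(original)

# Masked copy: same structure, values removed.
masked = copy.deepcopy(original)
masked["semantics"]["ode"]["parameters"][0]["value"] = None
masked["semantics"]["ode"]["initials"][0]["expression"] = None

restored = fill_from_rows(masked, table)
```

In a real evaluation the table would likely use column names that differ from the AMR ids, which is the part the model would have to resolve. That detail is not specified in this issue.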