Create evaluation datasets for various GoLLM tasks

Problem

We do not have large-scale datasets for evaluating GoLLM tasks.
We do not have a method for sourcing or creating datasets for new GoLLM tasks.

Approach

Create distributions that we can sample from to build synthetic datasets. For example, we can likely create an arbitrary AMR, stratify it, and then generate synthetic interaction matrices whose cells map to the newly created AMR.
Real data would be better where it is available, but collecting it carries significant hurdles in annotation cost, licensing, and time.
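
As a rough illustration of the synthetic route, the sketch below samples a toy stratified model together with an interaction matrix and a gold cell-to-state mapping. Everything in it is an assumption made for illustration: the function names (`stratify`, `sample_interaction_matrix`, `make_example`, `make_dataset`), the plain-dict stand-in for an AMR, the S/I/R base states, the strata pool, the uniform sampling distribution, and the rule that cell (a, b) maps to the pair (S_a, I_b). It does not use the real AMR schema or any GoLLM API.

```python
import itertools
import random


def stratify(states, strata):
    """Cross each base state with each stratum, e.g. ("S", "young") -> "S_young"."""
    return [f"{state}_{stratum}" for state, stratum in itertools.product(states, strata)]


def sample_interaction_matrix(strata, rng, low=0.0, high=1.0):
    """Sample one value per ordered pair of strata from a uniform distribution."""
    return {(a, b): round(rng.uniform(low, high), 3) for a in strata for b in strata}


def make_example(rng, states=("S", "I", "R"), strata_pool=("young", "middle", "old")):
    """Build one synthetic example: stratified toy model, matrix, and gold mapping."""
    # Pick a random subset of strata so examples vary in size.
    strata = rng.sample(strata_pool, rng.randint(2, len(strata_pool)))
    matrix = sample_interaction_matrix(strata, rng)
    # Assumed gold label (hypothetical rule): cell (a, b) governs the S_a / I_b interaction.
    mapping = {cell: (f"S_{cell[0]}", f"I_{cell[1]}") for cell in matrix}
    return {
        "strata": strata,
        "states": stratify(states, strata),
        "matrix": matrix,
        "mapping": mapping,
    }


def make_dataset(n, seed=0):
    """Generate n examples reproducibly from a single seed."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


dataset = make_dataset(3, seed=42)
```

Because the mapping is generated alongside the matrix, each example carries its own ground truth, so a task output could be scored by comparing its predicted cell-to-state mapping against `mapping`. Seeding the generator keeps the dataset reproducible across evaluation runs.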

Tasks

TBD

DARPA-ASKEM / GoLLM