coli-saar / alto

Alto, the Algebraic Language Toolkit

Lazy instance parsing in Corpus #80

Open alexanderkoller opened 2 years ago

alexanderkoller commented 2 years ago

Currently, an Instance is a bundle of algebra values. These values are provided either by parsing a line of the corpus in Corpus#readCorpusWrapper or by direct construction, e.g. in the CorpusConverter and its uses.

This is limiting. It means that a corpus can only contain interpretations whose algebra implements Algebra#parseString, which is not necessarily the case if the interpretation is intended for output only. Without this restriction, one could easily implement an "exact match" evaluation in the ParsingEvaluator script: parse some input interpretations into an output interpretation, then check the result for string equality against the corpus entry.

We should change the Corpus class to parse instances lazily: they are stored as strings until someone requests an algebra value for a specific interpretation, at which point the string is parsed into an algebra value and cached. Here are some thoughts on the ramifications.
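Before turning to those, here is a minimal sketch of the proposed behaviour. It is not Alto's actual Instance class: `LazyInstance`, `setRaw`, `setValue`, `getRaw` and `getValue` are hypothetical names, and a plain `Function<String, Object>` stands in for Algebra#parseString. An interpretation without a parser can still be stored and read back as a string, which is all an exact-match check needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a lazily parsed instance; not part of Alto. */
class LazyInstance {
    // Raw corpus strings, keyed by interpretation name.
    private final Map<String, String> raw = new HashMap<>();
    // Parsed (or directly constructed) algebra values, filled on demand.
    private final Map<String, Object> cache = new HashMap<>();
    // One parser per interpretation; stands in for Algebra#parseString.
    // Output-only interpretations simply have no entry here.
    private final Map<String, Function<String, Object>> parsers;

    LazyInstance(Map<String, Function<String, Object>> parsers) {
        this.parsers = parsers;
    }

    /** Stores the unparsed corpus string; nothing is parsed yet. */
    void setRaw(String interpretation, String text) {
        raw.put(interpretation, text);
        cache.remove(interpretation); // drop any stale parsed value
    }

    /** Direct construction, as a converter might do: bypasses parsing. */
    void setValue(String interpretation, Object value) {
        cache.put(interpretation, value);
    }

    /** Returns the stored string, or null if there is none. */
    String getRaw(String interpretation) {
        return raw.get(interpretation);
    }

    /** Parses on first request and caches the result. */
    Object getValue(String interpretation) {
        if (cache.containsKey(interpretation)) {
            return cache.get(interpretation);
        }
        String text = raw.get(interpretation);
        if (text == null) {
            throw new IllegalArgumentException(
                "No data for interpretation: " + interpretation);
        }
        Function<String, Object> parser = parsers.get(interpretation);
        if (parser == null) {
            throw new UnsupportedOperationException(
                "Interpretation cannot be parsed: " + interpretation);
        }
        Object value = parser.apply(text);
        cache.put(interpretation, value);
        return value;
    }
}
```

With this shape, an exact-match evaluation would compare the string form of a predicted output value against `getRaw(...)` for the output interpretation, and `getValue(...)` would only ever be called on the input interpretations.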