google-research / lm-extraction-benchmark

Apache License 2.0
271 stars 19 forks source link

Criteria for evaluation #13

Closed ugorsahin closed 1 year ago

ugorsahin commented 1 year ago

How do you plan to evaluate a given prompt? Is it only considered binary True/False depending on correctness of all output, will there be a metric which calculates closeness to the actual suffix? There may be some scenarios in which only one or two tokens are mismatch and they do not create substantial meaning shift, how do you plan to evaluate that?

carlini commented 1 year ago

For now, we measure by scoring "1" if the exact prompt matches identically and "0" otherwise. We understand this is not ideal (and have actually written a recent paper making this case) but any threshold will ultimately be arbitrary and so setting to verbatim memorization is at least the most consistent with the literature.