clamsproject / aapb-evaluations

Collection of evaluation codebases
Apache License 2.0

Update words/terminology for "preds" and "golds" #40

Closed jarumihooi closed 4 months ago

jarumihooi commented 11 months ago

Because

As a subtask of #37, terms that refer to the same thing should be named consistently across the CLAMS projects. The canonical terms will be defined in the eval-repo README.md. For future code, it is highly recommended to use only the terms "golds" and "preds".

Consistent terminology should make the code more readable. On the other hand, some code may legitimately call for multiple variants of a term, and judgement should be used in those cases. These terms are not as obvious as they may seem: in a meeting, someone familiar with the project still had to ask what "preds" were. If a different term is used because it fits the context better, or because it does not name exactly the same thing, that is an appropriate use of a different term.

Fixing the occurrences below is not currently slated as a priority, and readability is not a pressing concern until new contributors need to interact with this code. The different uses are documented below simply to show the diversity of term choice.

==== Within the aapb-eval repo, these terms are used interchangeably for seemingly the same things:

- golds: refs
- preds/predictions: results, hyps 1, 2, 3, test, test output, sys-file, test files
- results: from now on, this should mean only the output of an evaluation run.

(Note: due to low priority, only the first four projects were reviewed: asr, fa, nel, ner.)
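For illustration, here is a minimal sketch of what consistently named evaluation code could look like; the function, directory layout, and metric are hypothetical, not taken from the repo:

```python
# Hypothetical sketch of the recommended naming: "golds" for ground truth,
# "preds" for system predictions, "results" for the evaluation output.
from pathlib import Path


def evaluate(golds_dir: Path, preds_dir: Path) -> dict:
    """Score each prediction file against the matching gold file."""
    results = {}
    for gold_file in sorted(golds_dir.glob("*")):
        pred_file = preds_dir / gold_file.name
        if not pred_file.exists():
            continue  # no matching prediction for this gold file
        golds = gold_file.read_text().splitlines()
        preds = pred_file.read_text().splitlines()
        # exact-match accuracy as a stand-in for a real metric
        correct = sum(g == p for g, p in zip(golds, preds))
        results[gold_file.name] = correct / max(len(golds), 1)
    return results
```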

==== "reference" and "hypothesis" are used in the code here for the evaluation of the time frames from which are drawn from golds data and preds data. This is a case likely where it may be unneccesary to consider changes.

Done when

Additional context

Changes made for this issue should be a pure refactor, with no impact on how the code runs.

keighrim commented 4 months ago

#59 fixes this issue by listing common synonyms of "preds" and "golds" in the README, instead of modifying the evaluate.py files.