clamsproject / aapb-annotations

Repository to store manual annotation dataset developed for CLAMS-AAPB collaboration

Reformatting of Gold Data and process.pys #56

Closed jarumihooi closed 6 months ago

jarumihooi commented 9 months ago

Because

There is now a new template and set of conventions for the whole dataset repository, so the existing gold data no longer matches the conventions requested to make the data understandable. Thus, all 5 currently existing projects may need to be redone by:

  1. Redoing each project's process.py.
  2. Regenerating the gold datasets.
  3. Informing downstream stakeholders as needed (e.g. MMIF? tool evaluators? for tool ingestion).

When redoing, these are the two major conventions to conform to:
A. Time format - times should be displayed/stored in ISO time format. This can be achieved in a few ways: the value can be saved directly as hh:mm:ss.mmm, or saved as two integers (seconds and milliseconds) that are converted back into the more readable hh:mm:ss.mmm when needed.
B. Column headers/fields - the gold data should use conventional names, such as start and end instead of start_time and end_time. The chyrons gold data is an example where this is needed. Other column fields should be checked for similar cases.
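
As a rough illustration of conventions A and B, here is a minimal Python sketch of what a revised process.py might do. The function names (`ms_to_timestamp`, `timestamp_to_ms`, `rename_fields`) and the `RENAMES` mapping are hypothetical and not taken from any existing process.py; only the start_time/end_time to start/end renaming comes from the description above.

```python
# Hypothetical helpers; none of these names exist in the repository.

def ms_to_timestamp(seconds: int, milliseconds: int = 0) -> str:
    """Convert integer seconds plus milliseconds into hh:mm:ss.mmm."""
    total_ms = seconds * 1000 + milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def timestamp_to_ms(timestamp: str) -> int:
    """Convert hh:mm:ss.mmm (fraction optional) back into total milliseconds."""
    hh, mm, rest = timestamp.split(":")
    ss, _, frac = rest.partition(".")
    # Pad or truncate the fractional part to exactly three digits.
    ms = int(frac.ljust(3, "0")[:3]) if frac else 0
    return (int(hh) * 3600 + int(mm) * 60 + int(ss)) * 1000 + ms


# Assumed mapping from old column names to the conventional ones.
RENAMES = {"start_time": "start", "end_time": "end"}


def rename_fields(row: dict) -> dict:
    """Return a copy of a gold-data row with conventional column names."""
    return {RENAMES.get(key, key): value for key, value in row.items()}


if __name__ == "__main__":
    row = {"start_time": ms_to_timestamp(3723, 4), "end_time": ms_to_timestamp(3725), "text": "example"}
    print(rename_fields(row))
```

Storing the formatted string keeps the gold files readable, while `timestamp_to_ms` lets downstream tools recover an integer when they need to compare or sort times.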

Done when

Additional context

No response