The newshour-chyron project's process.py file from https://github.com/clamsproject/aapb-annotations/commit/190987973246f8b043576b7449624720b24f7e85 generates the gold data as a single CSV file. Since that commit, however, we have changed how this repository is structured, so the script needs to be completely rewritten.
Specifically, as stated in the repository README file, we want one file per media item in the gold data. That is, process.py needs to read all the tabular files in the YYMMDD-batchname directories (currently there is only one, namely annotations/220701-batch2) and generate one file per GUID.
The data format of the future gold files is still open for discussion.
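A rough sketch of what the rewritten process.py could do, using only the Python standard library. It assumes, hypothetically, that every tabular file is a CSV with a header row that includes a guid column, and it writes each GUID's rows to golds/<GUID>.csv purely as a placeholder, since the gold format has not been decided yet. The names collect_rows, write_golds, BATCH_DIR_PATTERN and the guid column are illustrative and not part of the repository.

```python
from __future__ import annotations

import csv
import re
from collections import defaultdict
from pathlib import Path

# Assumption: batch directories are named YYMMDD-batchname, e.g. 220701-batch2.
BATCH_DIR_PATTERN = re.compile(r"^\d{6}-.+$")


def collect_rows(annotations_dir: Path, guid_column: str = "guid") -> dict[str, list[dict[str, str]]]:
    """Read every CSV in every batch directory and group the rows by GUID.

    `guid_column` is a hypothetical column name; adjust it to the real header.
    """
    rows_by_guid = defaultdict(list)
    for batch_dir in sorted(annotations_dir.iterdir()):
        # Skip anything that is not a YYMMDD-batchname directory.
        if not (batch_dir.is_dir() and BATCH_DIR_PATTERN.match(batch_dir.name)):
            continue
        for table in sorted(batch_dir.glob("*.csv")):
            with table.open(newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    rows_by_guid[row[guid_column]].append(row)
    return dict(rows_by_guid)


def write_golds(rows_by_guid: dict[str, list[dict[str, str]]], golds_dir: Path) -> list[Path]:
    """Write one placeholder CSV per GUID into golds_dir and return the paths."""
    golds_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for guid, rows in sorted(rows_by_guid.items()):
        # Union of all column names, in first-seen order, in case batches differ.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        out_path = golds_dir / f"{guid}.csv"
        with out_path.open("w", newline="", encoding="utf-8") as f:
            # restval fills columns that a given row does not have.
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(rows)
        written.append(out_path)
    return written
```

For example, write_golds(collect_rows(Path("annotations")), Path("golds")) would regenerate the golds directory. Keeping reading and writing in separate functions means that, once the gold format is agreed on, only write_golds should need to change.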
Done when
[ ] The format of the gold files is determined.
[ ] process.py is updated
[ ] New files are generated in the golds directory, replacing the current single gold file.