Closed by keighrim 3 months ago
Related to https://github.com/clamsproject/app-swt-detection/issues/41: I had a brief discussion with @marcverhagen, and we need to decide on the format of the gold files for SR annotations. Concretely, the first thing to decide is whether the gold is time (interval)-based, image-based, or both.
In case we want to keep both representations in the gold format: we've been using csv files with `start`, `end` columns in other SR-like past projects (slates, chyrons), and I can't think of an easy way to keep that csv format (so that other eval.py files can be reused) while also storing image-level annotations in it as additional columns. This repo is also designed to allow only one format for golds, so if we can't find a single format that holds both levels of representation and have to generate two formats, we might need to reconsider that decision as well.
Given the way we restructured the SWT app to keep image annotations (`TimePoint` annotations), I think we can keep only an image-based "gold" set for the SR project.
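If interval-style `start`, `end` rows are ever needed for the older eval scripts, they could in principle be derived from image-level labels rather than stored separately. A minimal sketch, assuming timepoints are integers (e.g. milliseconds) and that consecutive timepoints with the same label form one interval; the function name and the sample values are hypothetical, not part of this repo:

```python
from itertools import groupby


def timepoints_to_intervals(rows):
    """Collapse (timepoint, label) pairs into (start, end, label) runs.

    `rows` must already be sorted by timepoint. Consecutive rows that
    share a label are merged into a single interval.
    """
    intervals = []
    for label, run in groupby(rows, key=lambda r: r[1]):
        run = list(run)
        # First and last timepoint of the run bound the interval.
        intervals.append((run[0][0], run[-1][0], label))
    return intervals


# Hypothetical image-level labels for one GUID.
rows = [(0, "B"), (1000, "B"), (2000, "SH"), (3000, "SH"), (4000, "B")]
intervals = timepoints_to_intervals(rows)
```

Going the other way (intervals to per-image labels) would lose nothing either, but it depends on the sampling rate, which is one reason an image-based gold seems like the simpler thing to store.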
So the output format can be a csv for each cpb-.... ID:
# cpb-xxx-yyyyy
timepoint,label
t1,B
t2,SH
...
This would cover all the "seen" timepoints in the raw data.
On second look, since the "raw" portion of the annotation data is already organized by GUID, we probably don't need to introduce a new format for gold and can instead just copy the raw files into the gold dir.
Looking at the files a third time, it looks like we can actually benefit from altering the columns a bit. Specifically, given this "raw" format
filename,seen,type label,subtype label,modifier,transcript,note
cpb-aacip-0acac5e9db7_01824989_00000000.jpg,true,B,,false,,
...
cpb-aacip-0acac5e9db7_01824989_00082015.jpg,true,S,H,false,,
...
using the new gold column naming convention.
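For illustration, here is a rough sketch of how the raw rows could be reduced to the `timepoint,label` layout proposed earlier in this thread. It rests on several assumptions that are not confirmed here: the raw files are comma-delimited, the last underscore-separated field of the filename is the timepoint, and the label is the type label concatenated with the subtype label (as the `SH` example suggests). The function names are hypothetical, and the `timepoint,label` header may differ from the final gold column names.

```python
import csv
import io


def raw_to_gold(raw_file):
    """Read raw annotation rows and return sorted (timepoint, label) pairs.

    Only rows marked as "seen" are kept.
    """
    rows = []
    for rec in csv.DictReader(raw_file):
        if rec["seen"].strip().lower() != "true":
            continue
        # Assumption: filename is <guid>_<something>_<timepoint>.jpg
        stem = rec["filename"].rsplit(".", 1)[0]
        timepoint = int(stem.rsplit("_", 1)[1])
        # Assumption: gold label = type label + subtype label, e.g. S + H.
        label = rec["type label"].strip() + rec["subtype label"].strip()
        rows.append((timepoint, label))
    return sorted(rows)


def write_gold(rows, out_file):
    """Write (timepoint, label) pairs as a csv with a header row."""
    writer = csv.writer(out_file)
    writer.writerow(["timepoint", "label"])
    writer.writerows(rows)


# The two raw rows quoted in this thread, as an in-memory file.
raw = io.StringIO(
    "filename,seen,type label,subtype label,modifier,transcript,note\n"
    "cpb-aacip-0acac5e9db7_01824989_00000000.jpg,true,B,,false,,\n"
    "cpb-aacip-0acac5e9db7_01824989_00082015.jpg,true,S,H,false,,\n"
)
gold_rows = raw_to_gold(raw)

out = io.StringIO()
write_gold(gold_rows, out)
```

Whether the modifier, transcript, and note columns should also be carried into gold is left open in this sketch.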