keighrim opened this issue 4 months ago
I'm in favor of this idea. Do you think it would make sense to also start enforcing/validating the directionality of alignments? Basically so that a subtype annotation can't be the source of a parent type annotation. E.g. an `Alignment` may never have a `Translation` document as the source and a `TextDocument` as the target. Only the other way around.
Another useful type may be `Summary`.
@marcverhagen Under the proposal, text-to-text summaries are `Translation`s and video-to-text or image-to-text summaries are `Caption`s. The idea is that these subtypes are not purely based on the "semantics" of the type, but also have formal proxies (in terms of associated `Alignment`s), so that downstream consumers are less coupled with semantic knowledge and can be more reactive to the syntax of the MMIF, which should be much more robust.
@wricketts Yeah, I also like the idea of directed alignment. It's pretty clear for the `Translation` case. But for the other two cases it becomes a bit murky: most of the time the source is going to be the image or audio, but there are rare cases where the text is the source (e.g., a "read" corpus where the script is actually the source of the audio, or text-to-image generation models). However, since we only allow `TextDocument`s to be annotation instances (as second-order objects compared to the objects in the top-level `mmif.documents` list), I don't think we need to consider those rare cases for now.
In terms of implementing the directionality, we could add an attribute to the alignment type, or we could super-/subclass an "undirected alignment".
I could see a subclass `DirectedAlignment` that inherits from the (less constrained) `Alignment`, giving nice flexibility for validating `source` and `target` annotation at_types. But probably you or someone who's done extensive work on the SDK would have the best judgment on this.
I have been thinking of an alternative for the word `Translation` because of its connotations. If I had read the original proposal above better I would have noticed that summary was intended to be a kind of `Translation`, but it is somewhat unintuitive to me that a summary is a kind of translation. I was thinking `Transformation`, but I don't like that either.
An alternative would be to not introduce new annotation types but a `textType` property whose value can be "transcript", "summary", or "caption", similar to how we have bounding boxes of type "text". There is something to be said for having specialized types with which we can express relationships that pop up in multimodal processing; there is also something to be said for keeping the number of types low.
Also, instead of alignments we can think of some of the relations between text documents and other types as proper relations, and think (again) about how to structure the vocabulary part under `Relation`.
In a related issue we discussed introducing types like CSV and CoNLL in the hope that they would help with quickly rolling out a wrapped RFB app. We were thinking of those as MIME types though (incidentally, the vocabulary would have to change a bit, since it only allows MIME types if you use the "location" property instead of the "text" property).
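For illustration, simplified document fragments showing that contrast; these are hypothetical examples and the type URI version suffix may differ in the actual vocabulary.

```python
# Simplified, hypothetical document fragments to illustrate the point above;
# the type URI version suffix may differ in the actual vocabulary.
csv_by_location = {
    "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
    "properties": {"id": "d1", "mime": "text/csv", "location": "file:///data/table.csv"},
}
# With inline "text" there is currently no slot to record a MIME type:
csv_inline = {
    "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
    "properties": {"id": "d2", "text": {"@value": "a,b,c\n1,2,3"}},
}
```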
Yeah, I also thought about having it as a property. One thing we can (possibly greatly) benefit from with a separate (sub-)type is that we can syntactically distinguish a `TextDocument` in the source `documents` list from a `TextDocument` in views' `annotations` lists. But we can take the best of both worlds by adding just one type (`DerivedTextDocument` or something like that) as a new vocab member and putting the subtype information in a property value.
However, another point to consider is that vocab types are actually defined behind URIs, while for property values we don't have any effective way to regulate them (beyond developers' diligence), so we are likely to end up dealing with lots of quirky naming and typos (we have already experienced enough problems from minor quirks like `timeUnit=millisecond` vs. `timeUnit=milliseconds`).
Notes from yesterday's in-person meeting:

- @marcverhagen wants to keep the number of vocab items low, while seeking a way to "control" the property values. He will work on drafting a new vocabulary yaml file format that provides an `enumeration` sub-field for each property, giving a finite set of pre-defined possible values. Without an `enumeration` subfield, a property should be considered able to bear any "free" value.
- This new vocab file will be used in the SDK to automatically generate some mechanism for validation of property values.
- @keighrim suggested revisiting https://github.com/clamsproject/mmif/issues/23 and https://github.com/clamsproject/mmif-python/issues/23.
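A rough sketch of how such an `enumeration` sub-field could drive property-value validation; the yaml layout below is only a guess at the format being drafted, not the actual one.

```python
# Sketch of how an `enumeration` sub-field in a hypothetical vocab yaml could
# drive property-value validation in the SDK; the yaml layout below is a guess
# at the format being drafted, not the actual one.
import yaml  # pyyaml

VOCAB_YAML = """
TimeFrame:
  properties:
    timeUnit:
      type: string
      enumeration: [milliseconds, seconds, frames]
    label:
      type: string          # no enumeration -> any "free" value is accepted
"""

def validate_property(vocab: dict, at_type: str, key: str, value):
    """Return True if `value` is allowed for `key` on `at_type` under the vocab."""
    prop_spec = vocab.get(at_type, {}).get("properties", {}).get(key, {})
    allowed = prop_spec.get("enumeration")
    return allowed is None or value in allowed

vocab = yaml.safe_load(VOCAB_YAML)
print(validate_property(vocab, "TimeFrame", "timeUnit", "milliseconds"))  # True
print(validate_property(vocab, "TimeFrame", "timeUnit", "millisecond"))   # False
print(validate_property(vocab, "TimeFrame", "label", "anything"))         # True
```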
Some additional notes from my thoughts:

- The property key `textType` doesn't really sit well with me; in fact, any `xxxType` keys [^1] just seem very wrong to me, given that all vocab items already have a "type" (`@type`) property and we call them vocabulary "types". To that end, I believe any value associated with the "typing" of a type should just be an actual "type", instead of an `xxxType` property within a "type". My alternative suggestion for `textType` is `origin` or `origination`.
- Values in the `enumeration` list must match the data type of the property, so I think we might need to revisit how we define the data types of each property and add more type-theoretic formalism to our practice. This will somewhat address https://github.com/clamsproject/mmif/issues/215.

[^1]: We had `frameType` and `boxType` in the past, but no longer (https://github.com/clamsproject/mmif/issues/218). So there's actually no `xxxType` property at the moment, to be clear.
> The property key `textType` doesn't really sit well with me; in fact, any `xxxType` keys just seem very wrong to me, given that all vocab items already have a "type" (`@type`) property and we call them vocabulary "types". To that end, I believe any value associated with the "typing" of a type should just be an actual "type", instead of an `xxxType` property within a "type". My alternative suggestion for `textType` is `origin` or `origination`.
Yes, it is weird to have `@type` and `textType`. Origin is not a bad name, as long as we document exactly what we mean by it (everything has an origin, so when do we use it?). My main worry with introducing additional annotation types is potential bloating of the vocabulary; if the bloat is very minimal then it should be okay. The notion of subtype or label does not bother me as an additional way of bringing in some kind of typing.

Also, an additional concern is the status of the LAPPS vocabulary and the potential need to expand the CLAMS vocabulary (see https://github.com/clamsproject/mmif/issues/202).
> We had `frameType` and `boxType` in the past, but no longer (https://github.com/clamsproject/mmif/issues/218). So there's actually no `xxxType` property at the moment, to be clear.
They are still in the vocabulary at https://mmif.clams.ai/1.0.5/vocabulary/, so that's technically not quite correct. Current code in the timeframe evaluation makes heavy use of `frameType`. We did introduce `label` and `classification` to streamline this.
You are right that `frameType` and `boxType` are technically still on the vocab page. But in practice they are there only for historical reasons, and the latest versions clearly indicate that using those keys is no longer recommended. The SDK is aware of this "deprecation" and lets you query by either name: https://github.com/clamsproject/mmif-python/pull/262
One more thing I wanted to add to that PR 262 was automatic "canonicalization" of the key name `xxxType` into `label` at `.add_property()` time, but that seemed too "magic", so I stopped there. I'm still more than eager to do it, though, so that I don't see any `xxxType` in future MMIFs.
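For what it's worth, the canonicalization itself could be as small as the sketch below; the legacy key set and the idea of hooking it into `.add_property()` are assumptions, not current SDK behavior.

```python
# Minimal sketch of the "canonicalization" idea: rewrite legacy xxxType keys to
# `label` before a property is stored. The set of legacy keys and the hook point
# at `.add_property()` are assumptions, not current SDK behavior.

LEGACY_TYPE_KEYS = {"frameType", "boxType"}  # assumed legacy keys

def canonicalize_property(key: str, value):
    """Map a legacy xxxType key to the `label` key, leaving other keys untouched."""
    if key in LEGACY_TYPE_KEYS:
        return "label", value
    return key, value

print(canonicalize_property("frameType", "slate"))        # ('label', 'slate')
print(canonicalize_property("timeUnit", "milliseconds"))  # ('timeUnit', 'milliseconds')
```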
Just to be clear, these `xxxType` properties were never used for manual "typing" of an annotation object, to my knowledge and according to the current archive of app metadata:
```
$ grep Type clamsproject/apps/docs/_apps/**/**/metadata.json
aapb-pua-kaldi-wrapper/v1/metadata.json: "description": "When true, the app looks for existing TimeFrame { \"frameType\": \"speech\" } annotations, and runs ASR only on those frames, instead of entire audio files.",
aapb-pua-kaldi-wrapper/v2/metadata.json: "description": "When true, the app looks for existing TimeFrame { \"frameType\": \"speech\" } annotations, and runs ASR only on those frames, instead of entire audio files.",
barsdetection/v1.0/metadata.json: "frameType": "bars"
barsdetection/v1.1/metadata.json: "frameType": "bars"
chyron-detection/v1.0/metadata.json: "frameType": "chyron"
east-textdetection/v1.0/metadata.json: "name": "frameType",
east-textdetection/v1.1/metadata.json: "name": "frameType",
east-textdetection/v1.2/metadata.json: "name": "frameType",
fewshotclassifier/v1.0/metadata.json: "frameType": "string"
fewshotclassifier/v1.0/metadata.json: "name": "finetunedFrameType",
gentle-forced-aligner-wrapper/v1.0/metadata.json: "frameType": "speech"
gentle-forced-aligner-wrapper/v1.0/metadata.json: "frameType": "speech",
parseqocr-wrapper/v1.0/metadata.json: "boxType": "text"
pyscenedetect-wrapper/v1/metadata.json: "frameType": "shot",
pyscenedetect-wrapper/v2/metadata.json: "frameType": "shot",
slatedetection/v1.0/metadata.json: "frameType": "string"
slatedetection/v1.1/metadata.json: "frameType": "string"
slatedetection/v1.2/metadata.json: "frameType": "string"
slatedetection/v2.0/metadata.json: "frameType": "slate"
slatedetection/v2.1/metadata.json: "frameType": "slate"
swt-detection/v3.0/metadata.json: "frameType": "bars"
swt-detection/v3.0/metadata.json: "frameType": "slate"
swt-detection/v3.0/metadata.json: "frameType": "chyron"
swt-detection/v3.0/metadata.json: "frameType": "credits"
tesseractocr-wrapper/v1.0/metadata.json: "boxType": "text"
tesseractocr-wrapper/v1.0/metadata.json: "name": "frameType",
tonedetection/v1.0/metadata.json: "frameType": "tone"
```
Instead, they have always been used to capture algorithmic classification results (they are all now replaced with the `label` key). So I'm tempted to say that we define "typing" as a manual task of building a hierarchical categorization of the annotations, while "labeling" is an algorithmic classification task. (To that end, should we allow "multi-labels" in the labeling task?) That all being said, the property proposed in this thread is closer to "typing" in that the value is manually set by a human, and I'm still open to adding them as separate vocab items; I don't think that is bloat, but a necessary "upgrade".
More thoughts on the implementation of "full validation" of annotation objects based on the vocab yaml (or an equivalent piece of information from the spec).
New Feature Summary
With a number of recent developments, I'd like to propose more vocab types that are subcategories of `TextDocument` (all names in the proposal are tentative):

- `Transcript`: a subtype of text document, always aligned to annotations in non-linguistic modalities (audio, vision), representing a linguistic, "literal" transcript of the source modality (e.g. ASR, TR/OCR).
- `Translation`/`Transformation`/`Extraction`: a subtype of text document, always aligned to another `TextDocument`-type annotation. The content of this annotation must be a kind of "re-writing" of the source text document (e.g. the identity function in the text-slicer, structural parsing in RFB, summaries in text-summarizer apps).
- `Caption`: similar to `Transcript`, but the content is not a "literal" transcript of the source modality (e.g. image-based captioning apps, or an audio-based summarizer app that handles non-linguistic sounds like dog barking).

Related
The addition of subtypes of text document will ease the identification of "app patterns" without relying on a specific app name, and hence help generalize I/O specs for any downstream/consumer applications.
The issue of view pattern identification has been raised many times, including
Alternatives
No response
Additional context
Also see https://github.com/clamsproject/app-role-filler-binder-new/issues/4 for discussion on development of a prototype "app pattern".
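To illustrate the kind of "app pattern" detection this proposal would enable, here is a sketch of a downstream consumer finding the "caption pattern" purely from MMIF syntax; the type URIs and the simplified view dict are assumptions for illustration, not the actual spec.

```python
# Sketch of detecting the proposed "caption pattern" purely from MMIF syntax;
# the type URIs and the simplified view structure are assumed for illustration.

CAPTION = "http://mmif.clams.ai/vocabulary/Caption/v1"      # hypothetical type
ALIGNMENT = "http://mmif.clams.ai/vocabulary/Alignment/v1"  # version suffix assumed

def find_caption_pairs(view: dict):
    """Yield (source, target) id pairs of Alignments whose target is a Caption."""
    captions = {a["properties"]["id"]
                for a in view["annotations"] if a["@type"] == CAPTION}
    for a in view["annotations"]:
        if a["@type"] == ALIGNMENT and a["properties"]["target"] in captions:
            yield a["properties"]["source"], a["properties"]["target"]

# "tf1" stands for e.g. a TimeFrame annotation produced in another view
view = {"annotations": [
    {"@type": CAPTION, "properties": {"id": "td1", "text": {"@value": "a dog barking"}}},
    {"@type": ALIGNMENT, "properties": {"id": "al1", "source": "tf1", "target": "td1"}},
]}
print(list(find_caption_pairs(view)))  # [('tf1', 'td1')]
```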