Closed keighrim closed 7 months ago
I don't think I appreciate the technical difficulties so I won't speak to those.
As an app developer, it would be very helpful to be able to add something like
metadata.add_output(SOME_TYPE, description='some description', ...)
to metadata.py
and that the user of the app can ping the metadata and see something like
{
"@type": "http://mmif.clams.ai/vocabulary/SOME_TYPE/v3",
"properties": { },
"description": "some description",
},
This would not need to be in the MMIF so it should not impact the JSON schema.
How to implement this is another matter.
What I originally wrote is about the difficulty of implementation of allowing
metadata.add_output(SOME_TYPE, description='some description', ...)
and then treating description
to be a "special" argument that renders as you depicted since a user of a vocab type (an app developer) can use description
as a property if we don't reserve it.
Also, I actually do think this needs to be in the output MMIF. (but not impacting MMIF schema)
However if the information is recorded only in the app metadata, we will eventually have a problem similar to #208. So we should consider duplicating the information in the output MMIF as well so that the output file is self-contained to a certain extent.
The implementation in #226 (which is simply adding description
metadata to the very top Thing
type) will allow
metadata.add_output(SOME_TYPE, description='some description', prop1='value1')
that renders into
{
"@type": "http://mmif.clams.ai/vocabulary/SOME_TYPE/v3",
"properties": {
"description": "some description",
"prop1": "value1"
}
},
without worries about field name conflicts - we basically name-squat on description
name.
# metadata.py
SOME_TYPE_OUTPUT_DESCRIPTION = 'some description'
...
metadata.add_output(SOME_TYPE, description=SOME_TYPE_OUTPUT_DESCRIPTION, prop1='value1')
...
# app.py
def _annotate(self, in_mmif, *params):
my_view = in_mmif.new_view()
self.sign_view(my_view, params)
my_view.new_contain(SOME_TYPE, description=metadata.SOME_TYPE_OUTPUT_DESCRIPTION)
...
(Of course, it would be even better if we can somewhat automate new_contain
part. But, that will be a separate issue for clams-python
repo.)
to get
# app metadata
{
...
"output": [ {
"@type": "http://mmif.clams.ai/vocabulary/SOME_TYPE/v3",
"properties": {
"description": "some description",
"prop1": "value1"
}
} ]
...
}
and
# output mmif
{
...
"views": [ {
"id": "some_id",
"metadata": {
...
"contains": [ {
"http://mmif.clams.ai/vocabulary/SOME_TYPE/v3": {
"description": "some description",
"prop1": "value1"
}
} ]
}
} ]
...
}
This is more complicated than I thought. I think the core issue is what the status is of a comment added by the app developer. Subconsciously and in retrospect, what I assume is the following:
Because of 1 and 2 above, I do not see why the comment/description has to be in the MMIF output of a POST request. The description is an app-level piece of information, just like the license. It does apply to just one parameter, but no matter what particular value for the parameter the user enters, the description is always the same and can be accessed via the GET request. So for me, issue #208 does not apply. Not surprisingly, I also do not see the need to include the description in the new_contain method.
And because of 3 and 4, I lean towards not wanting to adjust the vocabulary, which specifies how we interpret the types.
However, assumption 4 still leaves open the possibility that sometimes we just may want to do that. It might be nice to allow app-specific extensions of the vocabulary that add (or even overwrite) information from the standard CLAMS vocabulary. In our LAPPS past, we always felt that if someone has a different interpretation of the type then they should come up with a new one. As far as I know CLAMS does not have an explicit opinion on this.
Note that a downstream app cannot meaningfully refer to this interpretation because is people-readable only, it is also unlikely that we add any interpretation to have it in https://mmif.clams.ai/vocabulary/. Maybe it is better to view this not as changes to the vocabulary, but as something else entirely. There is something like a initial proposal (not well thought out yet) below.
Other notes:
I don't see a problem with using description in the method that adds an output to the metadata and as a json property in the appropriate output element that results from the GET request to the app. They ultimately refer to the same piece of information so having the same name sounds appropriate. Also, if using "description" in the add_output method and the JSON metadata output is name-squatting, then the same is true for any other property there.
and then treating
description
to be a "special" argument that renders as you depicted since a user of a vocab type (an app developer) can usedescription
as a property if we don't reserve it.
In my little world the description is not a property like the others but an added note just intended for users so I kept it out of the properties list (and out of the vocabulary):
{
"@type": "http://mmif.clams.ai/vocabulary/SOME_TYPE/v3",
"properties": { },
"description": "some description",
}
But there sure is a problem with how to use add_output for this and indeed the need for a reserved property unless we make changes to add_output.
I see two options. The first is to change the signature of add_output:
metadata.add_output(
AnnotationTypes.TimeFrame,
description="SOME_TEXT",
properties=Properties(timeUnit='milliseconds')
The other is to add an extra method:
metadata.add_comment("SOME_TEXT")
One last remark on the change to the vocabulary
metadata:
description:
type: String
description: >-
A human-readable description of the object. This is intended to be used
for documentation purposes for particular use cases of this annotation
type and is not expected to be used by software.
Even if we add this, maybe it should not be metadata because in a sense it is not at all different from the regular description. Also, probably "for particular use cases" should be "for a particular use case".
- It IS part of the metadata for an app
True.
- It is NOT a parameter handed in for a particular invocation of the app (the comment is always the same)
True.
- It is NOT necessarily associated with one input or output (although all use-cases so far do limit to one input/output)
- It is NOT necessarily part of the metadata of an annotation type (corollary of the previous one)
No, this issue is specifically about adding comments (or "extensions") to individual types. For cross-type comments, the top-level description
is available. I thought I made this clear in the issue body, but please let me re-iterate myself.
... place to put documentation on how different annotation types are used and interpreted (or should be interpreted) is the
description
field in the app metadata top-level ...
the description is always the same and can be accessed via the GET request.
So are default values of runtime parameters. Yet our primary user (@owencking) asked for re-recording that duplicate information for readability of the output MMIFs. I'm considering this not as some critical data passing machinery, but as a convenient UX feature for readability.
Note that a downstream app cannot meaningfully refer to this interpretation because is people-readable only, it is also unlikely that we add any interpretation to have it in https://mmif.clams.ai/vocabulary/.
Yes, that is my intention and it's explicitly said in the #226.
In my little world the description is not a property like the others but an added note just intended for users so I kept it out of the properties list (and out of the vocabulary):
Totally agree that the description
(or any sort of "extension" statements) is not a regular property. Admittedly, my proposal of using description
as a type property was a hack stretch to solve two problems - type-level commenting + re-recording in the output MMIF without changing MMIF jsonschema - with one stone. Well, if the latter is not really a problem, changing AppMetadata jsonscheme is much easier than changing MMIF schema. But considering the "possibility" of type extension (In my opinion, I don't think that is a possibility but a real problem happening as of now with the new docTR app pending release, though), keeping this out of property list will eventually force us to change the MMIF schema (and at least minor-level bump). Which is still fine - we can't live with MMIF 1.0.x forever anyway.
Regarding actual implementation, changing the signature of add_input
and add_output
will break the API, so all old apps will need to be partly re-written when using a new SDK version. which is not a desired situation. That said, adding extra method looks fine. As long as developers are actually reading the API documentation (via website or via a proper IDE) that should work.
Notes from an in-person meeting today;
parameters
v. appConfiguration
" issue (#208) in that the final information in the appConfiguration
field is not merely repetition of some part of app metadata but it's the results of the refinement. properties
map. clams-python
repo side. Closing issue as "won't fix".
New Feature Summary
As an app developer, I sometimes want to add some additional "comments" for input or output annotation types for human users of the app. For example, the new docTR OCR app is re-purposing old
Token
,Sentence
, andParagraph
annotation types to represent work boxes, line boxes, block boxes respectively from OCR results, but at the moment, the only place to put documentation on how different annotation types are used and interpreted (or should be interpreted) is thedescription
field in the app metadata top-level (which is quite lacking now @snewman-aa ). To me, this seems not ideal, because the top-leveldescription
field is just an unstructured text field, so expressiveness is somewhat limited.Proposed here is a type-specific
description
field that developers can use for the documentation purposes as discussed above. This should basically be used only for human reader, thus machine/software must NOT rely on the information from this field, and be completely optional.Related
No response
Alternatives
I started thinking about this as new feature to
clams-python
SDK's app metadata package, specifically underadd_input
andadd_output
(mostly for output side) methods.However, there were some problems.
description
field toAppMetadata.Output
(this and this objects), butdescription
itself is a reserved keyword in jsonscheme syntax, so adding this new key can add some confusion in reading the file. I talked about this with @@marcverhagen in person, and we explored other options ($comment
, and others) but all were equally unsatisfying. To be fair, this is actually very trivial problem, and we are usingdescription
field in the top-level of the app metadata scheme already anyway. Howevermore importantly, if an app developer wants to use
description
key in an annotation type, havingdescription
for the metadata only will breakadd_input
andadd_output
methods. For example,then if
description
is used in the**properties
, it will conflict with the metadatadescription
field. To handle this conflicts, I thought about usingdescription_
(added underscore) to prevent them, but this is not a good solution since it is not consistent with the rest of the codebase. Sodescription
on the top levelThing
type as a proper (but optional) property can mitigate this conflict by simply "reserving" the name. And thenmetadata.add_output(SOME_TYPE, description='some description' , moreprop='more_value')
inmetadata.py
WILL NOT automatically add magical lines toapp.py
to sign the views with that description, so it should be done manually. So there's still a big chance that the same information is missing either of the places even after this proposal is implemented.Additional context
No response