clamsproject / clams-python

CLAMS SDK for python
http://sdk.clams.ai/
Apache License 2.0
4 stars 1 forks source link

promoting `_appmetadata()` method to an independant file #117

Closed keighrim closed 1 year ago

keighrim commented 1 year ago

As we march toward the development of prototype of the app-directory (https://github.com/clamsproject/clams-apps/issues/15), it seems that we need a easier way to query for app metadata. In the linked issue, there were two suggestions on how a machine can read app metadata.

And @kelleyl and I talked about this last week, and we'd like to suggest a third way. But first, possible issues with the above suggestions;

  1. Metadata as a static file: this will provide probably the easiest way for a machine to read the metadata. The problem is that either app developers have to edit raw JSON/YAML files to update the metadata (as opposed to using python code in the SDK that already provides abstraction of JSON syntax and automatic validation) or use the python code to specify metadata and then manually export it to a JSON/YAML file (which can be automated, but not at the python SDK level). I dislike the latter case more, as one more manual step means more room for human error.
  2. Build-and-run a container and send GET request: obvious issue with this is the time and energy wasted to build the whole thing, which can include downloading GB of models or compiling underlying software like kaldi, just to get the (mostly) static content.

So the third way;

App development side

Developers can choose between metadata.json or metadata.py to write the app metadata.

  1. metadata.json (or yaml): this is identical to the first suggestion above. If a developer is confident in directly editing raw JSON file and makes sure the file is valid JSON as well as CLAMS app metadata, they should be able to choose this path and not worry about writing more python code as described in the below.
  2. metadata.py: we provide a skeleton for this file via the SDK documentation or the template code repository. This file MUST import only python builtins and clams-python library to minimize time and effort to build and run this file.

SDK development side

And then in the SDK, we remove _appmetadata() method and re-write annotate() so that

  1. if metadata.json is present, just load the metadata from the file.
  2. if metadata.py is present, import the module and call a function inside to load metadata.

App template

And, the metadata.py in the starter kit should look like;

from clams import AppMetadata
from mmif import DocumentTypes, AnnotationTypes
# developers MUST not change the above

def get_metadata(): # the name of this function must be consistent throughout all the apps compatible with the "app-directory"
    metadata = AppMetadata(
        # fill these
        name="",
        description="", 
        url="http://repository/of/source_code",
        app_license="",
        analyzer_version="",
        analyzer_license="",
        identifier="",
        )
    metadata.add_input(DocumentTypes.xxxDocument)
    metadata.add_output(AnnotationTypes.xxxAnnotation)
    return metadata

# developers MUST not change the below
if __name__ == '__main__':
    return get_metadata().json(exclude_defaults=True, by_alias=True)

app-directory side story

And finally, but not the least, app directory automation can

  1. when metadata.json is present, just read the file in
  2. when metadata.py is present, run pip install $(grep clams-python requirements.txt); python metadata.py > metadata.json and read in the json file

A few question to resolve to make this happen:

  1. Is the separate "template" repository the best place to put the metadata.py template? If we do so, then when there's an update in the metadata scheme, we need to update two different codebase. I think that's probably not the best practice. We can maybe package the whole template repository inside the SDK and add a CLI command (like clams new-app) to release a local copy of the template on demand.
  2. If we make the SDK responsible for some of metadata (e.g. #114), we need to decide where in the code the "injection" should occur.
    1. If the injection is done inside appmetadata() method, the output of metadata.py (and the app-directory) won't have those to-be-injected fields
    2. If the injection is done inside the metadata.py, it is no longer a part of SDK. In this case, if there's metadata.json instead of the metadata.py, the app-directory, again, won't have those to-be-injected fields.
    3. We can add a CLI command (like clams app-metadata) to collect information from the working directory (metadata.json, metadata.py, git log, $APP_VERSION, etc) and generate a complete metadata JSON.

Now I'm all ears. Let me know what you think.

keighrim commented 1 year ago

Regarding the question number 2 from the previous comment, implementation of app_version injection that's proposed in https://github.com/clamsproject/clams-python/pull/118 is done basically by setting a default value (default but dynamically generated) to the field before _appmetadata(), appmetadata(), or metadata.{jons,py}. So the question seems to have become irrelevant. However it may come back for a future development (addition, change) of a field in AppMetadata that needs automatic generation but cannot use a "default_factory"-like mechanism. (I can't imagine such an example for now though)

keighrim commented 1 year ago

Additionally, with some fields are being auto-generated (currently mmif_version, possibly app_version and identifier in the future depending on how #114 resolves), I actually don't think having static JSON/YAML files is a good option anymore. Conceptually, if there's metadata.py I would immediately think that there should be something dynamically generated, but when I see a JSON or YAML file, I'd take the file as static information and consider what in there is a complete set of necessary information. However, with auto-generating values, a static JSON/YAML file cannot be a complete, source-of-truth file, unless the maintainer takes extreme care of the content not to be outdated, which I would not expect from any human being.

keighrim commented 1 year ago

resolved via #128 .