elixir-cloud-aai / trs-filer

Lightweight, flexible Flask/Gunicorn-based GA4GH TRS implementation
Apache License 2.0

Posting versions should support associating files with descriptor types #58

Closed uniqueg closed 4 years ago

uniqueg commented 4 years ago

Description

Endpoints such as GET /tools/{id}/versions/{version_id}/{type}/descriptor require descriptor files to be associated with one of the descriptor types enumerated in the DescriptorType model:

    DescriptorType:
      type: string
      description: The type of descriptor that represents this version of the tool
        (e.g. CWL, WDL, NFL, or GALAXY). Note that these files can also include
        associated Docker/container files  and test parameters that further
        describe a version of a tool.
      enum:
        - CWL
        - WDL
        - NFL
        - GALAXY

Currently, the FilesRegister model does not require users to provide descriptor type information when supplying PRIMARY_DESCRIPTOR and SECONDARY_DESCRIPTOR files:

    FilesRegister:
      type: object
      description: Properties and (a pointer to the) contents of a file.
      additionalProperties: false
      properties:
        toolFile:
          $ref: '#/components/schemas/ToolFileRegister'
        fileWrapper:
          $ref: '#/components/schemas/FileWrapperRegister'

with the FileWrapperRegister schema being defined as:

    FileWrapperRegister:
      type: object
      description: >
        A file provides content for one of

        - A tool descriptor is a metadata document that describes one or more tools.

        - A tool document that describes how to test with one or more sample test
        JSON.

        - A containerfile is a document that describes how to build a particular
        container image. Examples include Dockerfiles for creating Docker images
        and Singularity recipes for Singularity images
      additionalProperties: false
      properties:
        content:
          type: string
          description: The content of the file itself. One of url or content is required.
        checksum:
          type: array
          items:
            $ref: "#/components/schemas/ChecksumRegister"
          description: "A production (immutable) tool version is required to have a
            hashcode. Not required otherwise, but might be useful to detect
            changes. "
          example:
            - checksum: ea2a5db69bd20a42976838790bc29294df3af02b
              type: sha1
        url:
          type: string
          description: Optional url to the underlying content, should include version
            information, and can include a git hash.  Note that this URL should
            resolve to the raw unwrapped content that would otherwise be
            available in content. One of url or content is required

and the ToolFileRegister schema being defined as:

    ToolFileRegister:
      type: object
      additionalProperties: false
      properties:
        path:
          type: string
          description: Relative path of the file.  A descriptor's path can be used with
            the GA4GH .../{type}/descriptor/{relative_path} endpoint.
        file_type:
          type: string
          enum:
            - TEST_FILE
            - PRIMARY_DESCRIPTOR
            - SECONDARY_DESCRIPTOR
            - CONTAINERFILE
            - OTHER

File information is now being stored in a files MongoDB collection as:

{
    "id": "tool_id",
    "versions": [
        {
            "id": "version_id",
            "files": [
                {
                    "fileWrapper": {},
                    "toolFile": {}
                }
            ]
        }
    ]
}

i.e., the contents of FilesRegister are simply appended to an array of files nested under a tool version that is itself nested under a tool (with the tool ID being an indexed key for that collection).

As such, information on which descriptor type (CWL, WDL, or another workflow language) a given descriptor file represents is not available, which blocks the implementation of several endpoints.
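To make the gap concrete, here is a minimal Python sketch, assuming a stored file entry shaped like the `files` document above. The sample path, the sample content, and the helper name `has_descriptor_type` are hypothetical and only serve as illustration. It shows that nothing in an entry identifies the descriptor language, so a request for a given `{type}` cannot be answered from the stored data alone.

```python
# Hypothetical stored entry, following the current `files` layout
# (only `toolFile` and `fileWrapper` are recorded).
stored_file = {
    "toolFile": {"path": "workflow.cwl", "file_type": "PRIMARY_DESCRIPTOR"},
    "fileWrapper": {"content": "cwlVersion: v1.0"},
}


def has_descriptor_type(entry: dict) -> bool:
    """Return True if the entry records which descriptor language it uses."""
    keys = (
        set(entry)
        | set(entry.get("toolFile", {}))
        | set(entry.get("fileWrapper", {}))
    )
    # Neither of these keys exists in the current model.
    return bool(keys & {"type", "descriptor_type"})


print(has_descriptor_type(stored_file))  # False
```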

Proposed solution

The FilesRegister model should be extended to include descriptor type information, e.g., like so:

    FilesRegister:
      type: object
      description: Properties and (a pointer to the) contents of a file.
      additionalProperties: false
      properties:
        tool_file:
          $ref: '#/components/schemas/ToolFileRegister'
        file_wrapper:
          $ref: '#/components/schemas/FileWrapperRegister'
        descriptor_type:
          description: Type of descriptor (e.g., CWL, WDL) the file is
            associated with. Required if 'tool_file.file_type' is either
            'PRIMARY_DESCRIPTOR' or 'SECONDARY_DESCRIPTOR'. 
          $ref: '#/components/schemas/DescriptorType'
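The conditional requirement stated in the description cannot be expressed by the schema snippet alone, so the implementation would have to check it. The following is a minimal sketch of such a check, assuming the request body has already been parsed into a dictionary with the `tool_file` and `descriptor_type` keys proposed above. The function name `check_descriptor_type` is hypothetical.

```python
# Values taken from the DescriptorType enum and the ToolFileRegister
# file_type enum quoted in this issue.
DESCRIPTOR_TYPES = {"CWL", "WDL", "NFL", "GALAXY"}
DESCRIPTOR_FILE_TYPES = {"PRIMARY_DESCRIPTOR", "SECONDARY_DESCRIPTOR"}


def check_descriptor_type(file_register: dict) -> None:
    """Raise ValueError if `descriptor_type` is missing or invalid.

    Hypothetical helper: `descriptor_type` is required only for
    primary and secondary descriptor files.
    """
    file_type = file_register.get("tool_file", {}).get("file_type")
    descriptor_type = file_register.get("descriptor_type")
    if file_type in DESCRIPTOR_FILE_TYPES and descriptor_type is None:
        raise ValueError(
            f"'descriptor_type' is required for file type {file_type}"
        )
    if descriptor_type is not None and descriptor_type not in DESCRIPTOR_TYPES:
        raise ValueError(f"unknown descriptor type: {descriptor_type}")


# A primary descriptor with a valid type passes silently.
check_descriptor_type(
    {"tool_file": {"file_type": "PRIMARY_DESCRIPTOR"}, "descriptor_type": "CWL"}
)
```

Whether a `descriptor_type` supplied for a non-descriptor file should be rejected or ignored is left open here; the sketch only validates its value.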

However, it may be valuable to record type information for non-descriptor files as well (e.g., for image types, see https://github.com/ga4gh/tool-registry-service-schemas/issues/155). A more generic approach might therefore be preferable, as it could accommodate future changes to TRS without requiring breaking changes in the TRS-Filer POST and PUT endpoints:

    FilesRegister:
      type: object
      description: Properties and (a pointer to the) contents of a file.
      additionalProperties: false
      properties:
        tool_file:
          $ref: '#/components/schemas/ToolFileRegister'
        file_wrapper:
          $ref: '#/components/schemas/FileWrapperRegister'
        type:
          description: Type of file. For descriptor files (`PRIMARY_DESCRIPTOR`
            and `SECONDARY_DESCRIPTOR`), the allowed file types are
            enumerated in the `DescriptorType` schema. For container recipe files
            (`CONTAINERFILE`), the allowed file types are enumerated in the
            `ImageType` schema. For these files, providing this property is required.
            For test files (`TEST_FILE`) and other files (`OTHER`), only `JSON` and
            `OTHER` are allowed as values for this property. For these files,
            the value can be omitted, in which case it is set automatically by
            the implementation.
          anyOf:
            - $ref: '#/components/schemas/DescriptorType'
            - $ref: '#/components/schemas/ImageType'
            - type: string
              enum:
                - JSON
                - OTHER
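The rules in the `type` description could be enforced roughly as follows. This is a sketch under stated assumptions, not the actual implementation: the `ImageType` values and their casing are assumed (only `DOCKER` appears in this issue), the defaults for omitted values (`JSON` for test files, `OTHER` for other files) are a guess based on the database example in this issue, and the name `resolve_type` is hypothetical.

```python
from typing import Optional

DESCRIPTOR_TYPES = {"CWL", "WDL", "NFL", "GALAXY"}  # DescriptorType enum
IMAGE_TYPES = {"DOCKER", "SINGULARITY", "CONDA"}  # assumption: ImageType values

# Allowed `type` values for each `file_type`, per the description above.
ALLOWED = {
    "PRIMARY_DESCRIPTOR": DESCRIPTOR_TYPES,
    "SECONDARY_DESCRIPTOR": DESCRIPTOR_TYPES,
    "CONTAINERFILE": IMAGE_TYPES,
    "TEST_FILE": {"JSON", "OTHER"},
    "OTHER": {"JSON", "OTHER"},
}

# Assumed defaults for file types where `type` may be omitted.
DEFAULTS = {"TEST_FILE": "JSON", "OTHER": "OTHER"}


def resolve_type(file_type: str, type_: Optional[str] = None) -> str:
    """Validate `type_` against `file_type`, or fill in a default."""
    if file_type not in ALLOWED:
        raise ValueError(f"unknown file type: {file_type}")
    if type_ is None:
        if file_type in DEFAULTS:
            return DEFAULTS[file_type]
        raise ValueError(f"'type' is required for file type {file_type}")
    if type_ not in ALLOWED[file_type]:
        raise ValueError(f"type {type_} is not allowed for {file_type}")
    return type_


print(resolve_type("PRIMARY_DESCRIPTOR", "CWL"))  # CWL
print(resolve_type("TEST_FILE"))  # JSON
```

Keeping the allowed values in a single mapping means that a future TRS file category would only require a new entry rather than a change to the request model.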

Then, to make this information available to the endpoints that require it, the files database schema should be adjusted accordingly, e.g., to:

{
    "id": "tool_id",
    "versions": [
        {
            "id": "version_id",
            "descriptors": [
                {
                    "fileWrapper": {
                        "content": "content",
                        "checksum": [
                            {
                                "checksum": "checksum",
                                "type": "sha-256"
                            }
                        ],
                        "url": "url"
                    },
                    "toolFile": {
                        "path": "path",
                        "file_type": "PRIMARY_DESCRIPTOR"
                    },
                    "type": "CWL"
                },
                {
                    "fileWrapper": {
                        "content": "content",
                        "checksum": [
                            {
                                "checksum": "checksum",
                                "type": "sha-256"
                            }
                        ],
                        "url": "url"
                    },
                    "toolFile": {
                        "path": "path",
                        "file_type": "SECONDARY_DESCRIPTOR"
                    },
                    "type": "CWL"
                }
            ],
            "containers": [
                {
                    "fileWrapper": {
                        "content": "content",
                        "checksum": [
                            {
                                "checksum": "checksum",
                                "type": "sha-256"
                            }
                        ],
                        "url": "url"
                    },
                    "toolFile": {
                        "path": "path",
                        "file_type": "CONTAINERFILE"
                    },
                    "type": "DOCKER"
                }
            ],
            "tests": [
                {
                    "fileWrapper": {
                        "content": "content",
                        "checksum": [
                            {
                                "checksum": "checksum",
                                "type": "sha-256"
                            }
                        ],
                        "url": "url"
                    },
                    "toolFile": {
                        "path": "path",
                        "file_type": "TEST_FILE"
                    },
                    "type": "JSON"
                }
            ],
            "others": [
                {
                    "fileWrapper": {
                        "content": "content",
                        "checksum": [
                            {
                                "checksum": "checksum",
                                "type": "sha-256"
                            }
                        ],
                        "url": "url"
                    },
                    "toolFile": {
                        "path": "path",
                        "file_type": "OTHER"
                    },
                    "type": "OTHER"
                }
            ]
        }
    ]
}
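A minimal sketch of how files might be sorted into this layout and then looked up by descriptor type is given below. It assumes that entries use the `toolFile`, `fileWrapper` and `type` keys from the example document and that `type` has already been validated. The function names `arrange_files` and `find_primary_descriptor` are hypothetical.

```python
from typing import Optional

# Maps each `file_type` to the array it is stored under.
BUCKETS = {
    "PRIMARY_DESCRIPTOR": "descriptors",
    "SECONDARY_DESCRIPTOR": "descriptors",
    "CONTAINERFILE": "containers",
    "TEST_FILE": "tests",
    "OTHER": "others",
}


def arrange_files(files: list) -> dict:
    """Group file entries into the per-category arrays of a version."""
    arranged = {"descriptors": [], "containers": [], "tests": [], "others": []}
    for entry in files:
        bucket = BUCKETS[entry["toolFile"]["file_type"]]
        arranged[bucket].append(entry)
    return arranged


def find_primary_descriptor(version: dict, descriptor_type: str) -> Optional[dict]:
    """Return the primary descriptor of the given type, or None."""
    for entry in version.get("descriptors", []):
        is_primary = entry["toolFile"]["file_type"] == "PRIMARY_DESCRIPTOR"
        if is_primary and entry["type"] == descriptor_type:
            return entry
    return None


files = [
    {"toolFile": {"path": "main.cwl", "file_type": "PRIMARY_DESCRIPTOR"},
     "fileWrapper": {"content": "content"}, "type": "CWL"},
    {"toolFile": {"path": "Dockerfile", "file_type": "CONTAINERFILE"},
     "fileWrapper": {"content": "content"}, "type": "DOCKER"},
]
version = {"id": "version_id", **arrange_files(files)}
print(find_primary_descriptor(version, "CWL")["toolFile"]["path"])  # main.cwl
```

With the descriptor type stored next to each file, a lookup such as the one needed for GET /tools/{id}/versions/{version_id}/{type}/descriptor reduces to a filter over the `descriptors` array.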

Note that to arrange the files when a tool version is created/updated, the values provided for FilesRegister.type need to be validated by the implementation, taking into account the values provided for ToolFileRegister.file_type according to the following rules:

Other considerations

To do

To summarize, the following need to be done:

Part 1

Part 2