Endpoints such as GET /tools/{id}/versions/{version_id}/{type}/descriptor require descriptor files to be associated with one of the descriptor types enumerated in the DescriptorType model:
    DescriptorType:
      type: string
      description: The type of descriptor that represents this version of the tool
        (e.g. CWL, WDL, NFL, or GALAXY). Note that these files can also include
        associated Docker/container files and test parameters that further
        describe a version of a tool.
      enum:
        - CWL
        - WDL
        - NFL
        - GALAXY
Currently, the FilesRegister model does not require users to provide descriptor type information when providing PRIMARY_DESCRIPTOR and SECONDARY_DESCRIPTOR files:
    FilesRegister:
      type: object
      description: Properties and (a pointer to the) contents of a file.
      additionalProperties: false
      properties:
        toolFile:
          $ref: '#/components/schemas/ToolFileRegister'
        fileWrapper:
          $ref: '#/components/schemas/FileWrapperRegister'
with the FileWrapperRegister schema being defined as:
    FileWrapperRegister:
      type: object
      description: >
        A file provides content for one of
        - A tool descriptor is a metadata document that describes one or more tools.
        - A tool document that describes how to test with one or more sample test
        JSON.
        - A containerfile is a document that describes how to build a particular
        container image. Examples include Dockerfiles for creating Docker images
        and Singularity recipes for Singularity images
      additionalProperties: false
      properties:
        content:
          type: string
          description: The content of the file itself. One of url or content is required.
        checksum:
          type: array
          items:
            $ref: "#/components/schemas/ChecksumRegister"
          description: "A production (immutable) tool version is required to have a
            hashcode. Not required otherwise, but might be useful to detect
            changes. "
          example:
            - checksum: ea2a5db69bd20a42976838790bc29294df3af02b
              type: sha1
        url:
          type: string
          description: Optional url to the underlying content, should include version
            information, and can include a git hash. Note that this URL should
            resolve to the raw unwrapped content that would otherwise be
            available in content. One of url or content is required
and the ToolFileRegister schema being defined as:
    ToolFileRegister:
      type: object
      additionalProperties: false
      properties:
        path:
          type: string
          description: Relative path of the file. A descriptor's path can be used with
            the GA4GH .../{type}/descriptor/{relative_path} endpoint.
        file_type:
          type: string
          enum:
            - TEST_FILE
            - PRIMARY_DESCRIPTOR
            - SECONDARY_DESCRIPTOR
            - CONTAINERFILE
            - OTHER
File information is currently stored in a files MongoDB collection: the contents of FilesRegister are simply appended to an array of files nested under a tool version, which is itself nested under a tool (the tool ID is an indexed key of that collection).
As a result, the type of descriptor that a given descriptor file represents (CWL, WDL, or another workflow language) is not recorded, which blocks the implementation of several endpoints.
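To illustrate the problem, here is a minimal sketch of a lookup against a stored document. It assumes a document shape that follows the nesting described above; the keys `id`, `versions` and `files`, the sample paths, and the function name `primary_descriptors` are hypothetical and not taken from the TRS-Filer code base.

```python
from typing import Any, Dict, List

# Hypothetical document from the `files` collection. Only the `toolFile` and
# `fileWrapper` objects follow the FilesRegister model; the surrounding keys
# (`id`, `versions`, `files`) are assumptions made for this sketch.
tool_doc: Dict[str, Any] = {
    "id": "tool-1",
    "versions": [
        {
            "id": "v1",
            "files": [
                {
                    "toolFile": {"path": "main.cwl", "file_type": "PRIMARY_DESCRIPTOR"},
                    "fileWrapper": {"content": "cwlVersion: v1.0"},
                },
                {
                    "toolFile": {"path": "main.wdl", "file_type": "PRIMARY_DESCRIPTOR"},
                    "fileWrapper": {"content": "version 1.0"},
                },
            ],
        }
    ],
}


def primary_descriptors(doc: Dict[str, Any], version_id: str) -> List[Dict[str, Any]]:
    """Return all primary descriptor files stored for a tool version."""
    for version in doc["versions"]:
        if version["id"] == version_id:
            return [
                f for f in version["files"]
                if f["toolFile"]["file_type"] == "PRIMARY_DESCRIPTOR"
            ]
    return []


# Both files are returned, and no stored field tells the CWL file from the WDL one.
candidates = primary_descriptors(tool_doc, "v1")
```

A handler for GET /tools/{id}/versions/{version_id}/{type}/descriptor would have to guess from the path or the content which candidate matches the requested type.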
Proposed solution
The FilesRegister model should be extended to include descriptor type information, e.g., like so:
    FilesRegister:
      type: object
      description: Properties and (a pointer to the) contents of a file.
      additionalProperties: false
      properties:
        tool_file:
          $ref: '#/components/schemas/ToolFileRegister'
        file_wrapper:
          $ref: '#/components/schemas/FileWrapperRegister'
        descriptor_type:
          description: Type of descriptor (e.g., CWL, WDL) the file is
            associated with. Required if 'tool_file.file_type' is either
            'PRIMARY_DESCRIPTOR' or 'SECONDARY_DESCRIPTOR'.
          $ref: '#/components/schemas/DescriptorType'
However, it may be valuable to record type information for non-descriptor files as well (e.g., for image types, see https://github.com/ga4gh/tool-registry-service-schemas/issues/155). A more generic approach might therefore be preferable, as it could accommodate future changes to TRS without requiring breaking changes in the TRS-Filer POST and PUT endpoints:
    FilesRegister:
      type: object
      description: Properties and (a pointer to the) contents of a file.
      additionalProperties: false
      properties:
        tool_file:
          $ref: '#/components/schemas/ToolFileRegister'
        file_wrapper:
          $ref: '#/components/schemas/FileWrapperRegister'
        type:
          description: Type of file. For descriptor files (`PRIMARY_DESCRIPTOR`
            and `SECONDARY_DESCRIPTOR`), the allowed values are
            enumerated in the `DescriptorType` schema. For container recipe files
            (`CONTAINERFILE`), the allowed values are enumerated in the
            `ImageType` schema. For these files, this property is required.
            For test files (`TEST_FILE`), only `JSON` is allowed; for other files
            (`OTHER`), only `OTHER` is allowed. For these files, the value can be
            omitted and will be set automatically by the implementation.
          anyOf:
            - $ref: '#/components/schemas/DescriptorType'
            - $ref: '#/components/schemas/ImageType'
            - type: string
              enum:
                - JSON
                - OTHER
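With a type value recorded for each file, an endpoint such as GET /tools/{id}/versions/{version_id}/{type}/descriptor could pick the matching file directly. The following is a minimal sketch only: it assumes the files of one tool version are available as a flat list of objects shaped like the generic FilesRegister model, and the function name `find_primary_descriptor` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional


def find_primary_descriptor(
    files: List[Dict[str, Any]], descriptor_type: str
) -> Optional[Dict[str, Any]]:
    """Return the primary descriptor of the requested type, or None if absent."""
    for file_obj in files:
        if (
            file_obj["tool_file"]["file_type"] == "PRIMARY_DESCRIPTOR"
            and file_obj.get("type") == descriptor_type
        ):
            return file_obj
    return None


# Sample files following the generic FilesRegister model (hypothetical data).
sample_files: List[Dict[str, Any]] = [
    {
        "tool_file": {"path": "main.cwl", "file_type": "PRIMARY_DESCRIPTOR"},
        "file_wrapper": {"content": "cwlVersion: v1.0"},
        "type": "CWL",
    },
    {
        "tool_file": {"path": "main.wdl", "file_type": "PRIMARY_DESCRIPTOR"},
        "file_wrapper": {"content": "version 1.0"},
        "type": "WDL",
    },
]

wdl_descriptor = find_primary_descriptor(sample_files, "WDL")
```

The same lookup would work for container recipe files by matching CONTAINERFILE against a value from the ImageType schema.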
Then, to make this information available to endpoints that require it, the files database schema should be adjusted accordingly.
Note that, to arrange the files correctly when a tool version is created or updated, the implementation needs to validate the values provided for FilesRegister.type against the values provided for ToolFileRegister.file_type, according to the following rules:
- ToolFileRegister.file_type should always be set. If not provided by the user, it should default to OTHER. This behavior should be documented in ToolFileRegister.description, validated, and tested.
- The allowed values of FilesRegister.type depend on the file's ToolFileRegister.file_type as follows:
  - PRIMARY_DESCRIPTOR: values enumerated in the DescriptorType schema; value for FilesRegister.type required
  - SECONDARY_DESCRIPTOR: values enumerated in the DescriptorType schema; value required
  - CONTAINERFILE: values enumerated in the ImageType schema; value required
  - TEST_FILE: only JSON, as enumerated under FilesRegister.type; value defaults to JSON if not provided
  - OTHER: only OTHER, as enumerated under FilesRegister.type; value defaults to OTHER if not provided
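The rules above could be implemented along the following lines. This is a sketch under stated assumptions, not the TRS-Filer implementation: the function name `resolve_types` and the `RULES` table are hypothetical, and the `IMAGE_TYPES` values are an assumption that should be checked against the ImageType schema of the TRS version in use.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Values of the DescriptorType enum quoted in this issue.
DESCRIPTOR_TYPES = frozenset({"CWL", "WDL", "NFL", "GALAXY"})
# ASSUMPTION: placeholder values; verify against the TRS ImageType schema.
IMAGE_TYPES = frozenset({"Docker", "Singularity", "Conda"})

# file_type -> (allowed `type` values, default value or None if required)
RULES: Dict[str, Tuple[FrozenSet[str], Optional[str]]] = {
    "PRIMARY_DESCRIPTOR": (DESCRIPTOR_TYPES, None),
    "SECONDARY_DESCRIPTOR": (DESCRIPTOR_TYPES, None),
    "CONTAINERFILE": (IMAGE_TYPES, None),
    "TEST_FILE": (frozenset({"JSON"}), "JSON"),
    "OTHER": (frozenset({"OTHER"}), "OTHER"),
}


def resolve_types(file_type: Optional[str], type_: Optional[str]) -> Tuple[str, str]:
    """Validate a (file_type, type) pair and fill in defaults.

    Returns the effective (file_type, type) or raises ValueError.
    """
    # A missing file_type defaults to OTHER.
    effective_file_type = file_type if file_type is not None else "OTHER"
    if effective_file_type not in RULES:
        raise ValueError(f"unknown file_type: {effective_file_type!r}")
    allowed, default = RULES[effective_file_type]
    if type_ is None:
        if default is None:
            raise ValueError(f"'type' is required for {effective_file_type} files")
        return effective_file_type, default
    if type_ not in allowed:
        raise ValueError(
            f"'type' {type_!r} is not allowed for {effective_file_type} files"
        )
    return effective_file_type, type_
```

Keeping the rules in a single table means that a new file type or a new enum value only requires a change in one place, and the table can be tested case by case.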
Other considerations
Controller implementations and tests will need to be adapted to account for the new FilesRegister and database schemas. In particular, this will affect the controllers ToolVersionRegister() and toolsIdVersionsVersionIdContainerfileGet(), and possibly others as well.
Note that the toolFile and fileWrapper properties in the proposed updated FilesRegister model are renamed to tool_file and file_wrapper for consistency. Update controller implementations and tests accordingly.
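When adapting existing test fixtures to the renamed properties, a small helper can save manual edits. The sketch below is only an illustration; the helper name `rename_legacy_keys` is hypothetical, and it renames top-level keys of a single FilesRegister object only.

```python
from typing import Any, Dict

# Old property name -> new property name, as proposed in this issue.
RENAMED_KEYS = {"toolFile": "tool_file", "fileWrapper": "file_wrapper"}


def rename_legacy_keys(file_obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a FilesRegister object with the renamed top-level keys."""
    return {RENAMED_KEYS.get(key, key): value for key, value in file_obj.items()}


legacy = {"toolFile": {"path": "main.cwl"}, "fileWrapper": {"url": "https://example.org/main.cwl"}}
updated = rename_legacy_keys(legacy)
```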
To do
To summarize, the following need to be done:
Part 1
[ ] replace the FilesRegister model and rename all mentions of the toolFile and fileWrapper properties to tool_file and file_wrapper, respectively
[ ] re-implement RegisterToolVersion controller based on the new input schema and the proposed database schema
[ ] adapt tests for RegisterToolVersion controller and dependent endpoints
[ ] adapt tests for toolsIdVersionsVersionIdContainerfileGet controller
Part 2
[ ] implement type validations as described above (e.g., PRIMARY_DESCRIPTOR files are required to have a FilesRegister.type value, and that value needs to be enumerated in the DescriptorType schema)