ga4gh / task-execution-schemas

Apache License 2.0
80 stars 28 forks source link

Proposal: make TaskParameter type field optional #76

Closed pditommaso closed 1 year ago

pditommaso commented 7 years ago

Currently the TaskParameter type is a required field. The caller needs to specify if the path is a file or directory both for input and output parameters.

However I have the impression this information add almost no value, because the server implementation needs in any case query the remote resource the apply the required operation (upload/download).

On the other hand, requiring this information as mandatory severely limits the ability of tools that dynamically discovers the type of output paths, such as Nextflow, to implement this specification.

My proposal is to make this field optional. When it is specified the semantic remains identical, and the server validates the inputs/outputs verifying the type matches. When the type is not specified the server just applies the operation to the path regardless if it is a file or directory.

buchanae commented 7 years ago

We've considered whether we should just remove this field. Previously there were other fields which made it necessary, but I believe they have been removed. If possible, I would prefer removing it instead of making it optional.

geoffjentry commented 7 years ago

Agree that I'd prefer to try to remove things vs. optionalizing them where possible.

adamstruck commented 7 years ago

+1 for removing the field.

pditommaso commented 7 years ago

I'm fine to remove it as TaskParameter. It may still be needed in the task view in relation to https://github.com/ga4gh/task-execution-schemas/issues/77#issuecomment-320192859 .

buchanae commented 6 years ago

As I understand, one argument for keeping this field is that it adds a level of type checking to the system: if you define the input/output as a Directory and the system encounters a File, the system should let the user know something unexpected occurred by returning an error.

A counterargument is that this type checking responsibility should be handled by workflow engines or other TES clients. In the case of Nextflow, this allows them to forgo type checking, which is a core feature of their system. Engines which want simple file/directory type checking would implement this by communicating with the storage layer directly.

Hope I captured the arguments accurately.

From a technical standpoint, we (the Funnel dev team) think it's possible to implement a system with a pre-defined input/output file type in Funnel, but we'd like the chance to get it fully implemented in order to discover any edge cases.

pditommaso commented 6 years ago

There's are three options here: 1) keep mandatory, 2) make it optional, 3) removed it.

I've explained in the original comment why having it as a mandatory field is a problem. I'm fine both with 2 and 3. Though I tend to think the best solution would be 2 (optional), because it would give the opportunity to enforce the type checking if needed/requested.

Also it could be useful in relation to #77 (eg. outputs all files/directories matching a glob pattern).

adamstruck commented 6 years ago

I am also fine with both options 2 and 3. I'd like to get some version of this and #77 into the spec soon.

geoffjentry commented 6 years ago

@erikvdbergh Is this something you all have an opinion on?

erikvdbergh commented 6 years ago

I would want to keep it around but making it optional is fine. I'm curious @pditommaso how NextFlow handles HTTP inputs when this is not defined? A lot of http endpoints don't allow directory listing so for us it is a quick way to throw it back with an error.

I understand that for FTP, S3 and other endpoints this is a moot field, so I'd agree with making it optional. Keeping it around would also prevent clobbering of directories with files if the output type is unclear or in case of some errors.

pditommaso commented 6 years ago

how NextFlow handles HTTP inputs when this is not defined?

Nextflow does not allow glob patter for HTTP input files, exactly for the reason you have mentioned. However the proposal for an optional GLOB file type is related to *output* files.

wleepang commented 4 years ago

I'm curious if there are any updates on this issue. Reading through the thread it looks like there is consensus on have the field be optional. What would be the appropriate next steps?

kellrott commented 3 years ago

The current proposal for v1.1 is to make the field optional, and then request the server fill it out during runtime once it accesses the actual elements.

uniqueg commented 2 years ago

Related PR #155