MAAP-Project / Community

Issue for MAAP (Zenhub)

Supply absolute download paths of file inputs as positional arguments to algorithm run script #1138

Open chuckwondo opened 1 week ago


Is your feature request related to a problem? Please describe.

When an algorithm defines more than one file input, there is no convenient way to determine which file downloaded to the input directory corresponds to which file input. A user can supply any URL as a file input, so the name of the downloaded file is not known in advance.

Describe the solution you'd like

For each file input, automatically provide the full path of the downloaded file as an additional positional argument to the algorithm's run command.

Further, to make this a non-breaking change, add such arguments to the end of the values supplied for positional arguments, and in the same order that the file inputs are defined.

Effectively, handle file inputs as if they were defined as positional inputs after the explicitly defined positional inputs.

For example, assume an algorithm defines 2 file inputs and 3 positional inputs as follows, with other details omitted (the file and positional sections could be reversed without any impact on the result):

inputs:
  file:
    - name: url1
    - name: url2
  positional:
    - name: pos1
    - name: pos2
    - name: pos3

Currently, DPS will call the run command defined in the algo config file with 3 positional arguments, one for each of the positional inputs defined above, like so:

RUN_SCRIPT 'pos1 value' 'pos2 value' 'pos3 value'

This leaves the run script to figure out which of the 2 downloaded files corresponds to which file input. The run script has no way of knowing in advance what URLs the user supplied, and thus no way of knowing the corresponding filenames in the input directory.

I propose that DPS simply tacks on the absolute paths of the downloaded files as additional arguments to the run script, like so:

RUN_SCRIPT 'pos1 value' 'pos2 value' 'pos3 value' 'abs path1' 'abs path2'

where 'abs path1' and 'abs path2' are the absolute paths of the files downloaded for the url1 and url2 file inputs, respectively.
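To make the proposed convention concrete, here is a minimal sketch of how a run script for the example above might read its arguments. It assumes the proposed behavior (which DPS does not provide today), and the function and variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch of a run script under the PROPOSED convention (not current DPS behavior).
# Arguments 1-3 are the positional inputs pos1..pos3; arguments 4-5 are the
# absolute paths of the files downloaded for the url1 and url2 file inputs.
# describe_inputs and the variable names are hypothetical.

describe_inputs() {
    # ${N:-} yields an empty string when an argument is missing,
    # so the function also works under `set -u`.
    pos1="${1:-}"; pos2="${2:-}"; pos3="${3:-}"
    url1_path="${4:-}"; url2_path="${5:-}"

    printf 'pos1=%s pos2=%s pos3=%s\n' "$pos1" "$pos2" "$pos3"
    printf 'url1 file: %s\n' "$url1_path"
    printf 'url2 file: %s\n' "$url2_path"
}
```

A real script would end with `describe_inputs "$@"` so that the arguments DPS passes to the script reach the function unchanged.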

(In conjunction, allow file inputs to be optional, just as positional inputs can be. I believe that currently you must supply a URL value for every file input, but a file input should be allowed to be empty, just as a positional one can be.)

This way, the run script can look for the absolute paths of file inputs at the end of the other positional inputs.

By placing them after the "true" positional inputs, existing scripts will not break, because they will still see the positional inputs in the same positions.
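The compatibility argument can be checked with a small sketch. `legacy_run` is a hypothetical stand-in for an existing run script that reads only its three positional inputs, and the two appended paths are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for an existing run script that predates the proposal:
# it reads only the three positional inputs and never looks past $3.
legacy_run() {
    printf '%s|%s|%s\n' "$1" "$2" "$3"
}

# Call as made today, and call as proposed (two placeholder paths appended).
before=$(legacy_run 'pos1 value' 'pos2 value' 'pos3 value')
after=$(legacy_run 'pos1 value' 'pos2 value' 'pos3 value' '/abs/path1' '/abs/path2')

# $1..$3 are the same in both calls, so the output is identical.
[ "$before" = "$after" ] && echo "unchanged"
```

This holds for scripts that read fixed positions. A script that iterates over `"$@"` or checks `$#` would see the extra arguments, so such scripts may need a look before the change is rolled out.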

Describe alternatives you've considered

The alternative is to define a duplicate input for each file input: the original file input, plus a positional input that provides the name of the downloaded file. This requires redundant entries in the algo config, and it relies on users supplying both a URL for the file input and a matching filename for the corresponding positional input, which is annoying, confusing, and error-prone.

For example, the inputs shown above have to be modified like the following in order to get the desired behavior:

inputs:
  file:
    - name: url1
    - name: url2
  positional:
    - name: pos1
    - name: pos2
    - name: pos3
    - name: filename1
    - name: filename2

where filename1 and filename2 are the filenames in the input directory corresponding to the files downloaded from url1 and url2, respectively.

Unfortunately, this means a user must now supply 7 inputs instead of only 5, and the last 2 inputs (of the 7) must match the filenames at the end of the 2 file inputs (urls), which is the annoying and error-prone bit.
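For comparison, a run script using this workaround might resolve each file roughly as sketched here. The `INPUT_DIR` variable, its default of `input`, and the `resolve_input` helper are assumptions for illustration, not names defined by DPS.

```shell
#!/bin/sh
# Sketch of the workaround: the user passes each downloaded file's name as an
# extra positional input, and the script joins it with the input directory.
# INPUT_DIR, its default "input", and resolve_input are hypothetical.
resolve_input() {
    printf '%s/%s\n' "${INPUT_DIR:-input}" "$1"
}

# With the 7-input configuration, $4 and $5 carry filename1 and filename2:
#   url1_path=$(resolve_input "$4")
#   url2_path=$(resolve_input "$5")
```

If a user mistypes filename1 or filename2, the resolved path points at a file that does not exist, and nothing catches the mismatch before the script runs.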

Additional context

None