Support multiple inputs

clarin-eric / switchboard

The Switchboard: a web application serving as a broker between data sets and data processing/analysis tools.


Support multiple inputs #28

Open claus-zinn opened 7 years ago

claus-zinn commented 7 years ago

There are tools that require more than a single resource, e.g., tools that align audio with text. See, for example, this call:

curl -v -X POST -H 'content-type: multipart/form-data' -F LANGUAGE=deu-DE -F TEXT=@<filename> -F SIGNAL=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic'
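At the HTTP level, such a call is a single multipart/form-data request with one part per resource. Below is a minimal standard-library sketch that prepares the equivalent of the curl call above but does not send it. The field names and URL come from that call. The helper names, the placeholder file names and the generic `application/octet-stream` part type are assumptions.

```python
import urllib.request
import uuid

# Endpoint taken from the curl call above.
MAUS_URL = "https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic"


def encode_multipart(fields: dict[str, str], files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode text fields and file parts as a multipart/form-data body.

    Returns the body and the Content-Type header value carrying the boundary.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for name, (filename, data) in files.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            # Assumption: a generic part type; curl would guess one from the file name.
            b"Content-Type: application/octet-stream",
            b"",
            data,
        ]
    # Closing delimiter followed by a final CRLF.
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_maus_request(text: bytes, signal: bytes, language: str = "deu-DE") -> urllib.request.Request:
    """Prepare (but do not send) one POST carrying both resources.

    "text.txt" and "signal.wav" are hypothetical placeholder file names.
    """
    body, content_type = encode_multipart(
        {"LANGUAGE": language},
        {"TEXT": ("text.txt", text), "SIGNAL": ("signal.wav", signal)},
    )
    return urllib.request.Request(
        MAUS_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the returned request to `urllib.request.urlopen` would actually submit it. Keeping construction separate makes it easy to inspect what the Switchboard would forward.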

In the LRS standalone version, the UI must allow users to upload two (or more) files. The matcher algorithm must be extended to cope with the additional complexity.
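Here is a minimal sketch of how the matcher could be extended. It assumes, hypothetically, that each tool lists named input slots with the media types they accept, and that every uploaded file must be used exactly once. `InputSlot`, `Resource` and `match` are illustrative names, not part of the Switchboard code base.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InputSlot:
    # Hypothetical per-input description: parameter name plus accepted media types.
    name: str
    mediatypes: frozenset[str]


@dataclass(frozen=True)
class Resource:
    name: str
    mediatype: str


def match(slots: list[InputSlot], resources: list[Resource]) -> dict[str, Resource] | None:
    """Assign a distinct resource to every slot, or return None if impossible.

    Every resource must be used, so a two-input tool does not match three files.
    """
    if len(slots) != len(resources):
        return None
    assignment: dict[str, Resource] = {}
    used: set[int] = set()

    def solve(i: int) -> bool:
        # Backtracking: try each unused, type-compatible resource for slot i.
        if i == len(slots):
            return True
        slot = slots[i]
        for j, res in enumerate(resources):
            if j not in used and res.mediatype in slot.mediatypes:
                used.add(j)
                assignment[slot.name] = res
                if solve(i + 1):
                    return True
                used.discard(j)
                del assignment[slot.name]
        return False

    return dict(assignment) if solve(0) else None
```

Backtracking is exponential in the worst case, but tools rarely take more than a few inputs. A bipartite-matching algorithm would be the scalable alternative.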

The issue is more complex from the VLO perspective. For the time being, only a single file can be transferred to the LRS (but this could be an archive containing the files). Any solution must take the batch processing issue into account.
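If the VLO transfers an archive, the LRS would first have to expand it into individual inputs. The sketch below assumes a ZIP archive and guesses media types from file extensions with Python's `mimetypes` module; content-based detection may be more reliable in practice. `inputs_from_archive` is a hypothetical helper.

```python
import io
import mimetypes
import zipfile


def inputs_from_archive(archive: bytes) -> list[tuple[str, str]]:
    """List the regular files inside a ZIP archive with a guessed media type.

    Directory entries are skipped; unknown extensions fall back to
    application/octet-stream.
    """
    result: list[tuple[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            mediatype, _encoding = mimetypes.guess_type(info.filename)
            result.append((info.filename, mediatype or "application/octet-stream"))
    return result
```

The resulting list could then be fed to the same multi-input matching as uploads in the standalone version.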

emanueldima commented 4 years ago

I see the following potential problems:

- We assume the repositories should be able to send multiple inputs to the Switchboard. Should we allow the mimetype/language to be specified for each of the inputs? This implies changing the Switchboard API.
- Should the Switchboard somehow allow the user to get one input from one repository and another input from another repository?
- Should the user be allowed to add their own input to one that comes from a repository, turning a single input into a multiple input? How much complexity would this add to the UI?
- The service API must also be changed: tools must be able to describe multiple inputs.
- If we'll ever get batch processing, how would it work with multiple inputs? Is a tool that supports batched processing the same thing as a tool that takes an unlimited number of inputs of the same type?
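One possible way for tools to describe multiple inputs gives each input its own media types and languages plus an explicit cardinality. That makes "an unlimited number of inputs of the same type" expressible. `InputSpec`, its fields, the helper functions and the example language code are hypothetical and do not reflect the current registry format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical description of one tool input (not the current Switchboard schema)."""
    name: str
    mediatypes: frozenset[str]
    languages: frozenset[str] | None = None  # None: any language is accepted
    min_count: int = 1
    max_count: int | None = 1  # None: unbounded number of resources

    def accepts(self, mediatype: str, language: str | None) -> bool:
        """Check media type and (if restricted) language of one resource."""
        if mediatype not in self.mediatypes:
            return False
        return self.languages is None or language in self.languages


def count_ok(spec: InputSpec, n: int) -> bool:
    """Check whether n resources satisfy the cardinality of one input."""
    return n >= spec.min_count and (spec.max_count is None or n <= spec.max_count)


def takes_unbounded_input(inputs: list[InputSpec]) -> bool:
    """True if at least one input accepts an unlimited number of resources."""
    return any(spec.max_count is None for spec in inputs)


# Illustrative descriptions: an aligner with two fixed inputs and a
# corpus tool taking any number of plain-text documents.
ALIGNER_INPUTS = [
    InputSpec("TEXT", frozenset({"text/plain"}), frozenset({"deu"})),
    InputSpec("SIGNAL", frozenset({"audio/wav"})),
]
CORPUS_INPUTS = [InputSpec("input", frozenset({"text/plain"}), max_count=None)]
```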

proycon commented 4 years ago

The possibility of multiple inputs is also something that CLAM supports, so currently not all CLAM services can be expressed in the Switchboard, only the simpler ones.

The CLAM webservice specification is expressive enough to accommodate this (through multiple input templates) and may provide some inspiration if you guys want to go this way, though larger initiatives such as openapis.org (the OpenAPI Specification) may also be worth checking out and can offer the same.

> If we'll ever get batch processing, how would it work with multiple inputs? Is a tool that supports batched processing the same thing as a tool that takes an unlimited number of inputs of the same type?

I'd say, if the 'unlimited' number of inputs is specified in a single invocation, then that would indeed be a tool that supports batch processing. CLAM does that.

proycon commented 4 years ago

When it comes to multiple inputs, a distinction should also be made between multiple inputs used at the same time (which is mostly what this issue is about) and multiple independent routes through a webservice (which is also addressed in #65).

An example of the former is a tool that takes multiple files as input and handles them all in a single run.
An example of the latter is a tool that can take, say, a plain text document or a PDF document and do something with it. If this cannot be expressed well, it leads to unnecessary duplication of registry entries, as addressed in #65.
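The two cases can be kept apart in a tool description by listing alternative profiles (independent routes). Each profile names the media types needed together in one run, loosely in the spirit of CLAM's profiles. The `Profile` type and `matching_profiles` function are hypothetical. With this model, one registry entry with two single-input profiles replaces the duplicated entries mentioned above.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical model: a tool lists alternative profiles; each profile is the
# multiset of media types that must be supplied together in a single run.
Profile = tuple[str, ...]


def matching_profiles(profiles: list[Profile], mediatypes: list[str]) -> list[int]:
    """Return the indices of profiles whose required media types match exactly."""
    wanted = Counter(mediatypes)  # order-insensitive, but counts duplicates
    return [i for i, profile in enumerate(profiles) if Counter(profile) == wanted]
```

For example, profiles `[("text/plain",), ("application/pdf",)]` describe the text-or-PDF tool as one entry, while `("text/plain", "audio/wav")` describes an aligner that needs both inputs at once.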

emanueldima commented 4 years ago

To be considered in #4

twagoo commented 3 years ago

Voyant seems to support uploading multiple files through repeated input parameters to create a multi-file corpus.

Example:

https://voyant-tools.org/?input=https%3A%2F%2Fwww.gnu.org%2Flicenses%2Fold-licenses%2Fgpl-2.0.txt&input=https%3A%2F%2Fwww.gnu.org%2Flicenses%2Fgpl-3.0.txt

Note: we also send the media type as a parameter, which may or may not be possible for multiple inputs; in any case, it doesn't appear to be mandatory (see the example above).
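Repeated query parameters can be produced with Python's standard `urllib.parse.urlencode` by passing a list of key/value pairs. The sketch below reproduces the example URL above. `voyant_url` is a hypothetical helper, and only the `input` parameter from the example is used.

```python
from urllib.parse import urlencode


def voyant_url(document_urls: list[str], base: str = "https://voyant-tools.org/") -> str:
    """Build a Voyant Tools URL with one repeated `input` parameter per document."""
    # A list of (key, value) pairs lets the same key appear more than once;
    # urlencode percent-encodes ':' and '/' inside each value.
    query = urlencode([("input", url) for url in document_urls])
    return f"{base}?{query}"
```

`urlencode({"input": urls}, doseq=True)` is an equivalent spelling.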