galaxyproject / galaxy

eLabFTW file source for Galaxy #18665

Open kysrpex opened 1 month ago

kysrpex commented 1 month ago

I am developing an integration of Galaxy with eLabFTW and found a couple of design mismatches between eLabFTW and Galaxy that are forcing me to take non-straightforward design decisions. If I am not careful, my decisions may clash with how Galaxy is intended to work, so I thought it makes sense to open an issue to seek consensus and/or other solutions.

Exporting and importing data to Galaxy

To take data out of Galaxy, there is the option to export a history, either as a direct download link or to a file source. Research data management repositories are included in the latter group.

Exporting Galaxy histories

To import data to Galaxy, there is the upload option. Data from file sources can be accessed using the "Choose remote files" button.

Importing data to Galaxy

Remote files are represented and resolved in Galaxy using a path-like URI. File sources typically define their own URI scheme, for example invenio://zenodo_sandbox/92442/TestProduct.zip. Directory-like objects may be created in the file source using the endpoint /api/remote_files, which accepts JSON of the form {"target": "invenio://zenodo_sandbox/92442", "name": "Testing Publishing"}. File-like objects may be created using /api/histories/{history_id}/write_store, which accepts JSON that includes the target_uri key: {"target_uri": "invenio://zenodo_sandbox/92442/TestProduct.zip", ...}.
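To make the two request shapes concrete, here is a minimal sketch that builds them as plain Python dictionaries. The endpoints and keys come from the paragraph above; the helper names and the placeholder history id are hypothetical, and the write_store body is deliberately partial because its remaining keys are elided above.

```python
import json

REMOTE_FILES_ENDPOINT = "/api/remote_files"
WRITE_STORE_ENDPOINT = "/api/histories/{history_id}/write_store"


def directory_request(target: str, name: str) -> dict:
    """Body for creating a directory-like object named `name` under `target`."""
    return {"target": target, "name": name}


def write_store_request(target_uri: str) -> dict:
    """Partial body for exporting a history to a file source.

    Only `target_uri` is shown; the real request carries further keys
    that are not listed in the text above.
    """
    return {"target_uri": target_uri}


directory_body = directory_request("invenio://zenodo_sandbox/92442", "Testing Publishing")
export_body = write_store_request("invenio://zenodo_sandbox/92442/TestProduct.zip")

# "HISTORY_ID" is a placeholder, not a real Galaxy history id.
print("POST", REMOTE_FILES_ENDPOINT, json.dumps(directory_body))
print("POST", WRITE_STORE_ENDPOINT.format(history_id="HISTORY_ID"), json.dumps(export_body))
```

Note that in both cases the client has to supply the full target location up front, which is what becomes a problem further down.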

eLabFTW

eLabFTW revolves around the concepts of experiment and resource. Experiments and resources can contain file attachments. The scope of the integration would be exporting data from and importing data to eLabFTW as file attachments.

eLabFTW can be accessed through a REST API, which is documented here. The sections experiments, items (internal name for resources) and uploads are of special relevance. Each entity (be it an experiment or an item) has an entity id (an integer), and the files attached to an entity, also known as "uploads", have an upload id (also an integer). Entity ids for experiments and items are independent (i.e. an experiment and an item can have the same id). Upload ids are common to experiments and items: an experiment and an item cannot have an attachment with the same id.

eLabFTW's backend assigns each new identifier by incrementing the previous identifier of the same type, be it an experiment identifier, an item identifier, or an upload identifier. Experiment, item and upload names are not unique, e.g. two experiments can have the same name.
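The id rules above can be illustrated with a toy in-memory model. This is only a sketch of the behaviour as I understand it, not eLabFTW code: the class and method names are made up, and starting the counters at 1 is an assumption.

```python
from itertools import count


class ToyElabftw:
    """Toy in-memory stand-in for the id assignment described above."""

    def __init__(self) -> None:
        # Assumption: counters start at 1; the text only says ids increment.
        self._entity_ids = {"experiments": count(1), "items": count(1)}
        self._upload_ids = count(1)  # one counter shared by both entity types

    def create_entity(self, entity_type: str) -> int:
        return next(self._entity_ids[entity_type])

    def attach(self, entity_type: str, entity_id: int, name: str) -> int:
        # Neither the owning entity nor the file name influences the upload id.
        return next(self._upload_ids)


server = ToyElabftw()
experiment_id = server.create_entity("experiments")
item_id = server.create_entity("items")
first_upload = server.attach("experiments", experiment_id, "results.zip")
second_upload = server.attach("items", item_id, "results.zip")

print(experiment_id == item_id)       # True: the two entity id spaces are independent
print(first_upload == second_upload)  # False: upload ids are never shared
```

Both attachments carry the same name here, which is allowed, so only the ids distinguish them.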

Integrating Galaxy with eLabFTW

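Integrating eLabFTW with Galaxy through a file source involves finding a path-like URI representation for eLabFTW's experiments, items and uploads. A solution that quickly comes to mind is to use paths of the form /entity_type/entity_id/upload_id, where entity_type is either experiments or items, entity_id is the integer id of the experiment or item, and upload_id is the integer id of the attachment.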

Again, keep in mind that experiment, item and upload names are not unique. A solution based on names would not resolve them unambiguously. From the usability point of view, a solution based on ids may however be a problem, because although names and URIs seem to be decoupled when browsing file sources (see screenshot below),

Galaxy client requests made while browsing a file source

they are coupled when files are exported (see histories.export.ts, which gets fileName from user input).
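A minimal sketch of how such id-based paths could be parsed, assuming the entity types are spelled experiments and items as in the API sections mentioned earlier. The function and type names are hypothetical, and the ids in the listing are invented for illustration.

```python
from typing import NamedTuple, Optional

ENTITY_TYPES = ("experiments", "items")


class ElabftwPath(NamedTuple):
    entity_type: str
    entity_id: Optional[int] = None
    upload_id: Optional[int] = None


def parse_path(path: str) -> ElabftwPath:
    """Split /entity_type[/entity_id[/upload_id]] into its components."""
    parts = [part for part in path.split("/") if part]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected /entity_type[/entity_id[/upload_id]], got {path!r}")
    entity_type, *raw_ids = parts
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    if not all(raw.isdigit() for raw in raw_ids):
        raise ValueError(f"ids must be non-negative integers: {path!r}")
    return ElabftwPath(entity_type, *(int(raw) for raw in raw_ids))


# Two attachments may share a display name; only the id-based path tells them apart.
listing = [
    {"name": "results.zip", "path": "/experiments/12/34"},
    {"name": "results.zip", "path": "/experiments/12/35"},
]
parsed = [parse_path(entry["path"]) for entry in listing]
print(parsed[0] != parsed[1])  # True
```

Keeping name and path as separate fields in each listing entry is what lets browsing work with non-unique names; the export flow currently has no equivalent separation.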

The major issue, though, is that /api/histories/{history_id}/write_store receives a target_uri as input, which means URIs must be known beforehand. But entity ids and upload ids cannot be predicted, because eLabFTW's backend generates them as users create experiments and resources and upload attachments. To make things worse, upload ids are global. This means Galaxy cannot try to guess the next id based on the largest id on the server:

  1. because API requests would be scoped to a single user, which can only see entities it has been granted permission to see,
  2. because it does not scale; when two simultaneous uploads occur, their ids cannot be predicted.
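Both failure modes can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the toy server below is not eLabFTW, visibility is reduced to "a user only sees their own uploads", and all names are hypothetical.

```python
class ToyServer:
    """Toy server with a single global upload counter."""

    def __init__(self) -> None:
        self._next_upload_id = 1
        self._owner = {}  # upload id -> user who created it

    def upload(self, user: str) -> int:
        upload_id = self._next_upload_id
        self._next_upload_id += 1
        self._owner[upload_id] = user
        return upload_id

    def visible_upload_ids(self, user: str) -> list:
        # Simplification: a user only sees uploads they created themselves.
        return [i for i, owner in self._owner.items() if owner == user]

    def all_upload_ids(self) -> list:
        return list(self._owner)


server = ToyServer()
server.upload("alice")  # gets id 1
server.upload("bob")    # gets id 2
server.upload("bob")    # gets id 3

# Reason 1: alice sees only id 1, so "largest visible id + 1" gives 2.
scoped_guess = max(server.visible_upload_ids("alice"), default=0) + 1
scoped_actual = server.upload("alice")
print(scoped_guess, scoped_actual)  # 2 4

# Reason 2: even with a full view, two clients that guess at the same
# moment both predict the same id, and at most one of them can be right.
alice_guess = bob_guess = max(server.all_upload_ids()) + 1
bob_actual = server.upload("bob")
alice_actual = server.upload("alice")
print(alice_guess, alice_actual)  # 5 6
```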

Action points

I thus see two areas where action is needed:

  1. Fully decoupling path-like URIs from the names displayed to the user and the user's input.
  2. Letting Galaxy create new files on a file source without needing to know the last part of their URI beforehand (or alternatively, breaking some properties of paths, for example that saving a file on path x guarantees that it can be retrieved later using x, but I do not think that's a good approach).
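As a rough illustration of the second point, a write operation could take only the target entity and report back the URI that the file actually ended up at. This is a hypothetical interface, not Galaxy's existing file source API; the upload callable stands in for the backend request and the id 57 is invented.

```python
from typing import Callable


def write_to_entity(
    entity_path: str,
    content: bytes,
    upload: Callable[[str, bytes], int],
) -> str:
    """Attach `content` to the entity and return the path it was stored at.

    `upload` stands in for the backend call and returns the upload id
    that the server assigned; the caller never has to predict it.
    """
    upload_id = upload(entity_path, content)
    return f"{entity_path.rstrip('/')}/{upload_id}"


def fake_upload(entity_path: str, content: bytes) -> int:
    # Stand-in for the server: pretend it assigned upload id 57.
    return 57


final_path = write_to_entity("/experiments/12", b"history archive", fake_upload)
print(final_path)  # /experiments/12/57
```

With this shape, saving to the returned path still guarantees that the file can be retrieved from that same path later; only the moment at which the path becomes known changes.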

kysrpex commented 1 month ago

This issue can be assigned to me. Pinging @bernt-matthias, since he was interested in discussing and testing the integration.

davelopez commented 1 month ago

I need to study the case a bit, but as a first impression, this case will clearly need a new special UI entry here: [image]

This UI will have to create the needed entities before the "export", similar to what the RDM file sources are doing. Then, once you have a proper URI that identifies the target entity (something like elabftw://{elab_url}/entity_type/entity_id/upload_id), the upload can be performed in the backend. I don't know if that is possible, as I haven't checked the eLabFTW API, but that could be a potential solution.