Open kysrpex opened 3 months ago
This issue can be assigned to me. Pinging @bernt-matthias, since he was interested in discussing and testing the integration.
I need to study the case a bit, but as a first impression, this case clearly will need a new special UI entry here:
This UI will have to create the needed entities before the "export", similar to what the RDM file sources are doing. Then, once you have a proper URI that identifies the target entity (something like elabftw://{elab_url}/entity_type/entity_id/upload_id), perform the upload in the backend. I don't know if that is possible, as I haven't checked the eLabFTW API, but that could be a potential solution.
eLabFTW file source for Galaxy
I am developing an integration of Galaxy with eLabFTW and found a couple of design mismatches between eLabFTW and Galaxy that are forcing me to take non-straightforward design decisions. If I am not careful, my decisions may clash with how Galaxy is intended to work, so I thought it makes sense to open an issue to seek consensus and/or other solutions.
Exporting and importing data to Galaxy
To take data out of Galaxy, there is the option to export a history, either as a direct download link or to a file source. Research data management repositories are included in the latter group.
To import data to Galaxy, there is the upload option. Data from file sources can be accessed using the "Choose remote files" button.
Remote files are represented and resolved in Galaxy using a path-like URI. File sources typically define their own URI scheme, for example invenio://zenodo_sandbox/92442/TestProduct.zip. Directory-like objects may be created in the file source using the endpoint /api/remote_files, which accepts JSON of the form {"target": "invenio://zenodo_sandbox/92442", "name": "Testing Publishing"}. File-like objects may be created using /api/histories/{history_id}/write_store, which accepts JSON that includes the target_uri key: {"target_uri": "invenio://zenodo_sandbox/92442/TestProduct.zip", ...}.
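As a rough illustration of the two request shapes described above, here is a minimal Python sketch that only builds the endpoint path and JSON body, without sending anything. The helper names create_directory_request and write_store_request are hypothetical and not part of Galaxy. The HTTP method, authentication and the remaining write_store keys (elided as "..." above) are deliberately left out; extra keys are simply passed through.

```python
import json
from typing import Any, Tuple


def create_directory_request(target: str, name: str) -> Tuple[str, str]:
    """Return (endpoint, JSON body) for creating a directory-like object."""
    return "/api/remote_files", json.dumps({"target": target, "name": name})


def write_store_request(history_id: str, target_uri: str, **extra: Any) -> Tuple[str, str]:
    """Return (endpoint, JSON body) for exporting a history to target_uri.

    The other payload keys are not listed in the text above, so any extra
    keyword arguments are passed through untouched (hypothetical helper).
    """
    body = {"target_uri": target_uri, **extra}
    return f"/api/histories/{history_id}/write_store", json.dumps(body)


# Build the directory-creation request from the example above.
endpoint, body = create_directory_request(
    "invenio://zenodo_sandbox/92442", "Testing Publishing"
)
print(endpoint, body)
```

Note that both helpers need the full target URI as an argument: the caller has to know where the data will live before the request is made.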
eLabFTW
eLabFTW revolves around the concepts of experiment and resource. Experiments and resources can contain file attachments. The scope of the integration would be exporting data from and importing data to eLabFTW as file attachments.
eLabFTW can be accessed through a REST API, which is documented here. The sections experiments, items (internal name for resources) and uploads are of special relevance. Each entity (be it an experiment or an item) has an entity id (an integer), and the files attached to an entity, also known as "uploads", have an upload id (also an integer). Entity ids for experiments and items are independent (i.e. an experiment and an item can have the same id). Upload ids are common to experiments and items: an experiment and an item cannot have an attachment with the same id.
eLabFTW's backend assigns new identifiers incrementing the previous identifier of the same type, be it experiment identifiers, item identifiers, or upload identifiers. Experiment, item and upload names are not unique, e.g. two experiments can have the same name.
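To make the identifier rules above concrete, here is a toy in-memory model in Python. It is not eLabFTW code and does not call its API. The class ToyElab and its methods are hypothetical, and starting every counter at 1 is an assumption made purely for illustration.

```python
from itertools import count


class ToyElab:
    """Toy stand-in for the identifier rules described above (not eLabFTW code)."""

    def __init__(self) -> None:
        # Experiments and items each have their own id sequence ...
        self._entity_ids = {"experiments": count(1), "items": count(1)}
        # ... while every upload draws from one shared sequence.
        self._upload_ids = count(1)

    def create_entity(self, entity_type: str) -> int:
        """Create an experiment or item and return its server-assigned id."""
        return next(self._entity_ids[entity_type])

    def attach_file(self, entity_type: str, entity_id: int) -> int:
        """Attach a file to an entity and return its server-assigned upload id."""
        return next(self._upload_ids)


elab = ToyElab()
experiment_id = elab.create_entity("experiments")
item_id = elab.create_entity("items")
# An experiment and an item may end up with the same entity id.
print(experiment_id, item_id)

first_upload = elab.attach_file("experiments", experiment_id)
second_upload = elab.attach_file("items", item_id)
# Upload ids are shared, so attachments never collide across entity types.
print(first_upload, second_upload)
```

In this model, as on the real server, a client only learns an id from the return value of the call that creates the object.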
Integrating Galaxy with eLabFTW
Integrating eLabFTW with Galaxy through a file source involves finding a path-like URI representation for eLabFTW's experiments, items and uploads. A solution that quickly comes to mind is to use paths of the form /entity_type/entity_id/upload_id, where:
- entity_type is either 'experiments' or 'resources'
- entity_id is the id (an integer) of an experiment or resource (keep in mind those are independent)
- upload_id is the id (an integer) of an attachment
Again, keep in mind that experiment, item and upload names are not unique. A solution based on names would not resolve them unambiguously. From the usability point of view, however, a solution based on ids may be a problem: although names and URIs seem to be decoupled when browsing file sources (see screenshot below), they are coupled when files are exported (see histories.export.ts, which gets fileName from user input).
The major issue, though, is that /api/histories/{history_id}/write_store receives a target_uri as input, which means URIs must be known beforehand. But entity ids and upload ids cannot be predicted, because eLabFTW's backend generates them as users create experiments and resources and upload attachments. To make things worse, upload ids are global. This means Galaxy cannot try to guess the next id based on the largest id on the server.
Action points
I thus see two areas where action is needed:
x guarantees that it can be retrieved later using x, but I do not think that's a good approach).
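For reference, the path scheme proposed under "Integrating Galaxy with eLabFTW" (/entity_type/entity_id/upload_id) could be parsed along these lines. This is a minimal Python sketch and not the implementation of the file source. ElabPath and parse_path are hypothetical names, and treating shorter paths such as /experiments or /experiments/12 as directory-like levels is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

ENTITY_TYPES = ("experiments", "resources")


@dataclass(frozen=True)
class ElabPath:
    """Components of a path; missing trailing components are None."""

    entity_type: Optional[str] = None
    entity_id: Optional[int] = None
    upload_id: Optional[int] = None


def parse_path(path: str) -> ElabPath:
    """Parse /entity_type/entity_id/upload_id (or any prefix of it)."""
    parts = [part for part in path.split("/") if part]
    if len(parts) > 3:
        raise ValueError(f"too many path components: {path!r}")
    if parts and parts[0] not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {parts[0]!r}")
    # Ids are integers in this scheme; names are rejected because they are not unique.
    if not all(part.isascii() and part.isdigit() for part in parts[1:]):
        raise ValueError(f"ids must be non-negative integers: {path!r}")
    ids = [int(part) for part in parts[1:]]
    return ElabPath(parts[0] if parts else None, *ids)


print(parse_path("/experiments/12/345"))
print(parse_path("/resources/7"))
```

A parser like this only covers resolution of existing objects. It does not address the export problem described above, where the ids are not yet known when the target URI has to be supplied.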