SciCatProject / scicat-backend-next

SciCat Data Catalogue Backend
https://scicatproject.github.io/documentation/
BSD 3-Clause "New" or "Revised" License
19 stars 24 forks source link

Attachments for multiple items #674

Open jl-wynen opened 1 year ago

jl-wynen commented 1 year ago

Attachments for multiple items

Current Behaviour

Currently, attachments can be attached to any combination of a dataset, sample, or proposal. E.g., by using

attachment = model.UploadAttachment(
    caption="Pretty picture no 1",
    ownerGroup="group1",
    datasetId=dataset_id,
    sampleId=sample_id,
)

and uploading it to datasets/{quote_plus(dataset_id)}/attachments has the effect of attaching it to both the dataset and sample. So it can be downloaded via both endpoints.

Expected Behaviour / Discussion

Is this a good behaviour? Or should attachments be allowed to be attached to only one item?

I don't have a good argument for allowing multiple items. It seems that attachments are typically quite specific to one dataset / sample / proposal. But can anyone come up with a use case?

Potential solution

If we restrict to a single item, the datasetID, sampleId, and proposalId fields should not be allowed in uploads. Instead, the backend should fill them in based on which endpoint was used for the upload. This would also remove the duplication. (Currently, the id must be specified in both the model and the path for the request.)

jl-wynen commented 1 year ago

My suggested solution leads to a divide between datablocks and attachments. When creating a datablock, you need to specify the dataset id in the datablock model and not the path. But for attachments it's the other way around. I'm not sure this is an issue as the endpoints are already different. Attachments are created via datasets/{pid}/attachments but data blocks via origdatablocks.

bpedersen2 commented 1 year ago

My suggested solution leads to a divide between datablocks and attachments. When creating a datablock, you need to specify the dataset id in the datablock model and not the path. But for attachments it's the other way around. I'm not sure this is an issue as the endpoints are already different. Attachments are created via datasets/{pid}/attachments but data blocks via origdatablocks.

One of the possible reason for this behaviour: an attachment needs the parent unconditionally, while I can imagine use cases where the datablocks are actually created before the dataset ( and then linked once the dataset gets created). This usage is currently not supported (as the parent is required in schema and dto), and I would not oppose to adjust the API to use a more restful approach here.

nitrosx commented 1 year ago

Apologies for the delayed reply. Attachments can only be created through the parent endpoint just because that's how they were implemented in the BE v3 and acceptance for the migration was to pass BE v3 test. I'm not opposed to, and actually in favor for creating an attachment endpoint, like the one that we have for origdatablocks. A dedicated endpoint make code management easier and allows to find orphaned attachments, which we are not able to do, at the moment.