datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1

Name external store files with primary key when downloaded #1099

Open ghost opened 1 year ago

ghost commented 1 year ago

Feature Request

Problem

In many pipelines, external store files may have identical names. For example, an experimenter may name all of their raw electrophysiology data files 'data.bin'. On fetch, DataJoint downloads each external file into the working directory under its original name only, so files with identical names overwrite one another.

Note: I am referring to the name of the file as it is stored on the local system before inserting and after fetching, not the name of the file in the store itself, which is always unique because it is a hash.

Requirements

Provide an option in fetch to include the primary key of the entry in the downloaded file name. Instead of 'data.bin', the file downloaded from the store would be named 'PRIMARY-KEY-data.bin'.
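To make the requested naming concrete, here is a minimal sketch of how such a file name could be built from a primary key. It is plain Python and not part of DataJoint; the function name `key_prefixed_name`, the '-' separator, and the replacement of unsafe characters with '_' are all assumptions made for illustration.

```python
import re
from pathlib import Path


def key_prefixed_name(key: dict, filename: str) -> str:
    """Return the file name prefixed with the primary key (hypothetical scheme).

    `key` is a primary key as a dict, e.g. {"subject_id": 1, "session": 3}.
    Attribute order follows the dict's insertion order.
    """
    parts = []
    for name, value in key.items():
        # Replace characters that are awkward in file names (an assumption;
        # the exact sanitization rule would be a design decision).
        safe_value = re.sub(r"[^\w.]+", "_", str(value))
        parts.append(f"{name}-{safe_value}")
    # Keep only the base name so a path like 'raw/data.bin' is handled too.
    return "-".join(parts + [Path(filename).name])


print(key_prefixed_name({"subject_id": 1, "session": 3}, "data.bin"))
```

With a scheme like this, two entries whose files are both called 'data.bin' get distinct local names as long as their primary keys differ.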

Justification

This will allow users to fetch data and download files from the external store even when those files have identical names.

Alternative Considerations

The alternative would be to force users to name all files uniquely. This is not helpful for some use cases. For example, in an electrophysiology pipeline, raw output may always be named 'data.bin' by the equipment, and the user may then upload it directly to their DataJoint pipeline. It would be inconvenient to have to rename these files first.

Screenshots

In this screenshot, I show the result of a fetch on my database. Note that our equipment always names our raw electrophysiology data 'data.bin', but these are different files, which were stored in different directories before being uploaded to our DataJoint pipeline. Here, they overwrite one another, and my downloads folder ends up with only one 'data.bin' file.

[Image: Screenshot from 2023-07-14 13-48-57]

Additional Research and Context

Note: I have only tested this with an S3 external store, not a file store.

horsto commented 1 year ago

I ran into the same inconvenience, and I rename all files before upload to include the complete primary key. I agree, though, that this can be inconvenient.
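For anyone wanting to try the rename-before-upload workaround, it could look roughly like the sketch below. It is plain Python with no DataJoint calls; `stage_with_key`, the staging directory, and the naming scheme (primary key values joined by '-') are assumptions for illustration, not the code actually used here.

```python
import shutil
from pathlib import Path


def stage_with_key(src, key: dict, staging_dir) -> Path:
    """Copy `src` into `staging_dir` under a name prefixed with the key values.

    Hypothetical helper: the returned path is what would then be inserted
    into the pipeline instead of the original 'data.bin'.
    """
    src = Path(src)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    # Join the primary key values, e.g. {"subject_id": 1, "session": 2} -> "1-2".
    prefix = "-".join(str(value) for value in key.values())
    dst = staging_dir / f"{prefix}-{src.name}"
    # copy2 keeps the original file untouched and preserves its metadata.
    shutil.copy2(src, dst)
    return dst
```

Copying rather than renaming in place leaves the equipment's original output untouched, at the cost of temporary extra disk space for large recordings.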