In many pipelines, external store files may have identical names. For example, an experimenter may name all their raw electrophysiology data 'data.bin'. When this data is fetched, the external files are downloaded into the working directory under their original names alone, so files with identical names overwrite one another.
Note that I am referring to the name of the file as it is stored on the local system before inserting and after fetching, not the name of the file in the store itself, which is always unique because it is a hash.
Requirements
Provide an option in fetch to include the primary key for that entry in the downloaded file name. Instead of 'data.bin', the file downloaded from the store would be named 'PRIMARY-KEY-data.bin'.
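To make the requested naming concrete, here is a minimal sketch of how such a file name could be built. It is not DataJoint code: `keyed_filename`, the `sep` parameter, and the example attributes `subject_id` and `session` are all hypothetical, and the exact format of the prefix would be up to the maintainers.

```python
from pathlib import Path


def keyed_filename(key: dict, filename: str, sep: str = "-") -> str:
    """Prefix a file name with primary-key values (hypothetical helper, not a DataJoint API)."""
    # Use the key values in the order given; replace path separators so the
    # result is always a single file name rather than a nested path.
    parts = [str(value).replace("/", "_").replace("\\", "_") for value in key.values()]
    return sep.join(parts + [Path(filename).name])


# Two entries whose attached files are both called 'data.bin'
# (the attribute names are assumptions for illustration).
keys = [
    {"subject_id": 12, "session": 1},
    {"subject_id": 12, "session": 2},
]
names = [keyed_filename(k, "data.bin") for k in keys]
print(names)  # ['12-1-data.bin', '12-2-data.bin']
```

Because the primary key is unique per entry, the resulting names no longer collide in the download directory.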
Justification
This will allow users to fetch data and download files from the external store even when those files have identical names.
Alternative Considerations
The alternative would be to force users to name all files uniquely. This is not helpful for some use cases. For example, in an electrophysiology pipeline, raw output may always be named 'data.bin' by the equipment, and the user may then upload it directly to their DataJoint pipeline. It would be inconvenient to have to rename these files first.
Screenshots
In this screenshot, I show the result of a fetch on my database. Note that our equipment always names our raw electrophysiology data 'data.bin', but these are different files, which were stored in different directories before being uploaded to our DataJoint pipeline. Here, they overwrite one another, and my downloads folder ends up with only one 'data.bin' file.
Additional Research and Context
Note: I have only tested this with an S3 external store, not a file store.
I ran into the same inconvenience and I am renaming all files before upload to include the complete primary key. I agree though that this can be inconvenient.
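For anyone who needs the rename-before-upload workaround in the meantime, it could look roughly like the sketch below. This is plain standard-library Python and makes no DataJoint calls; `stage_with_key`, the staging directory, and the `subject_id`/`session` attributes are assumptions for illustration, and the temporary directories only stand in for real recording folders.

```python
import shutil
import tempfile
from pathlib import Path


def stage_with_key(src: Path, key: dict, staging_dir: Path) -> Path:
    """Copy src into staging_dir under a key-prefixed name (hypothetical helper)."""
    prefix = "-".join(str(value) for value in key.values())
    dest = staging_dir / f"{prefix}-{src.name}"
    shutil.copy2(src, dest)  # copy so the original recording is left untouched
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    staging = root / "staging"
    staging.mkdir()
    staged = []
    for session in (1, 2):
        # Each recording lives in its own directory but is always named 'data.bin'.
        rec_dir = root / f"session{session}"
        rec_dir.mkdir()
        raw = rec_dir / "data.bin"
        raw.write_bytes(bytes([session]))
        staged.append(stage_with_key(raw, {"subject_id": 12, "session": session}, staging))
    staged_names = sorted(p.name for p in staged)
    print(staged_names)
```

The staged copies are what would then be inserted, so each file keeps a distinct local name after a later fetch. The cost is an extra copy of every raw file, which is part of why an option in fetch would be more convenient.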