Open guzman-raphael opened 3 years ago
this is not atypical - it is the primary use case for linking local files into pipelines which was the rationale for filepath to begin with.
@ixcat Ah OK. Should have clarified a bit above :sweat_smile: This was added based on feedback from Thinh/Dimitri at last week's Devel Meeting.
Yes, apologies for 'barking' a bit - This has shifted a few times in various meetings and it's easy to lose the plot over time.
It's atypical for us since we are building cloud pipelines, but the original design purpose of filepath was to enable local FS linking with the added bonus of providing a 'sync' capability when used with remote stores like S3. We should document/test both use cases.
notes on 'officialness' of filepath in general (tagging onto this issue from datajoint slack questions today):
a) r.e. actual issue point here ("We should change this in the docs") - all of filepath is somewhat beta, which is why we have the feature switch. Probably should make a global warning bubble as well / revisit as we go on different sub-portions of filepath (see also #224)
b) misc fragment from chat which might be useful to adapt into the documentation to explain the various use cases more:
Filepath is intended to allow flexible 'linking' with local or remote
stores. The 'location' indicates the intended 'real' location for the
data, and the 'stage' indicates the local filesystem location where the
DataJoint client will work with the corresponding data.
Relative paths within 'real' vs 'stage' are used to determine the actions
taken in the client - this means the file '{stage}/some/file' corresponds
to '{location}/some/file' within the 'real' DataJoint pipeline data.
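A rough sketch of that relative-path mapping, in plain Python rather than DataJoint's own code. The function name `to_location_path` and the example directories are made up for illustration:

```python
from pathlib import PurePosixPath


def to_location_path(stage, location, local_file):
    """Map a file under `stage` to its counterpart under `location`.

    Hypothetical helper: it only mirrors the rule described above
    (same relative path on both sides), not the client's real logic.
    """
    # relative_to() raises ValueError if local_file is not inside stage
    relative = PurePosixPath(local_file).relative_to(PurePosixPath(stage))
    return PurePosixPath(location) / relative


mapped = to_location_path("/data/stage", "/remote/store", "/data/stage/some/file")
print(mapped)
```

A file outside the stage directory has no relative path to map, so the sketch lets the `ValueError` propagate instead of guessing.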
In the 's3' case, for insert, you would create files in
'{stage}/some/file' using this location in insert, and this triggers copying
into the s3 store at '{location}/some/file' if '/some/file' does not exist within
'location'. For fetch, the client will first check '{stage}/some/file'
before saving from '{location}/some/file' into stage and returning the
stage location to the client.
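To make the insert/fetch flow concrete, here is a minimal sketch that uses a second local directory as a stand-in for the S3 store. `sketch_insert` and `sketch_fetch` are hypothetical names, and the real client may do more (checksums, tracking tables, etc.) that is not modeled here:

```python
import shutil
from pathlib import Path


def sketch_insert(stage, location, relative_path):
    """Copy {stage}/relative_path into the store unless it already exists there."""
    src = Path(stage) / relative_path
    dst = Path(location) / relative_path
    if not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return dst


def sketch_fetch(stage, location, relative_path):
    """Return the stage path, copying from the store only if it is missing locally."""
    local = Path(stage) / relative_path
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(location) / relative_path, local)
    return local
```

In both directions the caller only ever works with the stage path; the store copy is a side effect.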
To contrast with another use case, this means filepath can also be used
to 'link' objects in existing network shares by setting 'location' and
'stage' to the same network mount - in this case, the client detects the
file is really the same object and does not perform the copy either for
insert or fetch.
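One way the "same object, so skip the copy" check could look, purely as an illustration. How the client actually detects this is not stated here; the use of `os.path.samefile` and the name `copy_unless_same` are assumptions:

```python
import os
import shutil


def copy_unless_same(src, dst):
    """Copy src to dst unless both paths already refer to the same file.

    Returns True if a copy was made, False if it was skipped.
    Assumption: sameness is checked with os.path.samefile.
    """
    if os.path.exists(dst) and os.path.samefile(src, dst):
        return False
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    shutil.copy2(src, dst)
    return True
```

When 'location' and 'stage' point at the same mount, `src` and `dst` resolve to the same file, so the function returns without copying.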
Setting `location` and `stage` as the same is an atypical case and one that has not been properly validated. We should change this in the docs to prevent users from assuming this is stable. See: https://docs.datajoint.io/python/definition/06.5-External-Data.html#id2
A proper substitute would be: