datajoint / datajoint-docs-original

https://docs.datajoint.org

Filepath: Modify stage to not match location on a 'file' store (atypical case) #249

Open guzman-raphael opened 3 years ago

guzman-raphael commented 3 years ago

Setting location and stage as the same is an atypical case and one that has not been properly validated. We should change this in the docs to prevent users from assuming this is stable.

See: https://docs.datajoint.io/python/definition/06.5-External-Data.html#id2

A proper substitute would be:

import datajoint as dj

# 'file' store whose staging directory differs from its storage location
dj.config['stores'] = {
    'data': {
        'protocol': 'file',
        'location': '/data',
        'stage': '/stage'
    }
}

ixcat commented 3 years ago

this is not atypical - it is the primary use case for linking local files into pipelines which was the rationale for filepath to begin with.

guzman-raphael commented 3 years ago

@ixcat Ah OK. Should have clarified a bit above :sweat_smile: This was added based on feedback from Thinh/Dimitri at last week's Devel Meeting.

ixcat commented 3 years ago

Yes, apologies for 'barking' a bit - This has shifted a few times in various meetings and it's easy to lose the plot over time.

It's atypical for us since we are building cloud pipelines, but the original design purpose of filepath was to enable local FS linking with the added bonus of providing a 'sync' capability when used with remote stores like S3. We should document/test both use cases.

ixcat commented 3 years ago

notes on 'officialness' of filepath in general (tagging onto this issue from datajoint slack questions today):

a) re: the actual issue point here ("We should change this in the docs"): all of filepath is somewhat beta, which is why we have the feature switch. We should probably add a global warning bubble as well, and revisit as we work on the different sub-portions of filepath (see also #224)

b) misc fragment from chat which might be useful to adapt into the documentation to explain the various use cases more:

Filepath is intended to allow flexible 'linking' with local or remote 
stores. The 'location' indicates the intended 'real' location for the 
data, and the 'stage' indicates the local filesystem location where the
DataJoint client will work with the corresponding data.

Relative paths within 'real' vs 'stage' are used to determine the actions 
taken in the client - this means the file '{stage}/some/file' corresponds
to '{location}/some/file' within the 'real' DataJoint pipeline data.
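
The relative-path correspondence described above can be sketched as a small helper. This is only an illustration of the mapping, not DataJoint's code: the function name `stage_to_location` and its arguments are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def stage_to_location(stage_path, stage, location):
    """Map a path under 'stage' to the matching path under 'location'.

    Hypothetical helper for illustration only; not part of DataJoint.
    """
    # relative_to raises ValueError if stage_path is not inside stage
    relative = PurePosixPath(stage_path).relative_to(stage)
    return str(PurePosixPath(location) / relative)


print(stage_to_location('/stage/some/file', '/stage', '/data'))
# prints /data/some/file
```

A path outside the stage directory has no counterpart in the store, which is why the helper lets the `ValueError` propagate rather than guessing.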

In the 's3' case, for insert, you create the file at '{stage}/some/file'
and pass that path to insert; this triggers a copy into the s3 store at
'{location}/some/file' if 'some/file' does not already exist within
'location'. For fetch, the client first checks '{stage}/some/file'; if it
is missing there, the client downloads '{location}/some/file' into stage,
then returns the stage location to the caller.
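
As a rough sketch of the insert and fetch flow described above, the following uses two local directories, with the second standing in for the remote store. Everything here is an assumption made for illustration: `sketch_insert` and `sketch_fetch` are hypothetical names, no S3 calls are made, and whatever bookkeeping the real client performs is not modeled.

```python
import shutil
import tempfile
from pathlib import Path


def sketch_insert(stage, location, relative):
    """Copy stage/relative into location/relative unless it already exists."""
    src, dst = Path(stage) / relative, Path(location) / relative
    if dst.exists():
        return False  # already in the store, nothing to copy
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return True


def sketch_fetch(stage, location, relative):
    """Return the staged path, copying from the store first if needed."""
    local, remote = Path(stage) / relative, Path(location) / relative
    if not local.exists():  # check stage before touching the store
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(remote, local)
    return local


with tempfile.TemporaryDirectory() as root:
    stage, location = Path(root) / 'stage', Path(root) / 'store'
    (stage / 'some').mkdir(parents=True)
    (stage / 'some' / 'file').write_text('payload')

    copied = sketch_insert(stage, location, 'some/file')  # copies into store
    (stage / 'some' / 'file').unlink()                    # simulate an empty stage
    fetched_text = sketch_fetch(stage, location, 'some/file').read_text()
```

In both directions the caller only ever handles paths under the stage directory; the store path is derived from the relative part.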

To contrast with another use case, this means filepath can also be used
to 'link' objects in existing network shares by setting 'location' and
'stage' to the same network mount - in this case, the client detects that
the file is really the same object and does not copy it on either insert
or fetch.
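
One way such a "same object" check could look is sketched below using `os.path.samefile` from the standard library. How the DataJoint client actually detects this is not stated here, so treat `needs_copy` as a hypothetical helper under that assumption.

```python
import os
import tempfile


def needs_copy(stage_path, location_path):
    """Return False when both paths refer to the same existing file."""
    if os.path.exists(stage_path) and os.path.exists(location_path):
        # samefile compares device and inode, so it also sees through symlinks
        return not os.path.samefile(stage_path, location_path)
    return True  # one side is missing, so a copy would be required


with tempfile.TemporaryDirectory() as mount:
    shared = os.path.join(mount, 'some_file')
    with open(shared, 'w') as fh:
        fh.write('payload')

    # 'location' and 'stage' point at the same mount: no copy needed
    same_mount = needs_copy(shared, shared)
    # the counterpart does not exist yet: a copy would be needed
    missing = needs_copy(shared, os.path.join(mount, 'absent'))
```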