Closed: shevron closed this issue 4 years ago.
In this ticket we discussed how to handle the scenario in which two resources have the same name (which is possible in CKAN), given that Git does not allow two files with the same name in the same folder.
Basically:

- If there is a single resource named `data.csv`, write an LFS pointer called `data.csv`.
- If several resources are named `data.csv`, write an LFS pointer called `data.csv` for the first one and `<uid>.csv` for the second, third, and so on.

We are not going to handle this in metastore-lib. Instead, we expect the calling code to set the `path` values of resources correctly, in a non-conflicting way.
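Since de-duplication is left to the calling code, here is a minimal sketch of what that caller-side step could look like. It is not part of metastore-lib or ckanext-versioning. The function name `dedupe_resource_paths` and the `make_uid` hook are hypothetical, and using a random UUID hex string as `<uid>` is an assumption (a resource ID would work just as well).

```python
import os
import uuid
from typing import Callable, List, Sequence


def dedupe_resource_paths(
    names: Sequence[str],
    make_uid: Callable[[], str] = lambda: uuid.uuid4().hex,  # hypothetical uid source
) -> List[str]:
    """Return one non-conflicting path per resource, in input order.

    The first resource with a given name keeps it; each later duplicate
    gets '<uid><ext>', preserving the original file extension.
    """
    taken = set(names)  # reserve every original name for its first user
    seen = set()
    paths = []
    for name in names:
        if name not in seen:
            # First occurrence: keep the name as-is
            seen.add(name)
            paths.append(name)
            continue
        _, ext = os.path.splitext(name)  # "data.csv" -> ".csv"
        candidate = make_uid() + ext
        while candidate in taken:  # retry on an (unlikely) collision
            candidate = make_uid() + ext
        taken.add(candidate)
        paths.append(candidate)
    return paths
```

For example, with a deterministic `make_uid`, `["data.csv", "data.csv", "other.csv", "data.csv"]` maps to `["data.csv", "u1.csv", "other.csv", "u2.csv"]`. Reserving all original names up front means a generated name can never steal the name of a later, legitimately named resource.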
In the GitHub storage backend, we will do a last-minute sanity check on resource paths before committing them:

- If `path` is a URL, we do not store a Git LFS pointer and don't need to worry about it.
- If `path` is a POSIX path and it conflicts with another resource's path, raise a `ValueError` exception.
- If `path` is a POSIX path and it contains `/../` or begins with `/`, raise a `ValueError` exception.

When integrating with CKAN, path normalization and de-duplication will be handled in ckanext-versioning or in a conversion library used by it. See https://github.com/datopian/ckanext-versioning/issues/2
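A minimal sketch of the backend sanity check described above. The name `check_resource_paths` is hypothetical, and two details are assumptions: a path counts as a URL when it has both a scheme and a network location, and "conflict" means two resources share exactly the same path string.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def _is_url(path: str) -> bool:
    # Assumption: anything with a scheme and a host (https://..., s3://...) is a URL
    parsed = urlparse(path)
    return bool(parsed.scheme and parsed.netloc)


def check_resource_paths(paths: Iterable[str]) -> List[str]:
    """Return the local (non-URL) paths, raising ValueError on unsafe or conflicting ones."""
    seen = set()
    local = []
    for path in paths:
        if _is_url(path):
            continue  # remote resource: no LFS pointer is stored, nothing to check
        if path.startswith("/") or "/../" in path:
            raise ValueError(f"Unsafe resource path: {path!r}")
        if path in seen:
            raise ValueError(f"Conflicting resource path: {path!r}")
        seen.add(path)
        local.append(path)
    return local
```

This follows the rules in the ticket literally. Note that a path such as `../data.csv` (leading `..` without a slash before it) or `a/./b.csv` versus `a/b.csv` would pass these literal checks; normalizing with `posixpath.normpath` first would be one way to tighten them.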
When the `lfs_server_url` config option is set, we should start creating and committing LFS pointer files and LFS config based on the resources we get:

- If `lfs_server_url` is not set, do not create LFS config or pointer files.
- If a resource has `sha256` and `bytes` set, create a suitable LFS pointer file.
- Create `.gitattributes` and `.lfsconfig` files to match.
- Check that resource paths are safe (they must not begin with `/` or contain `/../`), otherwise raise `ValueError`.
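A minimal sketch of the pointer and config generation listed above, using the Git LFS v1 pointer format (`version`, `oid sha256:...`, `size` lines) and the standard `lfs.url` config key. The function names `lfs_pointer` and `lfs_files`, the resource dict shape (`path`, `sha256`, `bytes` keys), and the choice to emit `.gitattributes` / `.lfsconfig` only when at least one pointer is written are all assumptions; path validation is left out here.

```python
import re
from typing import Dict, List, Optional

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(sha256: str, size: int) -> str:
    """Build the text of a Git LFS pointer file (version, oid, size; LF-terminated)."""
    if not re.fullmatch(r"[0-9a-f]{64}", sha256):
        raise ValueError(f"Not a lowercase hex SHA-256 digest: {sha256!r}")
    if size < 0:
        raise ValueError("size must be non-negative")
    return f"version {LFS_SPEC_URL}\noid sha256:{sha256}\nsize {size}\n"


def lfs_files(resources: List[Dict], lfs_server_url: Optional[str]) -> Dict[str, str]:
    """Map repo file paths to contents: pointer files, .gitattributes and .lfsconfig.

    Returns an empty dict when lfs_server_url is not set.
    """
    if not lfs_server_url:
        return {}  # LFS disabled: no config, no pointers
    files: Dict[str, str] = {}
    tracked = []
    for res in resources:
        path, sha256, size = res.get("path"), res.get("sha256"), res.get("bytes")
        if not path or sha256 is None or size is None:
            continue  # no hash/size (e.g. URL resources): nothing to point at
        files[path] = lfs_pointer(sha256, size)
        tracked.append(path)
    if tracked:
        # Route each pointer path through the LFS filter
        files[".gitattributes"] = "".join(
            f"{p} filter=lfs diff=lfs merge=lfs -text\n" for p in tracked
        )
        # Point git-lfs clients at the configured server
        files[".lfsconfig"] = f"[lfs]\n\turl = {lfs_server_url}\n"
    return files
```

One design caveat: `.gitattributes` lines are patterns, so paths containing spaces or glob characters would need escaping before being written this way.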