datopian / ckanext-gitdatahub

A CKAN extension to use a git based storage for dataset's metadata that supports versioning.
GNU Affero General Public License v3.0

When the dataset has 2 or more resources with the same name, only one lfs pointer file is written #14

Open pdelboca opened 4 years ago

pdelboca commented 4 years ago

CKAN allows a dataset to have several resources with the same name, so the logic that writes LFS pointer files must handle this use case.

Acceptance Criteria

Tasks

Analysis

CKAN allows a dataset to have multiple resources with the same name, but we cannot have two files with the same name in one folder in git. How are we going to address this?

So far, the GitHub API wrapper lets us define these parameters:

repo.create_file(
    "data/{}".format(obj['name']),  # path of the file in the repository
    "Create LfsPointerFile",        # commit message
    lfs_pointer_body,               # content of the pointer file
)

The resource has a unique id field ('id': u'2a0905f0-70fe-4843-9d8f-a298c9a61735'). It is not user friendly, but we can use it to name the file in the repository.
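
A minimal sketch of the id-based naming, assuming each resource dict carries the `id` and `name` keys that CKAN returns. `pointer_path_by_id` and the `folder` parameter are hypothetical names used for illustration, not part of the extension. The sketch keeps the extension of the resource name so the file type stays visible in the repository:

```python
import posixpath


def pointer_path_by_id(resource, folder="data"):
    """Build a repository path from the resource's unique id.

    Hypothetical helper: the id guarantees uniqueness, and the extension
    of the resource name (if any) is appended for readability.
    """
    _, ext = posixpath.splitext(resource.get("name") or "")
    return posixpath.join(folder, resource["id"] + ext)


# Two resources sharing a name still get distinct paths.
resources = [
    {"id": "2a0905f0-70fe-4843-9d8f-a298c9a61735", "name": "mini-csv.csv"},
    {"id": "480e2c26-7f40-4b12-ad38-8b665426e810", "name": "mini-csv.csv"},
]
paths = [pointer_path_by_id(r) for r in resources]
```

The trade-off is the one raised in the analysis: the paths are unique and stable, but not readable by a human browsing the repository.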

rufuspollock commented 4 years ago

@pdelboca I don't like using the uid that much. Can you summarize what info a resource normally has? Maybe we can move to the uid only when there is a name conflict ...
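
One way the "uid only on conflict" idea could look, as a rough sketch rather than the extension's actual logic. `pointer_paths` is a hypothetical helper, and it assumes the full list of resource dicts for the dataset is available when the files are written:

```python
from collections import Counter


def pointer_paths(resources, folder="data"):
    """Map each resource id to a repository path.

    The resource name is used when it is unique within the dataset;
    the id is used only for resources whose name clashes with another.
    """
    name_counts = Counter(r["name"] for r in resources)
    paths = {}
    for r in resources:
        filename = r["id"] if name_counts[r["name"]] > 1 else r["name"]
        paths[r["id"]] = "{}/{}".format(folder, filename)
    return paths


resources = [
    {"id": "id-1", "name": "mini-csv.csv"},
    {"id": "id-2", "name": "mini-csv.csv"},
    {"id": "id-3", "name": "other.csv"},
]
paths = pointer_paths(resources)
```

In this sketch the conflict check runs over the whole dataset, so adding a second resource with an existing name later would change the path of the first one as well. Whether that rename is acceptable is an open design question.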

pdelboca commented 4 years ago
'resources': [{'cache_last_updated': None,
                'cache_url': None,
                'created': '2020-04-28T19:35:25.665649',
                u'datastore_active': False,
                'description': u'Resource description',
                'format': u'CSV',
                'hash': u'',
                'id': u'480e2c26-7f40-4b12-ad38-8b665426e810',
                'last_modified': '2020-04-28T19:35:25.623739',
                'mimetype': u'text/csv',
                'mimetype_inner': None,
                'name': u'mini-csv.csv',
                'package_id': u'4afb3a7a-4973-43aa-a38e-ffa7610fc2dd',
                'position': 0,
                'resource_type': None,
                'revision_id': u'2f6080ea-138a-4877-b0f9-0bb0f0af86c2',
                'size': 40L,
                'state': u'active',
                'url': u'http://ckan:5000/dataset/4afb3a7a-4973-43aa-a38e-ffa7610fc2dd/resource/480e2c26-7f40-4b12-ad38-8b665426e810/download/mini-csv.csv',
                'url_type': u'upload'}],

@rufuspollock This is the info a resource normally has. There is a position field; we can probably append it to the name to make it unique.
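
A small sketch of the position-based naming, assuming `position` is unique per resource within a dataset, as in the output above. `pointer_path_with_position` is a hypothetical helper name, and placing the position before the extension is just one possible convention:

```python
import posixpath


def pointer_path_with_position(resource, folder="data"):
    """Append the resource position to the file stem, keeping the extension."""
    stem, ext = posixpath.splitext(resource["name"])
    return "{}/{}-{}{}".format(folder, stem, resource["position"], ext)


resources = [
    {"name": "mini-csv.csv", "position": 0},
    {"name": "mini-csv.csv", "position": 1},
]
paths = [pointer_path_with_position(r) for r in resources]
```

Here the two resources map to `data/mini-csv-0.csv` and `data/mini-csv-1.csv`. One caveat to check: if the position changes when resources are reordered, the file paths would change with it.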

rufuspollock commented 4 years ago

@pdelboca