PIDs for individual files in GigaDB

User story

As a user
I want to be able to point to a unique file within a dataset using a PID
So that I can be sure I'm referencing the exact file of interest

Acceptance criteria

Given a file is present in a GigaDB dataset
When I refer to that file from an external resource
Then I can be sure I am using a globally unique persistent identifier for that file

Additional Info

The main reference point will always be the dataset as an entity, but we should provide a facility for anyone to specify exactly which individual file they are referring to. We could mint DOIs through DataCite for all files, but I think that is excessive because DOIs require a high degree of mandatory metadata, and it would therefore complicate matters unnecessarily. Instead, I would prefer to use something more lightweight such as Handles or ARKs. My preference would be ARK; this link explains why.
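
To make the ARK option more concrete, here is a minimal sketch of how a per-file identifier could be composed. Everything in it is an assumption for illustration: the NAAN `99999` is the value reserved for test and example ARKs (GigaDB would need its own), and the `d<dataset>/f<file>` naming scheme and the `file_ark` helper are hypothetical, not an agreed design.

```python
# Hypothetical sketch: the NAAN and the naming scheme below are
# illustrative assumptions, not an agreed GigaDB design.
EXAMPLE_NAAN = "99999"  # reserved for test/example ARKs, not a real GigaDB NAAN


def file_ark(dataset_id: int, file_id: int, naan: str = EXAMPLE_NAAN) -> str:
    """Compose an ARK string for one file within a dataset.

    The "/" between the dataset part and the file part follows the ARK
    convention of using "/" to express containment.
    """
    if dataset_id <= 0 or file_id <= 0:
        raise ValueError("dataset_id and file_id must be positive integers")
    return f"ark:/{naan}/d{dataset_id}/f{file_id}"


if __name__ == "__main__":
    print(file_ark(100001, 42))  # ark:/99999/d100001/f42
```

Deriving the name from existing database identifiers keeps minting cheap, but it also ties the PID to internal IDs; an opaque, separately stored name would be the alternative if those IDs are not guaranteed to be stable.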

Moving forward, adding PIDs to individual files would then allow us to add all the files to the DataCite metadata as "relatedIdentifiers", further exposing the metadata to wider indexing and, hopefully, driving discovery.
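
As a rough illustration of the "relatedIdentifiers" idea, the sketch below builds the list of entries that could be attached to a dataset's DataCite record. The field names (`relatedIdentifier`, `relatedIdentifierType`, `relationType`) and the `ARK` and `HasPart` values come from the DataCite metadata schema; the `related_identifiers` helper and the example ARK strings are hypothetical, and the choice of `HasPart` as the relation is an assumption about how we would want to model files.

```python
import json


def related_identifiers(file_pids: list[str], id_type: str = "ARK") -> list[dict]:
    """Build DataCite-style relatedIdentifiers entries for a dataset's files.

    Each file PID is recorded as a part of the dataset ("HasPart").
    """
    return [
        {
            "relatedIdentifier": pid,
            "relatedIdentifierType": id_type,
            "relationType": "HasPart",
        }
        for pid in file_pids
    ]


if __name__ == "__main__":
    # Example PIDs use the test NAAN 99999; they are placeholders only.
    pids = ["ark:/99999/d100001/f1", "ark:/99999/d100001/f2"]
    print(json.dumps({"relatedIdentifiers": related_identifiers(pids)}, indent=2))
```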

Product Backlog Item Ready Checklist

[ ] Business value is clearly articulated
[ ] Item is understood enough by the IT team so it can make an informed decision as to whether it can complete this item
[ ] Dependencies are identified and no external dependencies would block this item from being completed
[ ] At the time of the scheduled sprint, the IT team has the appropriate composition to complete this item
[ ] This item is estimated and small enough to comfortably be completed in one sprint
[ ] Acceptance criteria are clear and testable
[ ] Performance criteria, if any, are defined and testable
[ ] The Scrum team understands how to demonstrate this item at the sprint review

Product Backlog Item Done Checklist

[ ] Item(s) in increment pass all Acceptance Criteria
[ ] Code is refactored to best practices and coding standards
[ ] Documentation is updated as needed
[ ] Data security has not been compromised (with particular reference to the personal information we hold in GigaDB)
[ ] No deviation from the team technology stack and software architecture has been introduced
[ ] The product is in a releasable state (i.e. the increment has not broken anything)
