The format is validated when the file is uploaded.
Via an API, a document (data object) can be created, retrieved, modified and deleted.
Exactly one file (for example a PDF) must be stored with a document.
The creation, modification or deletion of a document is logged (see #16).
The API specification has been updated (ReDoc, Swagger).
The documentation has been updated (Read the Docs).
Document creation + uploads
We always use the file parts mechanism of the Documenten API.
ODRC will proxy to the underlying Documenten API.
Preparing a document upload
The client must first submit the necessary metadata and then, based on the response,
cut the file up into parts that can be uploaded individually.
The request body schema of a Document POST would look something like:
Document:
  type: object
  required:
    - identifier
  properties:
    publicatie:  # UUID of the publication it belongs to
      type: string
      format: uuid
    identifier:  # the 'primary' identifier
      type: string
    creatiedatum:
      type: string
      format: date
    officieleTitel:  # DiWoo doesn't seem to apply a max length
      type: string
    verkorteTitel:
      type: string
    omschrijving:
      type: string
    # from waardelijst -> expose options in separate endpoint (!)
    # POST (write) operations should be able to just provide the identifier IRI instead of this complex object if ICATT desires this
    bestandsformaat:
      type: object
      properties:
        identifier:  # IRI from waardelijst
          type: string
          format: uri
        mimeType:  # e.g. application/pdf
          type: string
        naam:
          type: string  # e.g. "PDF"
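As an illustration of the schema above, a client could assemble the request body roughly as follows. This is a minimal sketch, not the actual serializer: the helper name `build_document_payload` and its parameter names are hypothetical, and it only covers a handful of the sketched fields.

```python
from datetime import date
from uuid import UUID


def build_document_payload(
    publicatie: UUID,
    identifier: str,
    officiele_titel: str,
    creatiedatum: date,
    bestandsformaat_iri: str,
) -> dict:
    """Assemble a JSON-serializable body following the sketched Document schema."""
    if not identifier:
        # `identifier` is the only field listed under `required` in the schema.
        raise ValueError("identifier is required")
    return {
        "publicatie": str(publicatie),
        "identifier": identifier,
        "creatiedatum": creatiedatum.isoformat(),
        "officieleTitel": officiele_titel,
        # Only the IRI is sent here, as the schema comment suggests may be allowed.
        "bestandsformaat": {"identifier": bestandsformaat_iri},
    }
```

Whether write operations may pass only the IRI for `bestandsformaat` is still an open point in the schema comments, so that part is an assumption.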
This translates to a request of ODRC -> Documenten API with schema:
EnkelvoudigInformatieobject:
  type: object
  properties:
    identificatie:  # primary identifier of Document OR let the Documenten API generate one?
      type: string
    bronorganisatie:  # fixed, global configuration parameter in ODRC initially, could become 'smart' in the future
      type: string
    creatiedatum:  # taken from Document.creatiedatum
      type: string
      format: date
    titel:
      type: string
      minLength: 1
      maxLength: 200
    auteur:  # taken from Document.publicatie.organization
      type: string
    status:
      const: definitief  # archiving will move this to gearchiveerd, later
    formaat:  # derived from Document.bestandsformaat
      type: string
    taal:  # derived from Document.taal -> convert to/from ISO 639-2/B
      type: string
      enum:
        - dut
        - eng
    bestandsnaam:  # taken from Document.bestandsnaam
      type: string
    bestandsomvang:  # taken from Document.bestandsomvang, prepares the file parts
      type: number
    indicatieGebruiksrecht:
      const: false
    informatieobjecttype:  # points to /catalogi/api/v1/informatieobjecttypen/:uuid for the Document.publicatie.informatiecategorie
      type: string
      format: uri
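The translation from a Document to the Documenten API request could be sketched as below. Several choices are assumptions and not settled by this issue: that `officieleTitel` feeds `titel` (cut to the 200-character limit), that `formaat` is taken from `bestandsformaat.mimeType`, that `Document.taal` is an ISO 639-1 code, and that the Document identifier is reused as `identificatie`. The function name and its parameters are hypothetical.

```python
# ISO 639-1 -> ISO 639-2/B codes for the two languages in the enum above.
ISO_639_1_TO_2B = {"nl": "dut", "en": "eng"}


def to_enkelvoudig_informatieobject(
    document: dict,
    bronorganisatie: str,
    auteur: str,
    informatieobjecttype_url: str,
) -> dict:
    """Map a Document (as sketched above) onto the Documenten API request body."""
    return {
        # Assumption: reuse the Document identifier rather than letting the API generate one.
        "identificatie": document["identifier"],
        "bronorganisatie": bronorganisatie,
        "creatiedatum": document["creatiedatum"],
        # Assumption: officieleTitel feeds titel and is cut to the 200-char limit.
        "titel": document["officieleTitel"][:200],
        "auteur": auteur,
        "status": "definitief",
        # Assumption: the MIME type is what "derived from bestandsformaat" means.
        "formaat": document["bestandsformaat"]["mimeType"],
        "taal": ISO_639_1_TO_2B[document["taal"]],
        "bestandsnaam": document["bestandsnaam"],
        "bestandsomvang": document["bestandsomvang"],
        "indicatieGebruiksrecht": False,
        "informatieobjecttype": informatieobjecttype_url,
    }
```

Passing `bronorganisatie`, `auteur` and the informatieobjecttype URL as arguments keeps the mapping free of configuration and database lookups, which makes it straightforward to unit test.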
The Documenten API will return a lock and a list of BestandsDelen for upload. Each
bestandsdeel will have the shape:
BestandsDeel:
  type: object
  properties:
    url:  # URL to PUT to
      type: string
      format: uri
    volgnummer:
      type: integer
    omvang:
      type: integer
    voltooid:
      const: false
    lock:  # ??
      type: string
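Given such a list of bestandsdelen, the client side could cut the file up along these lines. This is a sketch that assumes the `omvang` values add up to the file size and that `volgnummer` defines the order of the parts; the function name `split_into_parts` is hypothetical, and a real client would read from disk in chunks instead of holding the whole file in memory.

```python
def split_into_parts(data: bytes, bestandsdelen: list[dict]) -> list[tuple[int, bytes]]:
    """Slice `data` into chunks matching each part's `omvang`, ordered by `volgnummer`."""
    ordered = sorted(bestandsdelen, key=lambda deel: deel["volgnummer"])
    if sum(deel["omvang"] for deel in ordered) != len(data):
        raise ValueError("part sizes do not add up to the file size")
    chunks = []
    offset = 0
    for deel in ordered:
        # Each chunk starts where the previous one ended.
        chunks.append((deel["volgnummer"], data[offset : offset + deel["omvang"]]))
        offset += deel["omvang"]
    return chunks
```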
The ODRC will then expose endpoints for these part uploads so the publication component
can upload the parts:
URL: /api/v1/documenten/:uuid/bestandsdeel/:index
A bestandsdeel will simply be multipart/form-data, with the API key as auth header.
The request will be transformed by the ODRC, which adds the lock ID & JWT for the
Documenten API, and streams the file part down to the Documenten API.
Once all parts are received, we unlock the created document.
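The "unlock once all parts are received" step could be tracked roughly as follows. This is a sketch only: `UploadState` and `handle_part_upload` are hypothetical names, the `unlock` callable stands in for the actual call to the Documenten API, and the `documentFinalized` key mirrors the response flag mentioned in the task list.

```python
from dataclasses import dataclass, field


@dataclass
class UploadState:
    """Hypothetical bookkeeping for one document's pending file parts."""

    lock: str
    expected_parts: set[int]
    completed_parts: set[int] = field(default_factory=set)

    def mark_completed(self, volgnummer: int) -> bool:
        """Record an uploaded part; return True once every part has arrived."""
        if volgnummer not in self.expected_parts:
            raise KeyError(f"unknown part {volgnummer}")
        self.completed_parts.add(volgnummer)
        return self.completed_parts == self.expected_parts


def handle_part_upload(state: UploadState, volgnummer: int, unlock) -> dict:
    """Return the response body for a part upload, unlocking after the last part."""
    finalized = state.mark_completed(volgnummer)
    if finalized:
        unlock(state.lock)  # stands in for the unlock call to the Documenten API
    return {"documentFinalized": finalized}
```

Because parts will likely be uploaded in parallel, a real implementation would need to make the completion check atomic (for example with a database lock) and guard against unlocking twice when a part is uploaded again.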
Tasks
[x] Implement API endpoint/serializer for POST /api/v1/documenten
    Part of the metadata is stored in our database
    Other metadata that we can store in Documenten API, we store there
    Record the requested bestandsdelen from the Documenten API
    Record the lock ID to finalize the upload/creation
    Resolve the internal "Catalogi API" endpoint for the informatieobjecttype
[ ] Implement the API endpoint/serializer for PUT /api/v1/documenten/:uuid/bestandsdelen/:index
    Use the stored lock ID & other configuration/metadata to call the Documenten API /api/v1/bestandsdelen/:uuid
    If all parts are completely uploaded, unlock the document
    Emit documentFinalized: true|false in the response body so that the client knows when it can refresh the document resource, as file part uploads will likely happen in parallel
    Nice to have: check if we can make use of proxying/streaming (timebox: 1 day)
[ ] The first part of the document contains the magic bytes that can be used to validate against the format in the metadata. Check the upload validation in Open Forms for inspiration (and edge cases).
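A magic-bytes check on the first part could look roughly like this. It is a sketch under stated assumptions, not the Open Forms approach: the signature table only lists a few well-known formats, and the names `MAGIC_BYTES` and `matches_declared_format` are hypothetical.

```python
# A few well-known file signatures; a real table would be longer.
MAGIC_BYTES = {
    "application/pdf": (b"%PDF-",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "application/zip": (b"PK\x03\x04",),
}


def matches_declared_format(first_part: bytes, mime_type: str) -> bool:
    """Check whether the first bytes of the upload fit the declared MIME type."""
    signatures = MAGIC_BYTES.get(mime_type)
    if signatures is None:
        # Unknown format: this sketch cannot confirm it, so it reports a mismatch.
        return False
    # bytes.startswith accepts a tuple and matches if any signature fits.
    return first_part.startswith(signatures)
```

One edge case to keep in mind: container formats such as office documents are ZIP archives underneath, so a signature check alone cannot tell them apart from a plain ZIP file.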
Acceptance criteria
Copied/moved from #2