GeneriekPublicatiePlatformWoo / registratie-component

A registration providing the functionalities for a "public documents" storage.
https://odrc.readthedocs.io

ODRC: implement Document creation/upload #19

Open sergei-maertens opened 2 months ago

sergei-maertens commented 2 months ago

Acceptance criteria

Copied/moved from #2

Document creation + uploads

We always use the file parts (bestandsdelen) mechanism of the Documenten API.

Preparing a document upload

The client must first submit the necessary metadata and then, based on the response, split the file into parts that can be uploaded individually.

The request body schema of a Document POST would look something like:

Document:
  type: object
  required:
    - identifier
  properties:
    publicatie:  # UUID of the publication it belongs to
      type: string
      format: uuid
    identifier:  # the 'primary' identifier
      type: string
    creatiedatum:
      type: string
      format: date
    officieleTitel:  # DiWoo doesn't seem to apply a max length
      type: string
    verkorteTitel:
      type: string
    omschrijving:
      type: string
    # from waardelijst -> expose options in separate endpoint (!)
    # POST (write) operations should be able to just provide the identifier IRI instead of this complex object if ICATT desires this
    bestandsformaat:  
      type: object
      properties:
        identifier:  # IRI from waardelijst
          type: string
          format: uri
        mimeType:  # e.g. application/pdf
          type: string
        naam:
          type: string  # e.g. "PDF"

This translates to a request of ODRC -> Documenten API with schema:

EnkelvoudigInformatieobject:
  type: object
  properties:
    identificatie:  # primary identifier of Document OR let the Documenten API generate one?
      type: string
    bronorganisatie:  # fixed, global configuration parameter in ODRC initially, could become 'smart' in the future
      type: string
    creatiedatum:  # taken from Document.creatiedatum
      type: string
      format: date
    titel:
      type: string
      minLength: 1
      maxLength: 200
    auteur:  # taken from Document.publicatie.organization
      type: string
    status:
      const: definitief  # archiving will move this to gearchiveerd, later
    formaat:  # derived from Document.bestandsformaat
      type: string
    taal:  # derived from Document.taal -> convert to/from ISO 639-2/B
      type: string
      enum:
        - dut
        - eng
    bestandsnaam:  # taken from Document.bestandsnaam
      type: string
    bestandsomvang:  # taken from Document.bestandsomvang, prepares the file parts
      type: number
    indicatieGebruiksrecht:
      const: false
    informatieobjecttype:  # points to /catalogi/api/v1/informatieobjecttypen/:uuid for the Document.publicatie.informatiecategorie
      type: string
      format: uri
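
The mapping between the two schemas above could be sketched as follows. This is only an illustration, not the ODRC implementation: the function name is hypothetical, and it assumes that titel is taken from officieleTitel (cut to the 200 character limit), that formaat is the MIME type of bestandsformaat, and that Document.taal arrives as a two-letter code ("nl"/"en"). None of these choices are settled in this issue.

```python
# Assumed mapping from a two-letter Document.taal to ISO 639-2/B codes.
TAAL_TO_ISO_639_2B = {"nl": "dut", "en": "eng"}


def to_enkelvoudig_informatieobject(
    document: dict,
    *,
    bronorganisatie: str,
    auteur: str,
    informatieobjecttype: str,
) -> dict:
    """Build the Documenten API request body from an ODRC Document payload."""
    return {
        "identificatie": document["identifier"],
        "bronorganisatie": bronorganisatie,  # global configuration parameter
        "creatiedatum": document["creatiedatum"],
        # Assumption: titel comes from officieleTitel, cut to maxLength 200.
        "titel": document["officieleTitel"][:200],
        "auteur": auteur,  # from Document.publicatie.organization
        "status": "definitief",
        # Assumption: formaat is the MIME type of bestandsformaat.
        "formaat": document["bestandsformaat"]["mimeType"],
        "taal": TAAL_TO_ISO_639_2B[document["taal"]],
        "bestandsnaam": document["bestandsnaam"],
        "bestandsomvang": document["bestandsomvang"],
        "indicatieGebruiksrecht": False,
        # URL of the informatieobjecttype for the publication's category.
        "informatieobjecttype": informatieobjecttype,
    }
```

Passing bronorganisatie, auteur and informatieobjecttype as arguments keeps the lookup of configuration and publication data outside of the mapping itself.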

The Documenten API will return a lock and a list of bestandsdelen for upload. Each bestandsdeel will have the shape:

BestandsDeel:
  type: object
  properties:
    url:  # URL to PUT to
      type: string
      format: uri
    volgnummer:
      type: integer
    omvang:
      type: integer
    voltooid:
      const: false
    lock: # ??
      type: string
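
To illustrate how a client might cut up a file based on such a list, here is a minimal sketch. It assumes each bestandsdeel is available as a dict with the volgnummer and omvang fields shown above, and that the sizes add up to the bestandsomvang sent earlier. The helper name iter_part_payloads is hypothetical.

```python
import io
from typing import BinaryIO, Iterator


def iter_part_payloads(
    stream: BinaryIO, bestandsdelen: list[dict]
) -> Iterator[tuple[dict, bytes]]:
    """Yield (bestandsdeel, chunk) pairs in volgnummer order."""
    for deel in sorted(bestandsdelen, key=lambda d: d["volgnummer"]):
        chunk = stream.read(deel["omvang"])
        if len(chunk) != deel["omvang"]:
            raise ValueError(
                f"part {deel['volgnummer']}: expected {deel['omvang']} bytes, "
                f"got {len(chunk)}"
            )
        yield deel, chunk
    # Anything left over means the file is larger than the announced size.
    if stream.read(1):
        raise ValueError("file is larger than the sum of the part sizes")


# Usage with an in-memory file of 10 bytes split over three parts.
parts = [
    {"volgnummer": 2, "omvang": 4},
    {"volgnummer": 1, "omvang": 4},
    {"volgnummer": 3, "omvang": 2},
]
chunks = [chunk for _, chunk in iter_part_payloads(io.BytesIO(b"abcdefghij"), parts)]
```

Each chunk is read only when it is needed, so the whole file never has to be held in memory at once.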

The ODRC will then expose endpoints for these part uploads so the publication component can upload the parts:

URL: /api/v1/documenten/:uuid/bestandsdeel/:index

A bestandsdeel upload will simply be multipart/form-data, with the API key in the authorization header.

The request will be transformed by the ODRC, which adds the lock ID & JWT for the Documenten API, and streams the file part down to the Documenten API.
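
The transformation step could look roughly like the sketch below. It only assembles the downstream request and leaves sending it to whichever HTTP client is used. The function name is hypothetical, and the form field names inhoud and lock are assumptions about the Documenten API that should be checked against its schema. The JWT is assumed to be generated elsewhere.

```python
from typing import BinaryIO


def build_bestandsdeel_request(
    bestandsdeel_url: str, lock: str, jwt: str, inhoud: BinaryIO
) -> dict:
    """Assemble the PUT request that forwards one file part."""
    return {
        "method": "PUT",
        "url": bestandsdeel_url,  # the url from the BestandsDeel
        # The client's API key is replaced by the JWT for the Documenten API.
        "headers": {"Authorization": f"Bearer {jwt}"},
        # Assumed field name: the lock ID goes in as a regular form field.
        "data": {"lock": lock},
        # Assumed field name: the file-like object is passed through as-is,
        # so the part does not have to be read into memory first.
        "files": {"inhoud": inhoud},
    }
```

The keys follow the keyword arguments of a client such as requests, so the result could be sent with `requests.request(**kwargs)`.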

Once all parts are received, we unlock the created document.
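
Deciding when to unlock could be as simple as the check sketched here. It assumes the ODRC has the list of bestandsdelen with their volgnummer and voltooid fields, either tracked locally or re-fetched from the Documenten API. Both helper names are hypothetical.

```python
def missing_parts(bestandsdelen: list[dict]) -> list[int]:
    """Return the volgnummers of the parts that are not uploaded yet."""
    return sorted(d["volgnummer"] for d in bestandsdelen if not d["voltooid"])


def ready_to_unlock(bestandsdelen: list[dict]) -> bool:
    """Only unlock when there is at least one part and none are missing."""
    return bool(bestandsdelen) and not missing_parts(bestandsdelen)


# Usage: the second part is still outstanding, so the document stays locked.
delen = [
    {"volgnummer": 1, "voltooid": True},
    {"volgnummer": 2, "voltooid": False},
]
still_missing = missing_parts(delen)
can_unlock = ready_to_unlock(delen)
```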

Tasks

sergei-maertens commented 5 days ago

not done