avalonmediasystem / avalon

Avalon Media System – Samvera Application
http://www.avalonmediasystem.org/
Apache License 2.0

Support Caption and Transcript Uploads Via API #5801

Closed joncameron closed 1 day ago

joncameron commented 2 months ago

Description

Following from #5710: captions and transcripts should be manipulable via the Avalon API so that these documents can be uploaded and attached to records programmatically.

As a user, I want to programmatically add captions, transcripts and supplemental files to master files on a media object.

API

Requirements

For a masterfile:

Routes

Questions

Request: Add new supplemental file to masterfile

POST /master_files/#{fedora_id}/supplemental_files
Content-Type: application/octet-stream
Content-Disposition: file; filename="filename.jpg"

[… binary file data …]

Request: replace existing supplemental file with a new binary file

PUT /master_files/#{fedora_id}/supplemental_files/#{id}
Content-Type: application/octet-stream
Content-Disposition: file; filename="filename.jpg"

[… binary file data …]

Request: Get data on supplemental file

GET /master_files/#{fedora_id}/supplemental_files/#{id}.json

    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    }

Request: Update data on supplemental file

PUT /master_files/#{fedora_id}/supplemental_files/#{id}.json

    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    }

Request: Get listing of supplemental files on masterfile

GET /master_files/ns064602j/supplemental_files.json

Returns an array of supplemental files

[
    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    },
    {
        "id": "141",
        "type": "transcript",
        "label": "Labelforit.vtt",
        "language": "French",
        "machine_generated": "1"
    }
]

Request: Delete supplemental file

DELETE /master_files/#{fedora_id}/supplemental_files/#{id}

Deletes a supplemental file from the masterfile

HTTP Response; no JSON returned
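
For quick reference, the proposed routes above can be summarized as a small lookup table. This is only a sketch of the proposal as written in this issue: the `ROUTES` table, the action labels, and the `route_for` helper are hypothetical names made up for illustration, not part of Avalon.

```python
# Sketch of the routes proposed in this issue.
# ROUTES, the action labels, and route_for are hypothetical names.
ROUTES = {
    "create":  ("POST",   "/master_files/{mf}/supplemental_files"),
    "replace": ("PUT",    "/master_files/{mf}/supplemental_files/{sf}"),
    "show":    ("GET",    "/master_files/{mf}/supplemental_files/{sf}.json"),
    "update":  ("PUT",    "/master_files/{mf}/supplemental_files/{sf}.json"),
    "list":    ("GET",    "/master_files/{mf}/supplemental_files.json"),
    "delete":  ("DELETE", "/master_files/{mf}/supplemental_files/{sf}"),
}


def route_for(action, master_file_id, supplemental_file_id=None):
    """Return (HTTP method, path) for one of the proposed actions."""
    method, template = ROUTES[action]
    # Routes that address a single supplemental file need its id.
    if "{sf}" in template and supplemental_file_id is None:
        raise ValueError(f"{action!r} needs a supplemental file id")
    return method, template.format(mf=master_file_id, sf=supplemental_file_id)
```

For example, `route_for("list", "ns064602j")` gives the GET route for the listing, and `route_for("delete", "ns064602j", 131)` gives the DELETE route for one file.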

Done Looks Like

QA

Current Caption Upload Example

-----------------------------121240327742272709152918634858
Content-Disposition: form-data; name="authenticity_token"

lPz5ffZw/vXCFXNyCZdepS9+UZnf8BcFhAG0bgi8sBHGuZadKvDIfZsA/QHP/7eK46qzEFkd3Rh2rJkVI9ymaw==
-----------------------------121240327742272709152918634858
Content-Disposition: form-data; name="supplemental_file[tags][]"

caption
-----------------------------121240327742272709152918634858
Content-Disposition: form-data; name="supplemental_file[file]"; filename="lunchroom manners.srt"
Content-Type: text/x-srt

1
00:00:01,200 --> 00:00:21,000
[music]

2
00:00:22,200 --> 00:00:26,600
Just before lunch one day, a puppet show 
was put on at school.

3
00:00:26,700 --> 00:00:31,500
It was called "Mister Bungle Goes to Lunch".

... (rest of the file here)

-----------------------------121240327742272709152918634858--

Current Form Data sent on POST

{
    "_method": "put",
    "supplemental_file[label]": "Hipparchus (146 to 127 B.C.).vtt",
    "supplemental_file[language]": "French",
    "treat_as_transcript_131": "1",
    "machine_generated_131": "1",
    "cancel_edit_label": "",
    "save_label": ""
}

joncameron commented 1 month ago

The API needs adjusted language support: the language value is currently set on the model from a default value. We'll need to change that behavior so that the value can be set up front when the supplemental file is created.

Should look at the JSON API standard and other schemas for API architecture.

GET to /supplemental_file/#{id} returns the binary; we don't have a place to get the metadata from one of these routes. GET to /supplemental_file/#{id}.json (with appropriate headers) should get the JSON metadata about the file... and maybe offer a URL to the binary (/supplemental_file/#{id} or /supplemental_file/#{id}/caption etc.). If mirroring for create/update, PUT or POST to .json should be the metadata and PUT or POST to /#{id} should be the binary. For creation, we'd have to figure out what order we want to do this. Submit both things at the same time? What does our model expect or require?

joncameron commented 1 month ago

The model has a couple required metadata fields but with Active Storage attachment, you can create the supplemental file object but not have a file attached immediately. We probably wouldn't be able to create the file without any associated metadata. At the very least it would need to be metadata first, file second. Ideally, though, find a way to bundle it all together.

joncameron commented 1 month ago

Two relevant options here:

https://cloud.google.com/storage/docs/uploading-objects#rest-upload-objects

https://www.drupal.org/node/2941420

joncameron commented 1 month ago

Ex: POST to /media_objects, get the masterfile ID, then POST to /master_file/id/supplemental_files

When you POST to supplemental_files, it will save or update the media object, OR we don't need to worry about it because the masterfile already handles it.

Supplemental file create/update should ensure that saves and index updates are done accordingly as needed.

masaball commented 1 month ago

Note for testing/documentation: When uploading binary content you should provide the mime type in the curl request like in the examples. This attribute should be optional, but I encountered some flaky/weird behavior during development where the SupplementalFile content type was not always being saved properly unless the type=mimetype attribute was included with the file in the curl request. Required mime types for captions: text/vtt or text/srt.
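
As a rough illustration of the note above, a script that assembles the curl `-F` value could pick the MIME type from the file extension. This is only a sketch: the extension-to-type mapping and the `curl_file_field` helper are assumptions made for this example, and only the two caption types named above are covered.

```python
from pathlib import Path

# Caption MIME types named in the note above. Choosing them by file
# extension is an assumption made for this illustration.
CAPTION_MIME_TYPES = {".vtt": "text/vtt", ".srt": "text/srt"}


def curl_file_field(path):
    """Build the value for curl's -F option, e.g. file=@captions.vtt;type=text/vtt."""
    suffix = Path(path).suffix.lower()
    if suffix not in CAPTION_MIME_TYPES:
        raise ValueError(f"no caption MIME type known for {suffix!r}")
    return f"file=@{path};type={CAPTION_MIME_TYPES[suffix]}"
```

The returned string would be passed to curl as `-F "<value>"`, in the same position as `file=@content_filepath;type=mimetype` in the examples.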

Request: Add new supplemental file to masterfile

Create with file and metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -F "file=@content_filepath;type=mimetype" -F "metadata=<metadata_filepath" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with file and inline metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -F "file=@content_filepath;type=mimetype" -F metadata='{"label": "Lunchroom", "language": "French"}' https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with just file:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -F "file=@content_filepath;type=mimetype" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with just metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -d @metadata_filepath -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with just metadata inline:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -d '{"label": "Lunchroom", "language": "French"}' -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Response: { "id": :supplemental_file_id }
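
For anyone scripting this without curl, here is a rough Python equivalent of the "create with file and inline metadata" request, using only the standard library. It is a sketch under assumptions: `build_create_request` is a hypothetical helper, the multipart body is assembled by hand to mirror what `curl -F` sends, and the filename is assumed to contain no double quotes. It only builds the request and does not send it.

```python
import json
import urllib.request
import uuid


def build_create_request(base_url, master_file_id, api_key,
                         filename, content, mime_type, metadata):
    """Build (but do not send) a multipart POST with a file part and a metadata part."""
    boundary = uuid.uuid4().hex
    # File part, mirroring -F "file=@content_filepath;type=mimetype".
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode() + content + b"\r\n"
    # Metadata part, mirroring -F metadata='{...}'.
    metadata_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n\r\n'
        f"{json.dumps(metadata)}\r\n"
    ).encode()
    body = file_part + metadata_part + f"--{boundary}--\r\n".encode()
    url = f"{base_url}/master_files/{master_file_id}/supplemental_files.json"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Avalon-Api-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; per the response shown above, the JSON reply should carry the new supplemental file id.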

Request: replace existing supplemental file with a new binary file

Update attached file:

curl -H "Avalon-Api-Key:abcdef123456" -X PUT -F "file=@content_filepath;type=mimetype" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id

Response: {"id": :supplemental_file_id}

Request: Get data on supplemental file

curl -H "Avalon-Api-Key:abcdef123456" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Response:

    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "true",
        "machine_generated": "true"
    }

Request: Update data on supplemental file

Updating metadata requires ALL existing fields except language to be included in the payload. Any existing non-language metadata field that is left out of the payload will be removed.

Update metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X PUT -d @metadata_filepath -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Payload file:

    {
        "type": "transcript",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": false,
        "machine_generated": true
    }

Update metadata inline:

curl -H "Avalon-Api-Key:abcdef123456" -X PUT -d '{"label": "label", "language": "French"}' -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Response: {"id": :supplemental_file_id}
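
Because fields left out of the PUT payload are removed, a client will probably want to start from the current metadata and overlay its changes. A minimal sketch of that merge follows; `build_update_payload` is a hypothetical helper, and dropping `id` from the payload is an assumption based on the payload file example above, which does not include it.

```python
def build_update_payload(current, changes):
    """Overlay changes on the current metadata so the PUT does not drop any field."""
    # Assumption: "id" is not sent back, matching the payload file example.
    payload = {key: value for key, value in current.items() if key != "id"}
    payload.update(changes)
    return payload
```

Here `current` would be the JSON returned by the GET on `:id.json`, and the result is what gets sent as the PUT body.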

Request: Get listing of supplemental files on masterfile

curl -H "Avalon-Api-Key:abcdef123456" "https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json"

Returns an array of supplemental files

[
    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "true",
        "machine_generated": "true"
    },
    {
        "id": "141",
        "type": "transcript",
        "label": "Labelforit.vtt",
        "language": "French",
        "machine_generated": "true"
    }
]

Paginated:

curl -H "Avalon-Api-Key:abcdef123456" "https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json?per_page=1&page=2"

Returns array containing per_page results:

[
    {
        "id": "141",
        "type": "transcript",
        "label": "Labelforit.vtt",
        "language": "French",
        "machine_generated": "true"
    }
]
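
When scripting the paginated listing, the query string can be built rather than typed by hand. This is a small sketch; `listing_url` is a hypothetical helper, and `per_page` and `page` are the only parameters assumed, taken from the curl example above.

```python
from urllib.parse import urlencode


def listing_url(base_url, master_file_id, per_page=None, page=None):
    """Build the supplemental files listing URL, with optional pagination parameters."""
    url = f"{base_url}/master_files/{master_file_id}/supplemental_files.json"
    pairs = (("per_page", per_page), ("page", page))
    params = {name: value for name, value in pairs if value is not None}
    return f"{url}?{urlencode(params)}" if params else url
```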

Request: Delete supplemental file

curl -H "Avalon-Api-Key:abcdef123456" -X DELETE https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Deletes a supplemental file from the masterfile

HTTP Response; no JSON returned

elynema commented 2 weeks ago

@joncameron Can you QA this one since you are most likely to be using this API?

joncameron commented 1 day ago

#5918 is for an unexpected response I got while testing the PUT, but otherwise all of the routes worked as expected.