kinesis-video-archived-media client discards milliseconds from timestamps in list_fragments and get_clip

mdickinson commented 1 month ago

Describe the bug

I'm using the kinesis-video-archived-media client to fetch fragment information and MP4 clips from a KVS stream. Both the list_fragments method and the get_clip method of the client expect a TimestampRange object with a StartTimestamp and EndTimestamp. I'm supplying (UTC-timezoned) datetime.datetime instances for each of those, and those instances have millisecond resolution.

In both cases, the eventual request made to AWS appears to drop the millisecond information from the timestamps, resulting in me getting a different fragment list (or video clip range) from the one I expect.

Expected Behavior

I hoped for the millisecond information in the start timestamp and end timestamp to be preserved: the fragments themselves have time information that includes millisecond resolution, so it's useful to be able to use millisecond resolution to select the desired fragments.

Current Behavior

The requests succeeded, but the returned fragments matched what I would expect if the millisecond portion of the timestamps were being discarded. (By calling boto3.set_stream_logger(name="botocore") I was able to verify that this is in fact what was happening.)

Reproduction Steps

Here's Python code that reproduces the issue for me. It needs a valid KVS STREAM_NAME to run.

import datetime

import boto3

boto3.set_stream_logger(name="botocore")

STREAM_NAME = "some-stream"

session = boto3.Session()
client = session.client("kinesisvideo")
list_fragments_endpoint = client.get_data_endpoint(
    StreamName=STREAM_NAME,
    APIName="LIST_FRAGMENTS",
)["DataEndpoint"]

archived_client = session.client(
    "kinesis-video-archived-media",
    endpoint_url=list_fragments_endpoint,
)
archived_client.list_fragments(
    StreamName=STREAM_NAME,
    FragmentSelector={
        "FragmentSelectorType": "PRODUCER_TIMESTAMP",
        "TimestampRange": {
            "StartTimestamp": datetime.datetime(
                2024, 9, 12, 10, 49, 36, 500000, tzinfo=datetime.timezone.utc
            ),
            "EndTimestamp": datetime.datetime(
                2024, 9, 12, 10, 49, 38, 833000, tzinfo=datetime.timezone.utc
            ),
        },
    },
)

When I run this code, the logs include the following output:

2024-09-12 16:33:00,518 botocore.endpoint [DEBUG] Making request for OperationModel(name=ListFragments) with params: {'url_path': '/listFragments', 'query_string': {}, 'method': 'POST', 'headers': {'Content-Type': 'application/json', 'User-Agent': 'Boto3/1.34.33 md/Botocore#1.34.33 ua/2.0 os/macos#23.6.0 md/arch#x86_64 lang/python#3.8.18 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.34.33'}, 'body': b'{"StreamName": "some-stream", "FragmentSelector": {"FragmentSelectorType": "PRODUCER_TIMESTAMP", "TimestampRange": {"StartTimestamp": 1726138176, "EndTimestamp": 1726138178}}}', 'url': 'https://b-87178fb5.kinesisvideo.ap-northeast-1.amazonaws.com/listFragments', 'context': {'client_region': 'ap-northeast-1', 'client_config': <botocore.config.Config object at 0x10a699a60>, 'has_streaming_input': False, 'auth_type': None}}

Here the StartTimestamp has value 1726138176 where I was expecting 1726138176.5, and similarly for the EndTimestamp.

I've verified that if I hack the serialization logic here then AWS is happy to accept timestamps with millisecond resolution, and to return the appropriate fragment list as a result.

Possible Solution

It looks as though the serialization logic for the timestamp range start and end ends up using _timestamp_unixtimestamp, which is where the millisecond information is discarded.
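
A minimal sketch of how the fraction can get lost, assuming the serializer converts the datetime with calendar.timegm(value.timetuple()) and wraps the result in int() (this is an assumption about what _timestamp_unixtimestamp does in this botocore version, not a copy of its code). The datetime is the StartTimestamp from the reproduction above:

```python
import calendar
import datetime

start = datetime.datetime(
    2024, 9, 12, 10, 49, 36, 500000, tzinfo=datetime.timezone.utc
)

# timetuple() has no sub-second field, so the 500 ms are dropped
# before the conversion to epoch seconds even happens.
truncated = int(calendar.timegm(start.timetuple()))

# timestamp() on a timezone-aware datetime keeps the fractional part.
full = start.timestamp()

print(truncated)  # 1726138176
print(full)       # 1726138176.5
```

The first value matches the StartTimestamp seen in the logged request body; the second is the value that was expected.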

I don't have enough familiarity with botocore internals (yet) to suggest a fix, but it looks as though the specification here should be using something other than plain old Timestamp for the shape. I see TimestampMilliseconds in some other places; can that be used here?

Additional Information/Context

No response

SDK version used

botocore 1.34.33

Environment details (OS name and version, etc.)

macOS 14.6.1, Python 3.8

tim-finnigan commented 1 month ago

Thanks for reaching out. I can reproduce this — when running your code snippet, my request also does not include the milliseconds. More investigation is needed here regarding the expected behavior for SDKs calling the ListFragments API.

Regarding TimestampMilliseconds — it is only referenced in the Transcribe service API models and has different requirements than the Kinesis Video Media Archive models.

I think we'll want to look further into this issue and review with the team, in particular what you mentioned here:

> I've verified that if I hack the serialization logic here then AWS is happy to accept timestamps with millisecond resolution, and to return the appropriate fragment list as a result.

mdickinson commented 1 month ago

@tim-finnigan: Thank you for the quick response!

> Regarding TimestampMilliseconds — it is only referenced in the Transcribe service API models and has different requirements than the Kinesis Video Media Archive models.

Thanks; that makes sense. It was a long shot. :-)

In case it helps to add some context: our situation is that

- we've already made a list_fragments call, so we know which fragments are available in the stream
- we want to retrieve a clip composed from a subsequence of those known fragments, and we need to know exactly which fragments went into the returned clip (most importantly, we need to know the producer start time that applies to the clip, so we need the producer start time for the first fragment in the clip).

If millisecond information were preserved, this would be straightforward: we'd pass the producer timestamp of the first fragment we want in the clip as the StartTimestamp, and the producer timestamp of the last fragment we want as the EndTimestamp. As things stand, we have to arrange carefully that we get the right fragments: we round the StartTimestamp down to a whole second, round the EndTimestamp up to a whole second, and make sure that no fragment is shorter than one second, so that neither rounding risks accidentally bringing in extra fragments.
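
The rounding part of that workaround could be sketched roughly as follows. The helper name widen_to_whole_seconds is hypothetical (it is not part of boto3 or botocore), and the sketch covers only the rounding; checking that no fragment is shorter than one second is still up to the caller:

```python
import datetime


def widen_to_whole_seconds(start, end):
    """Hypothetical helper: round start down and end up to whole seconds.

    Because the request ends up carrying whole seconds only, widening
    the range this way keeps the wanted fragments inside it.
    """
    floored_start = start.replace(microsecond=0)
    floored_end = end.replace(microsecond=0)
    if end == floored_end:
        # Already on a whole second, nothing to round up.
        ceiled_end = floored_end
    else:
        ceiled_end = floored_end + datetime.timedelta(seconds=1)
    return floored_start, ceiled_end


utc = datetime.timezone.utc
first_fragment = datetime.datetime(2024, 9, 12, 10, 49, 36, 500000, tzinfo=utc)
last_fragment = datetime.datetime(2024, 9, 12, 10, 49, 38, 833000, tzinfo=utc)

start, end = widen_to_whole_seconds(first_fragment, last_fragment)
print(start.isoformat())  # 2024-09-12T10:49:36+00:00
print(end.isoformat())    # 2024-09-12T10:49:39+00:00
```

The widened pair can then be passed as StartTimestamp and EndTimestamp in the TimestampRange.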

FWIW, we've already done that careful thinking and applied the appropriate workarounds in our own code; I'm mostly reporting this upstream to help others avoid this trap (though it would be nice to be able to remove our workarounds at some point in the future, too).
