Use VideoTimestamps

I realized that the ms_to_frame function is not perfectly accurate for chapters.

VideoTimestamps performs all its calculations in milliseconds. That's perfect for SRT, WebVTT, ASS, and any format that has a maximum precision in milliseconds. However, for any format that exceeds millisecond precision, it won't be perfect when the file is not an MKV.

It will work well with MKV files because their precision is (99% of the time) in milliseconds. So, if we have a time in nanoseconds, there is a way to properly handle it in milliseconds.

But for non-MKV files (e.g., M2TS), the precision can be "infinite," making it impossible to properly convert a time, let's say in nanoseconds, to milliseconds.

However, I talked with cubicibo (who has the Blu-ray spec), and he told me that a chapter needs to be exactly the frame time (a.k.a. TimeType.EXACT). In this case, we can safely assume that if we truncate the time to milliseconds, it will still be displayed on the same frame.

As for subtitles, they should technically never exceed millisecond precision since ASS, WebVTT, SRT, etc., cannot represent those values. Therefore, if we truncate the timedelta, we don't lose any information.
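To make the subtitle case concrete, here is a minimal sketch of a check that a timedelta carries nothing below millisecond precision, followed by a conversion that only uses integer arithmetic. The helper name `is_millisecond_exact` is hypothetical and is not part of VideoTimestamps.

```python
from datetime import timedelta

ONE_MS = timedelta(milliseconds=1)


def is_millisecond_exact(time: timedelta) -> bool:
    """Hypothetical helper: True if truncating to milliseconds loses nothing."""
    return time % ONE_MS == timedelta(0)


# A typical SRT/ASS-style timestamp: 00:01:02.345
subtitle_time = timedelta(minutes=1, seconds=2, milliseconds=345)
assert is_millisecond_exact(subtitle_time)

# timedelta // timedelta is an exact integer floor division (no floats involved).
subtitle_ms = subtitle_time // ONE_MS
assert timedelta(milliseconds=subtitle_ms) == subtitle_time

# A time with sub-millisecond content would not survive the truncation.
assert not is_millisecond_exact(timedelta(microseconds=1500))
```

If the check fails, the time did not come from a millisecond-based subtitle format, and truncating it would discard information.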

If you ever need a timedelta_to_ms function, here is how it would look:

from datetime import timedelta
from fractions import Fraction
from math import ceil, floor

# Assumes the VideoTimestamps package is installed.
from video_timestamps import RoundingMethod, TimeType


def timedelta_to_ms(
    time: timedelta, time_type: TimeType, rounding_method: RoundingMethod = RoundingMethod.ROUND
) -> int:
    """
    Converts a timedelta to milliseconds without losing information.

    :param time:                The timedelta.
    :param time_type:           The time type.
    :param rounding_method:     If you want to be compatible with MKV, use RoundingMethod.ROUND else RoundingMethod.FLOOR.
                                For more information, see the documentation of [timestamps](https://github.com/moi15moi/VideoTimestamps/blob/683a8b48ad394d60ced0deda0ddb87b70e0bfa83/video_timestamps/timestamps.py#L14-L29)
    :return:                    The resulting time in milliseconds.
    """

    if rounding_method == RoundingMethod.ROUND:
        # A timedelta stores whole microseconds, so this fraction is exact.
        # (total_seconds() returns a float, which can have imprecision.)
        time_ms = Fraction(time // timedelta(microseconds=1), 1000)

        if time_type in (TimeType.START, TimeType.END):
            ms = ceil(time_ms)
        elif time_type == TimeType.EXACT:
            ms = floor(time_ms)
        else:
            raise ValueError(f"The TimeType {time_type} isn't supported.")
    elif rounding_method == RoundingMethod.FLOOR:
        # It is impossible to ensure precision for the FLOOR method because it is impossible
        # to convert, for example, nanoseconds to milliseconds without losing information.
        # It works well for the ROUND method because the maximum precision of video timestamps
        # is in milliseconds, so we don't lose any information in the conversion process.
        # Note: For chapters, the specification says that it should always be TimeType.EXACT, 
        # so we can simply floor it.
        raise ValueError("It is impossible to ensure precision for the FLOOR method.")
    else:
        raise ValueError(f"The RoundingMethod {rounding_method} isn't supported.")

    return ms
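Since `timedelta.total_seconds()` returns a float, flooring or ceiling the float product can in principle land on the wrong millisecond. The following sketch shows one way to keep the conversion exact with `fractions.Fraction`; the helper name `exact_milliseconds` is hypothetical and only serves the illustration.

```python
from datetime import timedelta
from fractions import Fraction
from math import ceil, floor


def exact_milliseconds(time: timedelta) -> Fraction:
    """Hypothetical helper: the timedelta in milliseconds as an exact fraction."""
    # A timedelta stores whole microseconds, so this integer division is exact.
    return Fraction(time // timedelta(microseconds=1), 1000)


whole = timedelta(milliseconds=1001)
partial = timedelta(microseconds=1500)  # 1.5 ms

# Float route: the product may differ slightly from the true value.
float_ms = whole.total_seconds() * 1000

# Exact route: floor and ceil operate on a rational number.
assert floor(exact_milliseconds(whole)) == 1001
assert ceil(exact_milliseconds(whole)) == 1001
assert floor(exact_milliseconds(partial)) == 1
assert ceil(exact_milliseconds(partial)) == 2

print(float_ms, exact_milliseconds(whole))
```

`math.floor` and `math.ceil` both accept a `Fraction` and return an `int`, so the exact value can be dropped into the same branches as the float.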

Edit (08-18-2024): Since we cannot always convert a timedelta to milliseconds, you might think that VideoTimestamps should simply use a higher precision (e.g., nanoseconds instead of milliseconds), but that would actually only make things worse.

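In the BD spec (thanks again, cubicibo), for $fps = {24000 \over 1001}$, a frame lasts $N_{ticks} = {90000 \over fps} = 3753.75$ ticks of the 90 kHz clock. However, the PTS is an integer, so the product of the frame number and $N_{ticks}$ is truncated to create it.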

Example:

\begin{gather}
\text{Frame 0 PTS} :  \lfloor 0 \times 3753.75 \rfloor = 0 \\
\text{Frame 1 PTS} :  \lfloor 1 \times 3753.75 \rfloor = 3753 \\
\text{Frame 2 PTS} :  \lfloor 2 \times 3753.75 \rfloor = 7507 \\
\text{Frame 3 PTS} :  \lfloor 3 \times 3753.75 \rfloor = 11261 \\
\end{gather}

In nanoseconds, it gives:

\begin{gather}
\text{Frame 0 Time} :  \lfloor 0 \times 3753.75 \rfloor \times {1\over 90000} \times 10^9 = 0 \text{ns} \\
\text{Frame 1 Time} :  \lfloor 1 \times 3753.75 \rfloor \times {1\over 90000} \times 10^9 = 41700000 \text{ns} \\
\text{Frame 2 Time} :  \lfloor 2 \times 3753.75 \rfloor \times {1\over 90000} \times 10^9 = 83411111.\overline{1} \text{ns} \\
\text{Frame 3 Time} :  \lfloor 3 \times 3753.75 \rfloor \times {1\over 90000} \times 10^9 = 125122222.\overline{2} \text{ns} \\
\end{gather}
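These truncated PTS values also tie back to the chapter case: truncating a PTS time to milliseconds gives the same value as flooring the exact frame time to milliseconds. The sketch below checks this for the first 10,000 frames. It assumes that a FLOOR-style timestamp of a frame is the floor of its exact time in milliseconds; the function names are hypothetical and are not VideoTimestamps APIs.

```python
from fractions import Fraction
from math import floor

FPS = Fraction(24000, 1001)
CLOCK_HZ = 90_000  # 90 kHz clock, matching the 1/90000 factor above


def frame_pts(frame: int) -> int:
    """PTS in clock ticks: the frame time in ticks, truncated."""
    return floor(frame * CLOCK_HZ / FPS)


def pts_to_ms_truncated(pts: int) -> int:
    """Truncate a PTS (in ticks) to whole milliseconds."""
    return pts * 1000 // CLOCK_HZ


def floored_frame_ms(frame: int) -> int:
    """Assumed FLOOR-style timestamp of a frame, in milliseconds."""
    return floor(frame * 1000 / FPS)


for frame in range(10_000):
    assert pts_to_ms_truncated(frame_pts(frame)) == floored_frame_ms(frame)
```

The equality is not specific to this frame rate: for any non-negative $x$, $\lfloor \lfloor 90x \rfloor / 90 \rfloor = \lfloor x \rfloor$, so the extra truncation to clock ticks disappears once the time is truncated to milliseconds.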

If we compare it with the equation we use to approximate the time from the $fps$, we can clearly see that the two disagree by several microseconds, so the extra digits of a nanosecond result would not be accurate:

\begin{gather}
\text{Frame 0 Time} :  0 \times {1 \over {24000 \over 1001}} \times 10^9 = 0 \text{ns} \\
\text{Frame 1 Time} :  1 \times {1 \over {24000 \over 1001}} \times 10^9 = 41708333.\overline{3} \text{ns} \\
\text{Frame 2 Time} :  2 \times {1 \over {24000 \over 1001}} \times 10^9 = 83416666.\overline{6} \text{ns} \\
\text{Frame 3 Time} :  3 \times {1 \over {24000 \over 1001}} \times 10^9 = 125125000 \text{ns} \\
\end{gather}
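The two sets of values can be reproduced with exact rational arithmetic. This is only a sketch of the numbers above; the variable names are made up for the illustration.

```python
from fractions import Fraction
from math import floor

FPS = Fraction(24000, 1001)
CLOCK_HZ = 90_000
NS_PER_SECOND = 10**9

errors_ns = []
for frame in range(4):
    # Time derived from the truncated PTS, in nanoseconds.
    pts = floor(frame * CLOCK_HZ / FPS)
    pts_time_ns = Fraction(pts, CLOCK_HZ) * NS_PER_SECOND

    # Time approximated directly from the fps, in nanoseconds.
    fps_time_ns = frame / FPS * NS_PER_SECOND

    error_ns = fps_time_ns - pts_time_ns
    errors_ns.append(error_ns)
    print(f"frame {frame}: PTS {float(pts_time_ns):.1f} ns, "
          f"fps {float(fps_time_ns):.1f} ns, difference {float(error_ns):.1f} ns")
```

Because the PTS is truncated to whole ticks, the difference is always at least zero and less than one tick, i.e. less than $10^9 / 90000 \approx 11111.1$ ns. That is far below one millisecond, but it is thousands of nanoseconds.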

Jaded-Encoding-Thaumaturgy / muxtools

Use VideoTimestamps #21