Open AdithyaRaman opened 1 week ago
As a follow-up: I modified the code to compute the bitrate ladder for every segment (segment length = 1 second) of the BigBuckBunny video. The bitrate predicted for the highest quality level does not seem to be consistent with the spatio-temporal complexity of the segment (see the image below). Is it possible that the models referenced in this repository (vmaf-pred, bitrate-pred and crf-pred) are not the latest versions, so that their predictions are somewhat inaccurate and affect the entire bitrate ladder?
The original paper "JND-Aware Two pass Per-Title Encoding Scheme for Adaptive Live Streaming" describes JTPS bitrate ladder prediction applied to every segment of a live video streaming session.
In this repository, however, the mean of the spatial and temporal complexity values is taken over the whole video, and a single bitrate-resolution-crf ladder is predicted for the entire video session.
To align this repository better with the original paper, shouldn't the triplet prediction be applied to every segment? That would also mean defining the segment length (1 frame, or 500-2000 ms).
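To make the proposal concrete, here is a minimal sketch of what I mean by per-segment prediction. It is not the repository's code: `segment_means`, `per_segment_ladders` and the `predict_ladder` callable are hypothetical names, and I am assuming the per-frame spatial and temporal complexity values are already available as plain lists.

```python
from statistics import fmean
from typing import Callable, Sequence, TypeVar

Ladder = TypeVar("Ladder")


def segment_means(values: Sequence[float], fps: float, segment_seconds: float) -> list[float]:
    """Average per-frame values over consecutive fixed-length segments.

    The last segment may be shorter if the frame count is not a multiple
    of the segment size.
    """
    frames_per_segment = max(1, round(fps * segment_seconds))
    return [
        fmean(values[start:start + frames_per_segment])
        for start in range(0, len(values), frames_per_segment)
    ]


def per_segment_ladders(
    spatial: Sequence[float],
    temporal: Sequence[float],
    fps: float,
    segment_seconds: float,
    predict_ladder: Callable[[float, float], Ladder],
) -> list[Ladder]:
    """Call a (hypothetical) ladder predictor once per segment.

    predict_ladder stands in for whatever maps mean spatial and temporal
    complexity to a bitrate-resolution-crf ladder; it is an assumption here.
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal must have one value per frame")
    spatial_means = segment_means(spatial, fps, segment_seconds)
    temporal_means = segment_means(temporal, fps, segment_seconds)
    return [predict_ladder(e, h) for e, h in zip(spatial_means, temporal_means)]
```

With a 24 fps source, `segment_seconds=1.0` gives one prediction per 24 frames. Setting `segment_seconds` to the full video duration should reproduce the current whole-video mean, so the segment length could be exposed as a single parameter.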