ad044 / LainTSX

WebGL implementation of the Serial Experiments Lain PSX game
GNU General Public License v3.0

Better handle audio and video conversion #32

Open MonoS opened 1 year ago

MonoS commented 1 year ago

These are the changes made and the reasons (a sketch of the resulting command is below):

avi:rgb => avi:jyuv: as the documentation says, this is "most similar to the original PlayStation video colors".
add zscale: the PSX videos are in a particular colorspace (full range, BT.601, centered chroma location), but modern browsers, AFAIK, assume the usual "full HD" colorimetry (limited range, BT.709, left chroma location); doing this conversion at the encoding stage makes sure browsers display the correct colors and levels.
w=582:h=448: upscale the video and fix the AR. I'm not quite sure about the AR; I tested it by counting pixels in DuckStation on a couple of videos.
-crf 15: a bit of a quality boost; the videos are quite small anyway, and it can be raised if you want a smaller file at lower quality.
-preset veryslow: quality boost at the expense of encoding time; the videos are short and at low resolution/framerate, so it won't take much longer.
-level 31: force the H.264 level to 3.1 for better compatibility.
-ar 44100: with the default conversion ffmpeg was downsampling the 18.9 kHz sound to 16 kHz, and AAC applies an internal low-pass filter, so the sound was muffled; this upsamples to 44.1 kHz so it sounds correct.
-b:a 192k: a bit more bitrate for the audio. It could probably be lowered to 128k, but I prefer a bit more, as the original sounds are already quite compressed.
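For illustration, a single ffmpeg invocation combining the options above could look roughly like this. This is only a sketch: the input filename is taken from later in this thread, the explicit libx264/aac codec choices are assumptions, zscale requires an ffmpeg build with libzimg, and it is not the exact command from the PR.

ffmpeg -i "INS17.STR[0].avi" \
  -vf "zscale=w=582:h=448:rangein=full:matrixin=470bg:primariesin=170m:transferin=601:chromalin=center:range=limited:matrix=709:primaries=709:transfer=709:chromal=left" \
  -c:v libx264 -crf 15 -preset veryslow -level 31 \
  -c:a aac -ar 44100 -b:a 192k \
  output.mp4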

Regarding the videos, a better job could be done by bringing avs/vs (AviSynth/VapourSynth) into the mix: the original videos suffer quite a bit from blocking, and Deblock_QED fixes almost all of it with minimal loss of quality; the upscale could also be done with better algorithms like nnedi3 or a specifically trained ESRGAN model.

MonoS commented 1 year ago

Here is an example using the video INS17.STR[0].

This is how it is converted right now https://user-images.githubusercontent.com/801011/227200980-572c391a-14a1-4dfc-ba12-e6cde68338fe.mp4

This is with my settings https://user-images.githubusercontent.com/801011/227201022-4127bc12-2878-4e25-aed4-2e090c6ad4ab.mp4

This instead is an experiment I made using ESRGAN with the model 4xFSDedither_Manga, taken from https://upscale.wiki/wiki/Model_Database#Dithering (I haven't actually tried the others): https://user-images.githubusercontent.com/801011/227201211-1a572989-6e56-4a4b-b4ee-b0c704ff0fc1.mp4

This is the command line:

vspipe -c y4m lain.vpy - | ffmpeg -i - -i "INS17.STR[0].avi" -map 0:v:0 -map 1:a:0 -crf 15 -preset veryslow -level 31 -ar 44100 -b:a 192k -bsf:v "h264_metadata=colour_primaries=1:transfer_characteristics=1:matrix_coefficients=1" ESRGAN.mp4

and here is the vpy script:

import vapoursynth as vs
from vsgan import ESRGAN  # assuming the VSGAN package, which matches the load/apply/clip chain below

core = vs.core

# Load the jpsxdec output and convert it to single-precision RGB,
# tagging the source as full-range BT.601 with centered chroma.
src = core.lsmas.LWLibavSource("/lainPSX/MOVIE/INS17.STR[0].avi")
rgb = core.resize.Spline64(src, format=vs.RGBS,
                           matrix_in_s="470bg", transfer_in_s="601", range_in_s="full",
                           primaries_in_s="170m", chromaloc_in_s="center",
                           transfer_s="709", range_s="full",
                           primaries_s="709", chromaloc_s="left")

# Upscale with the 4xFSDedither_Manga ESRGAN model, then convert back to
# limited-range BT.709 YUV 4:2:0 at the target 582x448 resolution.
up = (ESRGAN(rgb, device="cuda")
      .load(r'/models/4xFSDedither_Manga.pth')
      .apply(overlap=32)
      .clip
      .resize.Spline64(582, 448, format=vs.YUV420P8, dither_type="error_diffusion",
                       transfer_in_s="709", range_in_s="full",
                       matrix_s="709", transfer_s="709", range_s="limited"))

up.set_output()
elliotcraft79 commented 1 year ago

What benefit is there to upscaling the video using ffmpeg over keeping it at the original resolution? I have checked the game, and the aspect ratio and resolution of the original video are correct. avi:rgb is used as it provides the most accurate colours. The rest of the PR is good, especially the audio stuff (it's kinda strange that ffmpeg did that).

MonoS commented 1 year ago

I have checked the game, and the aspect ratio and resolution of the original video are correct.

I've also checked and found a different AR. [screenshot] This is a screen from DuckStation at almost fullscreen with integer scaling active; I count 1166x896 pixels for the video, so an AR of ~1.30, different from that of the extracted video, which is 320x240, i.e. 1.333… How did you check the AR? Maybe it's the emulator or my configuration that is wrong?

avi:rgb is used as it provides the most accurate colours

This is a bit difficult to explain and I may be wrong, so take it with a grain of salt; I'll personally talk to the maintainer of jpsxdec about it, but this is my opinion.

While the rgb mode has the most accurate colors, for this project you are not using the original clip but a YUV420 re-encode. To produce it, ffmpeg has to convert the colorspace from RGB to YUV; but to obtain that RGB clip in the first place, jpsxdec had to convert the original data to RGB and quantize it to 8 bit, losing some of the original information, and ffmpeg then converts that quantized RGB back to YUV, so the conversion is done twice. You are also upscaling the chroma in the initial conversion and downscaling it again when converting back to YUV.

Saving it as jyuv instead should avoid the initial colorspace conversion, since we already have the colors "most similar to the original PlayStation video colors", and we only convert to the final format once.

What I don't understand is how those options are worded in the documentation. For instance, for the RGB format we have:

Therefore, if you really want the pixels to have the most accurate colors, use this method.

but for JYUV we have

This is most similar to the original PlayStation video colors.

What's the difference between "most accurate" and "most similar"? Probably it's because no player can handle the JYUV video correctly? I don't know. On a final note, my intention was to apply some additional filters in the chain, not necessarily an AI upscale like ESRGAN.
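As a side note, one quick way to see what each jpsxdec mode actually hands to ffmpeg is to inspect the pixel format and colour tags of the AVI. This is only an illustrative ffprobe call (not something from the PR), reusing the same filename as above:

ffprobe -v error -select_streams v:0 \
  -show_entries stream=pix_fmt,color_range,color_space,color_transfer,color_primaries \
  -of default=noprint_wrappers=1 "INS17.STR[0].avi"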

What benefit is there to upscaling the video using ffmpeg over keeping it at the original resolution?

If the AR is correct, very little. We should check which upscaling algorithm the browsers use, but I guess it's simple bilinear filtering; ffmpeg can instead use a Lanczos filter, which in theory is better.
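For reference, if the scaling were done at encode time rather than in the browser, the filter can be selected explicitly. A minimal sketch (the 2x factor and output name are placeholders, not values from the PR):

ffmpeg -i "INS17.STR[0].avi" -vf "scale=iw*2:ih*2:flags=lanczos" -c:v libx264 -crf 15 upscaled.mp4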

especially the audio stuff (it's kinda strange that ffmpeg did that).

I think ffmpeg is just following the standard: AAC doesn't support an 18.9 kHz sample rate, only 22.05 kHz and 16 kHz nearby, and 16 kHz is closer, so I guess it rounds down. I'm not sure about the cutoff; it's probably not applied in that case.
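A minimal sketch of just the audio side, assuming the same source file (the output name is a placeholder): forcing -ar keeps the encoder from picking 16 kHz on its own, and the native aac encoder also exposes a -cutoff option should its default low-pass turn out to be too aggressive.

ffmpeg -i "INS17.STR[0].avi" -vn -c:a aac -ar 44100 -b:a 192k audio_test.m4a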

elliotcraft79 commented 1 year ago

The game itself initialises the screen at 320x240, so the aspect ratio is definitely correct (DuckStation uses 319x224 instead of 320x240 by default, as it crops overscan). [screenshot]

Chromium (and probably Firefox) does indeed appear to use bilinear scaling, but I don't think it would matter if we left them as is (especially with the already poor quality the videos are in). As for avi:rgb vs avi:jyuv, it would probably be better to ask the dev of jpsxdec, as you said.

MonoS commented 1 year ago

The game itself initialises the screen at 320x240, so the aspect ratio is definitely correct

I'll ask the devs of jpsxdec about that as well, since the preview window is not 320x240, but I'll amend my PR to not change the resolution.

but I don't think it would matter if we left them as is (especially with the already poor quality the videos are in).

I think the same; my upscale was there only to fix the AR.

As for avi:rgb vs avi:jyuv

Let me know; personally I'd leave it as jyuv, but if you intend to accept this PR in the near future and prefer rgb, I can revisit the colorspace conversion to start from an RGB input.