Netflix / vmaf

Perceptual video quality assessment based on multi-method fusion.
Other
4.34k stars 735 forks source link

added vmaf_cuda_fex_synchronize, fixed cuda fex flush functions #1332

Open MorkTheOrk opened 4 months ago

MorkTheOrk commented 4 months ago

This PR fixes the CUDA feature extractor flush functions and adds the possibility to synchronize the CUDA Feature extractors with the CPU so you can retrieve the VMAF, otherwise the function vmaf_score_at_index would either hang or not return a proper result.

Usage:

vmaf_read_pictures(vmaf_ctx, ref, dis, idx);
vmaf_cuda_fex_synchronize(vmaf_ctx);
vmaf_score_at_index(vmaf_ctx, vmaf_model, vmaf_score, idx);
MorkTheOrk commented 4 months ago

Hey @kylophone added a few bug fixes and a new API function, let me know when you want to change anything.

@gedoensmax FYI

kylophone commented 4 months ago

Does vmaf_cuda_fex_synchronize() need to be part of the public API? What if the call to vmaf_cuda_fex_synchronize() happened automatically inside vmaf_score_at_index()?

If you flush the feature extractor, I think that means vmaf_score_at_index() can only be called after processing and not repeatedly. Is that right?

MorkTheOrk commented 4 months ago

What if the call to vmaf_cuda_fex_synchronize() happened automatically inside vmaf_score_at_index()? We will drop heavily in throughput with a implicit synchronisation, we only want it if the user need the VMAF score for a particular frame.

If you flush the feature extractor, I think that means vmaf_score_at_index() can only be called after processing and not repeatedly. Is that right? The flush for the CUDA is there to make sure that all processing of frames is done before we access the score. Would repeatedly called vmaf_score_at_index cause any issues after flushing?

Does vmaf_cuda_fex_synchronize() need to be part of the public API? Yes, it’s need to be a function that the user can call in their own software.

gedoensmax commented 4 months ago

Why is an implicit synchronization in vmaf_score_at_index a problem for throughput ? The function is not guaranteed to return valid values if FEX is neither closed nor flushed.

A check inside vmaf_score_at_index should check if the score is present. If not we have to flush. If after a flush happened the score is still not present we have an error as in the frame request has not been processed. Or do i misunderstand this ?

Due to the above i am not sure if the flush has to be public facing or even if it should be public facing.

Besides, @kylophone what about multithreading in that case we would also have to check if the score is present, otherwise join threads and see if it is available afterwards right ?

MorkTheOrk commented 4 months ago

@gedoensmax Oh yeah, I didn't read it properly, I thought inside vmaf_read_pictures, sorry about that.

Max is right, the throughput would not be hurt, I just dislike implicit synchronization, I would let the user have control. Imagine getting the score for N - 5, that would mean that you need to sync around 5 frames in flight just to get a score that is in the past.

gedoensmax commented 4 months ago

But how do you guarantee N - 5 being ready ? That speaks for implicit sync which is only done if the dict entry is not present. The way we do sync now is not a sync on a specific picture but rather a full sync of all fex.