Custom interface for rendering subtitles on an RGBA texture(s) | (XySubFilter for madVR) [Part 1]

dannphou / xy-vsfilter

Automatically exported from code.google.com/p/xy-vsfilter


Custom interface for rendering subtitles on an RGBA texture(s) | (XySubFilter for madVR) [Part 1] #40

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

This is a kind of feature request. For the details, see the link below, 
since I don't feel like pasting the whole conversation here.
http://forum.doom9.org/showthread.php?p=1535165#post1535165

Original issue reported on code.google.com by yakits...@gmail.com on 30 Oct 2011 at 6:08

Merged into: #91

GoogleCodeExporter commented 9 years ago

I've read the thread, and the idea of xy-VSFilter interacting with madVR is 
great. I'm really looking forward to it, but I'll need some time to 
investigate.

Original comment by YuZhuoHu...@gmail.com on 31 Oct 2011 at 3:21

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

First you'll need to wait for madshi (the madVR developer) to see whether he 
ultimately decides to modify madVR for VSFilter. He seems a bit torn 
between adapting VSFilter to suit his needs (which requires modifying both 
VSFilter and madVR) and undertaking a major coding effort to port 
Libass for use in madVR.

Also see:
http://www.cccp-project.net/forums/index.php?topic=5776.msg37106#msg37106

madVR adds a dummy output pin and an input pin for VSFilter.
madVR sends a blank RGB32 + alpha frame to VSFilter.
VSFilter renders the subtitles on the alpha channel of the RGB32 frame.
VSFilter sends the frame with the rendered subtitles back to madVR.
madVR blends the formerly blank frame, now carrying the subtitles, into the video frame on the GPU.

That was madshi's initial idea when he brought it up to gommorah 
(the threaded-vsfilter developer), and it is of course subject to change based on 
how feasible it is to implement. What do you think would make the most sense for rendering 
subtitles at an arbitrary resolution/framerate specified by madVR and then 
outputting the subtitles to madVR for GPU blending into the actual video frame? You 
may want to look into how the CSRI interface uses TextSub (the VSFilter Avisynth 
function) to render subtitles and see whether you can repurpose that into a new 
custom interface for something like this.

Are you able to make a forum account and thread on Doom9 
http://forum.doom9.org/ so people can discuss and provide feedback on 
xy-VSFilter?

Original comment by cyber.sp...@gmail.com on 1 Nov 2011 at 12:59

GoogleCodeExporter commented 9 years ago

Currently, the idea
  madVR adds dummy output => VSFilter => madVR
would be easier for me, since I wouldn't have to do much work.

I can access forum.doom9.org, lucky!

Original comment by YuZhuoHu...@gmail.com on 1 Nov 2011 at 3:49

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

The idea I had with the dummy output pin -> VSFilter -> madVR would probably 
work OK, but it would require some ugly DirectShow graph rerouting logic, 
possibly even needing specific media player support (not sure). Also, 
VSFilter currently does not fill the alpha channel of the rendered video frame. 
So if madVR sends a "blank" video frame to VSFilter to render on, I'd have no 
idea how to blend each pixel. I'd have to ask VSFilter to render each frame 
twice, once on a white and once on a black video frame. Then I could 
probably calculate the alpha values used by VSFilter. As you can imagine, this 
would require a lot of work on my side and could cost a lot of CPU 
performance.
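The two-pass idea above comes down to simple per-pixel arithmetic. If VSFilter blends a subtitle pixel with straight color C and coverage a using result = a*C + (1 - a)*background, then the render over black is B = a*C and the render over white (255) is W = a*C + (1 - a)*255. Subtracting gives a = 1 - (W - B)/255, and B is already the premultiplied color. The following is a minimal sketch that assumes 8-bit channels and that VSFilter really uses exactly this blend. The function names are hypothetical and not part of VSFilter or madVR.

```python
# Hypothetical helpers; assumes result = a * color + (1 - a) * background.

def recover_alpha(black_px, white_px):
    """Recover (alpha, premultiplied RGB) for one pixel from two renders.

    black_px / white_px: (r, g, b) tuples in 0..255 of the same subtitle
    pixel rendered over an all-black and an all-white frame.
    """
    # W - B = (1 - alpha) * 255 in every channel; average the channels
    # to soften 8-bit rounding errors.
    diff = sum(w - b for w, b in zip(white_px, black_px)) / 3.0
    alpha = min(1.0, max(0.0, 1.0 - diff / 255.0))
    # Over black, the render already equals alpha * color (premultiplied).
    return alpha, tuple(black_px)


def unpremultiply(alpha, premul_rgb):
    """Convert premultiplied color back to straight color (black if transparent)."""
    if alpha == 0.0:
        return (0, 0, 0)
    return tuple(min(255, round(c / alpha)) for c in premul_rgb)


# A subtitle pixel with color (200, 100, 50) at 60% opacity:
alpha, premul = recover_alpha((120, 60, 30), (222, 162, 132))
print(round(alpha, 3), premul, unpremultiply(alpha, premul))
# prints: 0.6 (120, 60, 30) (200, 100, 50)
```

Because of 8-bit rounding, the recovered alpha is only approximate, and every frame has to be rendered twice. That doubling is the CPU cost mentioned above.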

It would be much nicer if xy-VSFilter could get special support for madVR. I'm 
thinking of something like this:

(1) VSFilterForMadVR would have only one input pin for subtitle data and no 
output pin.
(2) VSFilterForMadVR would then render the subtitles on a 32bit RGBA bitmap, 
with the alpha channel properly filled.
(3) VSFilterForMadVR would then send that RGBA bitmap to madVR via a private 
communication channel.

That would be relatively easy for me to implement. Not sure how difficult it 
would be for you. Some things to think about:

(a) The private communication might have to include madVR telling VSFilter some 
things about the video (video rectangle, etc.), because VSFilter would no longer 
have the information from the video input pin.
(b) I'm not sure how many frames per second VSFilter would create when it doesn't 
get the "timing clock" through the video frame input pin. Maybe it would make 
sense for madVR to call back into VSFilter to ask for an RGBA bitmap for every 
rendered video frame instead of VSFilter doing the rendering with a clock of 
its own?
(c) Instead of using a private communication channel, VSFilter could also offer 
an RGBA output pin. But then problem (b) remains. Furthermore I'd have to add 
support for madVR to accept a secondary video input pin with RGBA for blending. 
I'm planning to do that at some point in the future, anyway, but it'd be a lot 
of work, so it won't come soon. The easier and quicker way for me would be to 
have a private communication channel.

I'm open to any other alternative ideas or any wishes you might have for 
custom madVR interfaces.

Thoughts?

P.S: One small improvement you could make which would make my original idea 
easier to implement would be if VSFilter would make sure that if it gets an 
RGBA video frame to render on, that the alpha channel is properly filled with 
the subtitle alpha. That way madVR wouldn't have to do weird tricks to figure 
out how to blend the subtitles.

Original comment by mad...@gmail.com on 2 Nov 2011 at 10:59

GoogleCodeExporter commented 9 years ago

First
Quote: Maybe it would make sense for madVR to callback VSFilter to ask for an 
RGBA bitmap for every rendered video frame instead of VSFilter doing the 
rendering with a clock of its own?
Agree.

I think the best option would be VSFilterForMadVR: having only one input pin, 
no output pin, and providing a callback for madVR, even though it is not the 
easiest option for me.

Original comment by YuZhuoHu...@gmail.com on 2 Nov 2011 at 12:31

GoogleCodeExporter commented 9 years ago

Re Comment 4:
I guess you might want to ask for a sequence of bitmaps instead of a single bitmap 
for every rendered video frame (via a callback). 
The sequence would look like this:
  {(dst_rect_1, argb_data_1),(dst_rect_2, argb_data_2),(dst_rect_3, argb_data_3),...,(dst_rect_n, argb_data_n)} (the sequence may get very long, but an upper bound can be set)
And the bitmaps would be alpha-blended onto the video frame one by one.
I'm not sure whether doing that on the GPU would cause you much trouble, but the benefit 
is that madVR could easily get support from other subtitle renderers via the same 
callback, e.g. libass, which outputs a similar format.

VSFilter, on the other hand, has to do the above alpha blending on the CPU to generate a final 
subpic. That's one of the reasons it is slow. With such an interface, 
I could hand some (or all) of the alpha blending jobs to madVR 
and keep some (or none) of them on the CPU, depending on the script.
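A minimal sketch of the consumer side of that proposal: each (dst_rect, argb_data) entry is blended onto the frame in sequence order with the usual "over" operator, and the sequence length is capped. It assumes straight (non-premultiplied) alpha, float channels in 0..1, and a hypothetical cap of 64 entries. The names are illustrative only. A real video renderer would do this in a pixel shader rather than in Python loops.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration; not an existing madVR/VSFilter API.
RGB = Tuple[float, float, float]           # channels in 0..1
RGBA = Tuple[float, float, float, float]   # straight (non-premultiplied) alpha

MAX_ITEMS = 64  # hypothetical upper bound on the sequence length


@dataclass
class SubBitmap:
    x: int                   # dst_rect left edge on the video frame
    y: int                   # dst_rect top edge
    argb: List[List[RGBA]]   # rows of pixels; their size gives dst_rect's size


def composite(frame: List[List[RGB]], items: List[SubBitmap]) -> None:
    """Blend each bitmap onto the frame (in place), in sequence order."""
    if len(items) > MAX_ITEMS:
        raise ValueError("too many subtitle bitmaps for one frame")
    height, width = len(frame), len(frame[0])
    for item in items:
        for dy, row in enumerate(item.argb):
            y = item.y + dy
            if not 0 <= y < height:
                continue  # clip rows outside the frame
            for dx, (r, g, b, a) in enumerate(row):
                x = item.x + dx
                if not 0 <= x < width:
                    continue  # clip columns outside the frame
                fr, fg, fb = frame[y][x]
                # "over": later items in the sequence end up on top
                frame[y][x] = (r * a + fr * (1 - a),
                               g * a + fg * (1 - a),
                               b * a + fb * (1 - a))
```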

Original comment by YuZhuoHu...@gmail.com on 2 Nov 2011 at 4:01

GoogleCodeExporter commented 9 years ago

That sequence of bitmaps, is that the internal format VSFilter is working with? 
I could support it, I guess, but I'm not sure if it's the best solution to 
upload all those bitmaps separately to the GPU, from a performance point of 
view. How many bitmaps does a sequence typically have for one video frame? It's 
not one bitmap per character, is it?

Original comment by mad...@gmail.com on 3 Nov 2011 at 6:00

GoogleCodeExporter commented 9 years ago

Original comment by cyber.sp...@gmail.com on 10 Nov 2011 at 11:35

Changed title: Custom interface for rendering subtitles on an ARGB texture (VSFilterForMadVR)

GoogleCodeExporter commented 9 years ago

You mean that for every change in the script, VSFilter generates a whole new bitmap? 
Sounds scary. But I don't believe it creates one for every character. Probably 
it creates one bitmap per element. So if you have one sentence, it's one 
bitmap, but if you have custom rotation applied to every character in that 
sentence, or every character has a different color, these will be different 
bitmaps. Is that so?

<< Attention, what follows is pure guesswork. >>
So *if* the answer to my question above is *yes*, I think it won't be too slow for 
madVR. Maybe around 5 bitmaps per sequence? It should be possible to limit 
bitmap uploading when heavily styled parts are detected, such as 
openings/endings or insert songs, where we can have hundreds of bitmaps =)) And 
blending these bitmaps on the GPU should be blazing fast.

Original comment by yakits...@gmail.com on 16 Nov 2011 at 12:50

GoogleCodeExporter commented 9 years ago

==================================================
Really sorry for dropping out of this discussion so suddenly last time. Though I cannot 
truly return to developing this project yet, I'll continue this discussion.
==================================================
==============
Quote:
    madshi:
    That sequence of bitmaps, is that the internal format VSFilter is working with? 
==============    

Not exactly. The internal format VSFilter works with is more like a sequence 
of single-color bitmaps with alpha masks: 
    {(dst_rect_1, single_rgb_1, alpha_channel_data_1),(dst_rect_2, single_rgb_2, alpha_channel_data_2),(dst_rect_3, single_rgb_3, alpha_channel_data_3),...,(dst_rect_n, single_rgb_n, alpha_channel_data_n)}
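To illustrate the difference between the two formats: each internal item carries one color plus an 8-bit coverage mask. Turning it into an RGBA bitmap of the kind discussed earlier only requires multiplying that color by the mask. The sketch below produces premultiplied RGBA with integer rounding. The function name and the choice of premultiplied output are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical helper; not VSFilter's actual code.


def expand_single_color(rgb: Tuple[int, int, int],
                        alpha_mask: List[List[int]]) -> List[List[Tuple[int, int, int, int]]]:
    """Turn one (single_rgb, alpha_channel_data) item into premultiplied RGBA rows.

    alpha_mask holds 8-bit coverage values (0..255). Each output pixel is
    (r*a/255, g*a/255, b*a/255, a), rounded to the nearest integer.
    """
    r, g, b = rgb
    return [[((r * a + 127) // 255, (g * a + 127) // 255, (b * a + 127) // 255, a)
             for a in row]
            for row in alpha_mask]


# A typical simple line is three such items: shadow, outline and body.
body = expand_single_color((255, 255, 255), [[0, 128, 255]])
```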

==============
Quote:
    madshi:
    How many bitmaps does a sequence typically have for one video frame? It's not one bitmap per character, is it?

Quote:
    yakitsume:
    Probably it creates one bitmap per one element. So if you have one sentence - it's one bitmap, but if you have custom rotation applied to every character in that sentence or every character has different color - these will be different bitmaps. Is that so?
==============

For most simple scenes, there will be 3 single-color bitmaps per video 
frame: one shadow bitmap, one outline bitmap and one body bitmap. But in the 
worst case, e.g. a complex opening/ending script in which every character/word has a 
different style, there will be up to 3 bitmaps per character/word. 

==============
Quote:
    yakitsume:
    You mean for every change in the script vsfilter generates the whole new bitmap? Sounds scary. 
==============

Yes, to a certain degree. With a decent cache system, not everything needs to 
be regenerated.
To create a subpic, VSFilter does 3 jobs:
    1) parse the corresponding script.
    2) create a sequence of single-color bitmaps.
    3) alpha-blend the above sequence to create a subpic.
Currently, in xy-VSFilter, steps 1) and 2) have cache support, so they don't 
need to be fully re-executed while animating. But step 3) does not have any 
cache support yet. All alpha blend operations are redone to create a new subpic, 
no matter how small the difference between the new subpic and the 
previously created one.
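The caching behavior described above can be mimicked with memoized steps. A fade that changes only the alpha reuses the parsed line and the rasterized bitmaps, but still pays for a full blend on every frame. This is a toy model built on functools.lru_cache. The stand-in parse/rasterize/blend functions are hypothetical and only count how often each step runs. They do not reflect xy-VSFilter's real cache keys.

```python
from functools import lru_cache

# Toy model with hypothetical stand-ins for VSFilter's three steps.
calls = {"parse": 0, "rasterize": 0, "blend": 0}


@lru_cache(maxsize=256)
def parse_line(script_line: str) -> tuple:
    """Step 1 (cached): stand-in parser returning hashable tokens."""
    calls["parse"] += 1
    return tuple(script_line.split())


@lru_cache(maxsize=256)
def rasterize(tokens: tuple) -> tuple:
    """Step 2 (cached): stand-in for creating single-color bitmaps."""
    calls["rasterize"] += 1
    return tuple(("bitmap", tok) for tok in tokens)


def blend(bitmaps: tuple, alpha: float) -> list:
    """Step 3 (not cached): redone for every new subpic."""
    calls["blend"] += 1
    return [(bmp, alpha) for bmp in bitmaps]


def make_subpic(script_line: str, alpha: float) -> list:
    return blend(rasterize(parse_line(script_line)), alpha)


# A one-second fade at 24 fps: only the alpha changes between frames.
for frame in range(24):
    make_subpic("Hello world", 1.0 - frame / 24)
```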

==================================================
I've assumed:
    1) It's not efficient to upload a lot of small bitmaps to the GPU. (So the length of that bitmap sequence should be limited.)
    2) The alpha blend operation is associative, i.e. if we denote alpha blending bitmap sub1 onto bitmap frame1 as
           frame1 AB sub1,
       then
           (frame1 AB sub1) AB sub2 = frame1 AB (sub1 AB sub2).
I think this can be done (efficiently) only if I use pre-multiplied alpha. (Then 
the length of that bitmap sequence can be limited, because small items can 
be combined on the CPU first.)
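The associativity assumption holds for the Porter-Duff "over" operator on premultiplied values, which is why pre-combining small bitmaps on the CPU is safe. In the notation above, "dst AB src" blends src onto dst. The sketch below uses one color channel plus alpha and exact fractions so that both groupings can be compared for equality. It illustrates the math only and is not xy-VSFilter code.

```python
from fractions import Fraction as F
from functools import reduce

# Illustration of the math only; premultiplied (color, alpha) pairs.


def ab(dst, src):
    """'dst AB src': blend premultiplied src = (color, alpha) onto dst ("over")."""
    sc, sa = src
    dc, da = dst
    return (sc + (1 - sa) * dc, sa + (1 - sa) * da)


frame1 = (F(3, 10), F(1))        # opaque video pixel
sub1 = (F(1, 4), F(1, 2))        # color 0.5 at 50% alpha, premultiplied
sub2 = (F(1, 5), F(1, 4))        # color 0.8 at 25% alpha, premultiplied

left = ab(ab(frame1, sub1), sub2)     # blend sub1, then sub2, onto the frame
right = ab(frame1, ab(sub1, sub2))    # pre-combine sub1 and sub2 on the CPU first

# The same holds for any number of small items merged into one bitmap pixel.
items = [sub1, sub2, (F(1, 10), F(1, 10))]
merged = reduce(ab, items[1:], items[0])
```

With straight alpha, the same grouping would need a division by the combined alpha to recover the merged color. That extra division is the efficiency argument for premultiplied alpha.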

==================================================
More details on the interface:
    MadVR doesn't have to ask for subpic data for every rendered video frame. A begin time and an end time can be packed together with the subpic data. They denote the period during which the subpic should be shown, and during that time span madVR can reuse the subpic data. For most simple scenes, the time span will be several seconds. For animated scenes, the end time will be set to begin time + 1, which means madVR has to ask for new subpic data on the next video frame. There's a problem, though: the subpic data may be invalidated, e.g. when the user modifies the script file during playback, or when new segment data sent to the subtitle pin causes an invalidation. To solve this, an extra callback like IsSubpicInvalidated will be needed. It takes the current video frame's reference time and a subpic creation time as input, and simply returns true/false, telling the caller whether the subpic created at that time has been invalidated for that reference time.
    In conclusion, a begin time, an end time and a creation time can be packed together with the subpic data. MadVR keeps a record of the subpic data internally. For every video frame, madVR first checks whether it can reuse the subpic data (the video frame's reference time falls between the begin time and the end time of the subpic data, and the subpic data has not been invalidated). If not, it asks for new subpic data.
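The reuse rule proposed above can be sketched from the video renderer's side. The sketch assumes half-open [begin, end) intervals, so end = begin + 1 forces a new request on the next frame. It also assumes integer reference times, and it uses plain callables to stand in for the request and IsSubpicInvalidated callbacks. The class and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical names; a sketch of the proposal, not an existing API.


@dataclass
class Subpic:
    begin: int      # first reference time the subpic is valid for
    end: int        # exclusive; begin + 1 for animated content
    created: int    # creation time reported by the subtitle renderer
    data: object    # bitmap payload (opaque in this sketch)


class SubpicCache:
    """Video-renderer-side reuse logic for the proposed interface."""

    def __init__(self, request: Callable[[int], Subpic],
                 is_invalidated: Callable[[int, int], bool]):
        self._request = request                # ask the subtitle renderer for new data
        self._is_invalidated = is_invalidated  # stand-in for IsSubpicInvalidated
        self._current: Optional[Subpic] = None
        self.requests = 0

    def get(self, ref_time: int) -> Subpic:
        s = self._current
        reusable = (s is not None
                    and s.begin <= ref_time < s.end
                    and not self._is_invalidated(ref_time, s.created))
        if not reusable:
            self._current = self._request(ref_time)
            self.requests += 1
        return self._current
```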
==================================================

Original comment by YuZhuoHu...@gmail.com on 16 Nov 2011 at 7:20

GoogleCodeExporter commented 9 years ago

Ok, I can see we need to decide on 2 things:

(A) Who should do the alpha blending? xy-vsfilter = CPU? Or madVR = GPU?
(B) Should madVR ask for a specific frame? Or should xy-vsfilter deliver a 
frame with a start/stop time?

Let me discuss (A) first: I'm not sure myself whether letting the GPU do all 
the alpha blending work would cost much GPU performance. Maybe not. It might 
not be much of a problem at all. So my thinking is that we should design an 
interface which allows xy-vsfilter to send a series of RGBA bitmaps to madVR 
together with coordinates, and let madVR do all the alpha blending work. In the 
end this interface would still allow us to go both ways. You could still do the 
alpha blending in xy-vsfilter and send only one big subpic. The interface would 
not stop you from doing that. You could even add an option to xy-vsfilter to 
let the user decide where to do the alpha blending work (CPU/GPU). If we find 
out that GPU performance does suffer, users with a weak GPU but a strong CPU 
might prefer to let xy-filter do the work, while users with a strong GPU but a 
weaker CPU would probably prefer to let madVR do the work. Or maybe we find out 
that it's not a problem at all, even for the weakest GPUs; then we don't even 
need an option.

Now about (B): I'm wondering whether your idea would work well. I'm not 
intimately familiar with all the various ASS commands, but I would guess there's 
some kind of "fade out" command? Let's say in a movie the ASS 
script asks you to fade out the subtitle over 1 second. With a 
24fps movie, this would result in you creating 24 different subtitle pictures 
(identical bitmap data, but a changing alpha channel). With a 60fps movie, you 
would even create 60. So basically I guess that with some of the 
animation/fade effects, the number of different subpics depends 
directly on the movie frame rate, doesn't it? Because of that I think it might 
be better if madVR asks for the subpic for a specific video frame instead of 
the other way round. *However*, I would strongly suggest that we allow 
xy-filter to say "reuse last frame's subpic" instead of sending new 
data. That should save quite a bit of performance. What do you think?

There's a new thing (C) we need to discuss, too:

Currently, due to the way VSFilter works, subtitles are usually drawn on the 
video before aspect ratio correction and before upscaling. With SD content that 
means the subtitles have to be upscaled quite a lot, making them look rather 
blurry. Using a private communication channel between xy-filter and madVR 
would allow us to render the subtitles at the final output resolution, which 
should improve subtitle rendering quality *A LOT* for SD content. I've been 
told that this might be problematic in terms of aspect ratio, 3D rotations, 
etc. But I think this should all be solvable somehow if we take it into account 
right from the start. What do you think?
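For point (C), the core of rendering at the output resolution is mapping script coordinates into the final video rectangle instead of into the decoded frame. The following is a minimal sketch. It assumes the video renderer reports the destination video rectangle in output pixels and that scaling is linear from the ASS script's PlayResX/PlayResY. It does not handle the aspect-ratio and 3D-rotation questions raised above. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical names; linear PlayRes scaling only (no aspect/3D handling).


@dataclass
class Rect:
    left: float
    top: float
    width: float
    height: float


def script_to_output(x: float, y: float, play_res_x: int, play_res_y: int,
                     video: Rect) -> Tuple[float, float]:
    """Map a point in ASS script coordinates to output pixels in the video rect."""
    sx = video.width / play_res_x
    sy = video.height / play_res_y
    return video.left + x * sx, video.top + y * sy


def font_scale(play_res_y: int, video: Rect) -> float:
    """Factor to rasterize glyphs at, instead of upscaling a finished bitmap."""
    return video.height / play_res_y


# 640x480 script, 4:3 video pillarboxed inside a 1920x1080 output.
video_rect = Rect(left=240, top=0, width=1440, height=1080)
center = script_to_output(320, 240, 640, 480, video_rect)
```

In this example the script center (320, 240) lands at (960, 540), and glyph outlines would be rasterized at 2.25x rather than drawn at 640x480 and upscaled afterwards.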

Finally, maybe we should ask JanWillems to join this discussion? He's the one 
working on the MPC-HC renderers. He might be interested in this, as well.

Original comment by mad...@gmail.com on 18 Nov 2011 at 9:19

GoogleCodeExporter commented 9 years ago

I fully agree with all the suggestions.

==========
Quote:
I've been told that this might be problematic in terms of aspect ratio and 3D 
rotations etc.
==========

I don't see any possible issue on my side, as long as all I have to do is 
generate subpic data at specified size for specific video frame. But if there 
is any actually, please let me know.

I'd like to have JanWillems join this discussion too if he is interested in it.

Original comment by YuZhuoHu...@gmail.com on 18 Nov 2011 at 1:09

GoogleCodeExporter commented 9 years ago

I've just been notified of this thread. I'll happily share my thoughts about 
subtitle rendering. Let me introduce myself first.
I'm JanWillem32, currently involved in developing the internal set of 
renderers for the MPC-HC project. I've been involved with fonts/vector 
graphics, artwork, texturing, modeling, coding and such for 3D games and 
raytracing as a hobby for years now. About two years ago, I specialized in 
all-round DirectX 10 rendering. Last December, I was "recruited" for the 
MPC-HC project, after I complained about messed-up rendering stages in the 
internal renderer. After that, I've also been working on the subtitle renderer 
parts. That has led me here, it seems.

I'm indeed very interested in making changes to the original or making a 
new(ish) subtitle renderer. I'm very dissatisfied with the current one. The 
code is really old, so many parts are outdated. I'm actually more dissatisfied 
myself with the rendering quality and texturing techniques than with 
performance, which seems to be the main deal with this project.

From the early days of rendering techniques, many font and image renderers used 
about the same method as "we" use now for the subtitle renderer: prepare an empty 
bitmap to paint on, fill it line by line with a renderer or copy from another 
texture, set up GDI or DirectDraw and make it paint the stuff on screen.
Getting the video adapter to blend such things and present that on screen was 
a trivial matter. Even a decade later, when done right, alpha blending 
techniques, from complex spatial additive alpha blending with lighting 
techniques on particle sprites, to the usual blending of icons on the Windows 
desktop with Aero enabled, are really light on the GPU. (I hope that answers 
the question of which device should add the subtitles to the rendered video.)
The next item was the number of textures, I believe? For the DirectX 8.1 
project it was decided that it would probably be best to limit the amount of 
textures in the pool to no more than 250 at a time, and put the smaller ones 
together on 256×256 pixel textures (as far as I can remember from back then). 
For modern rendering, we just make textures and use them. Paying attention to 
the amount of available video and system memory is important, but we can simply 
cache less ahead of time when memory runs low.
Then we come to the point of texture management. In rendering we often reuse 
the same texture over and over again (transforming it a bit for each target is 
easy, too). Leaving instancing techniques aside for now, textures are 
loaded with reference counting, associated per object to be rendered (in the 
near future, as we pre-cache data). For subtitles, a similar method is easy to 
implement:
subpic 1: {ABC}, the textures A, B and C hold 1 reference and the subpic object 
is sent to the video renderer to be drawn on screen when called
subpic 2: { BCD}, the textures B and C get Addref() called on them, their 
reference count is now 2, D is new, so it holds one reference and the subpic 
object is sent to the video renderer to be drawn on screen when called
... and so on...
When a subpic invalidates, meaning that its end time is earlier than the 
video renderer's current time, a check is made to make sure the 
next subpic is ready or has at least had the chance to call Addref() on its textures, 
and then Release() is called on all textures of the subpic being invalidated. In the 
case of subpic 1, that would bring the reference count of texture A to 0 
(deleting it), while B and C keep one reference held by subpic 2 and are preserved.
This system would spare the PCI bus from having to constantly transfer huge 
textures to video memory. As a good start, updating the library for the text 
interpreters would be a good thing. These currently signal the subtitle 
renderer that the content is constantly animated, lowering the lifespan of 
text-based subtitles to 100 ns each (ISubPic, IsAnimated()).
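The AddRef/Release bookkeeping described above can be modeled with a small reference-counted cache. This sketch assumes that textures are identified by a hashable key. It counts "uploads" to show that consecutive subpics sharing B and C only transfer the new texture. The class is hypothetical and ignores threading and actual D3D objects.

```python
# Hypothetical model of the reference-counting scheme; no real D3D objects.


class TextureCache:
    """Reference-counted sharing of subtitle textures between subpics."""

    def __init__(self):
        self.refs = {}      # texture key -> reference count
        self.uploads = 0    # textures actually transferred to video memory

    def acquire(self, keys):
        """Prepare a subpic: AddRef textures already resident, upload new ones."""
        for k in keys:
            if k in self.refs:
                self.refs[k] += 1
            else:
                self.refs[k] = 1
                self.uploads += 1   # stand-in for the expensive bus transfer

    def release(self, keys):
        """Retire a subpic: Release its textures, freeing those that reach zero."""
        for k in keys:
            self.refs[k] -= 1
            if self.refs[k] == 0:
                del self.refs[k]


cache = TextureCache()
cache.acquire("ABC")   # subpic 1
cache.acquire("BCD")   # subpic 2 is readied before subpic 1 is retired
cache.release("ABC")   # subpic 1 expires: A is freed, B and C survive
```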

Before I start ranting about how very much I dislike image renderers with 
not-even-a-lowly-8-bit-integer color precision, no gamma and colorspace 
correctness, and integer coordinate systems, I'd like to know if the people 
over here are on the same wavelength as me. The only way I'm going to invest 
time in the subtitle renderer, is when no compatibility is required with the 
older versions, fundamental changes are made to the rendering techniques and 
the code is kept under GPL. I'm not interested in merely rendering speed gains, 
I've already refused to work with some previous "developers" for that reason.
I'll happily share the little bit of edited code I already have, defend my 
arguments about why things should change and discuss things further here. We 
may even get to the point of trying out the DirectX 11 font/vector renderer in 
due time.

I've already changed the workings of the subtitle renderer to pass textures to 
blend to the internal video renderer I'm working on in my tester builds, it 
should be a trivial matter to copy the same code to the MadVR 
allocator-presenter. (Using AlphaBlt() to do it, is just messy.) I can offer 
that much for now, at least.

Original comment by janwille...@hotmail.com on 19 Nov 2011 at 12:40

GoogleCodeExporter commented 9 years ago

"The only way I'm going to invest time in the subtitle renderer, is when no 
compatibility is required with the older versions, fundamental changes are made 
to the rendering techniques and the code is kept under GPL. I'm not interested 
in merely rendering speed gains, I've already refused to work with some 
previous "developers" for that reason."

@JanWillem32

Can you make clear what you consider 'compatibility with older versions'? Are 
you talking about VSFilter.dll, the subtitle rendering in MPC-HC, or both? We 
have no desire to maintain any backwards compatibility with the MPC-HC internal 
subtitle render if that's what you are referring to. The plan was to start from 
scratch and code an entirely new/improved/higher-quality subtitle interface 
between VSFilter.dll and video renderers which choose to support it. The goal 
was to implement this new interface within VSFilter.dll as an external filter, 
which is a bit different than the work you've been doing in MPC-HC. Is this a 
problem?

As for VSFilter.dll, there is a limit to how much you can change the way 
ssa/ass scripts are rendered without breaking millions of existing subtitle 
scripts. Bitmap-based subtitles like VOB and PGS can be improved and changed to 
your heart's content. Similar to Libass, quality improvements are welcome, but 
significant behavior changes from VSFilter should either be avoided or have 
VSFilter compatibility toggles added whenever possible.

The coder of xy-VSFilter (I'm not a coder, only assisting with support & 
management for xy-VSFilter) would only be focusing on implementing things under 
the control of xy-VSFilter. It would be you and madshi doing all coding related 
to handling of the subtitles after they are passed off to your video renderers. 
How and in what way xy-VSFilter interacts with the video renderers would be a 
joint collaboration effort. To a certain extent you would have free rein to do
whatever you want after subtitles are out of the hands of xy-VSFilter, though 
it would be nice to have standardized behavior.

I'm just throwing this out there before the main coder replies, to clear up any 
confusion. Up to this point the xy-VSFilter project has only involved revamping
the external VSFilter dll, while the work you've been doing has only involved 
revamping the MPC-HC internal subtitle and video renderers. What form this new 
implementation takes (extend xy-VSFilter, replace the MPC-HC internal subtitle 
renderer, or both) is still up for debate, but up to this point the coder of 
xy-VSFilter has completely ignored the MPC-HC implementation.

Last but not least, it will still likely be a few months before any work on 
this gets underway. First things first, the rest of the planned features need 
to be implemented, xy-VSFilter (which is based on VSFilter 2.39) brought up to 
date with any important changes in 2.40, fix all the known issues, and then 
release a stable version.

Original comment by cyber.sp...@gmail.com on 19 Nov 2011 at 6:18

GoogleCodeExporter commented 9 years ago

Thanks for such an elaborate answer.
A new and improved set of interfaces from a subtitle renderer is indeed what 
I'm looking for. The current interfaces and outputs don't properly serve any 
decent video renderer at all. I mostly wanted to point out that I'm not willing 
to work on a subtitle renderer that can paint on the raw video surfaces 
directly.
I'm all for staying true to the ssa/ass script standards. (Although I would 
love to see improvements for the specifications, but that's not my job.)
I'm annoyed by the many basic flaws in the subtitle renderer.
The bitmap-based subtitles are indeed a nice example of that. They are 
converted from their native storage formats to what is assumed to be the same 
as the output RGB type of the video. The conversion is typically done in a 
rather inefficient manner, using either an R4G4B4A4 or R8G8B8A8 DirectX texture 
at the complete screen size to paint on. Using a DirectX texture type closer to 
the original storage format and of the same size as in the original subtitle 
stream would save a lot of processing and memory copying. During the texture 
blend operation on the GPU, the original alpha and color values are transformed 
on the shader core anyway, so adding two or three more dot operations to perform 
color conversion to match the destination format is perfectly fine there. (A 
dot operation takes only one assembly instruction on the GPU. A CPU can 
actually do that too, using the SSE4 DPPS instruction, although without the 
arbitrary register swizzling that is often employed on the GPU's 
shader core.) The subtitle renderer only needs to specify what format the 
incoming texture is in. Setting a pixel shader to convert the color format of 
the buffer and then alpha-blend it onto the render target is easy (possibly 
resizing the texture during that process as well).
At least the bitmap based subpictures are flagged as never animated, and are 
retained for several frames.
I'll nag about the flaws in the text renderer some other time...
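The per-pixel color conversion described above is one dot product per output channel. As an illustration only, the sketch below converts a full-range BT.601 YCbCr pixel to RGB on the CPU. The choice of source format and the function names are assumptions. In the proposed design, this matrix would live in the blending pixel shader.

```python
# Illustrative only: full-range BT.601 YCbCr -> RGB chosen as an example
# source format; one matrix row (one dot product) per output channel.
YCBCR_TO_RGB = (
    (1.0, 0.0, 1.402),            # R
    (1.0, -0.344136, -0.714136),  # G
    (1.0, 1.772, 0.0),            # B
)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one 8-bit full-range YCbCr pixel to 8-bit RGB."""
    v = (y, cb - 128, cr - 128)   # center the chroma channels
    return tuple(max(0, min(255, round(dot(row, v)))) for row in YCBCR_TO_RGB)
```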

At least it seems we'll be able to work together on a revamped subtitle 
renderer. If it will serve a proper quality output to video renderers, have 
proper extensibility for additional libraries and documentation for future work 
on it, I'll gladly help. I don't mind if it takes a while to start, nor to 
finish. I would just like things to change for the better.

Original comment by janwille...@hotmail.com on 19 Nov 2011 at 10:18

GoogleCodeExporter commented 9 years ago

stop hacking on vsfilter, jesus christ
you need to write something new from scratch, the vsfilter codebase is beyond 
all hope of salvation

Furthermore, if you actually want people to use a new subtitle renderer, thou 
shalt not break existing typesetting. You must parse and render ASS exactly the 
same as VSFilter does, almost down to a bug-for-bug compatibility level.

Original comment by kalle.bl...@gmail.com on 19 Nov 2011 at 3:04

GoogleCodeExporter commented 9 years ago

"Furthermore, if you actually want people to use a new subtitle renderer, thou 
shalt not break existing typesetting. You must parse and render ASS exactly the 
same as VSFilter does, almost down to a bug-for-bug compatibility level."

TheFluff, there is no new subtitle renderer. This is the same VSFilter 2.39.x 
you've always known, just faster with a few extra features tacked on. The goal 
of xy-VSFilter thus far has always been bug-for-bug compatibility with zero 
regressions from VSFilter 2.39 and I don't see that changing.

The goal of this Issue #40 is not to create an entirely new subtitle renderer, 
but to create a replacement for the horrible ISubRender interface which has 
been known to break typesetting because it's NOT bug-to-bug compatible with 
VSFilter. The MPC-HC internal subtitle renderer needs to die, and implementing
a new better interface between VSFilter and video renderers is a strong step 
forward towards that goal.

That said, I'd really like to see someone revive Kumaji and finalize the AS5 
spec so VSFilter itself can die. Since there appears to be little interest in 
either, we have to make do with what we have. Unfortunately, fansubbers have 
become more daring with soft-subbing over the past few years, and we've since 
seen an increased frequency of karaoke and typesetting which at times doesn't 
even play in real-time on an overclocked Core i7. The need for a faster 
VSFilter with bug-to-bug compatibility with VSFilter 2.39 is why xy-VSFilter 
was born.

Original comment by cyber.sp...@gmail.com on 19 Nov 2011 at 4:47

GoogleCodeExporter commented 9 years ago

TheFluff: "you need to write something new from scratch, the vsfilter codebase 
is beyond all hope of salvation"

Of course we do, but here is the problem: there is no one who can do this. 
Kumaji was a great idea, but it's too far from reality. So how many years should we 
wait until someone brings a brand new renderer? 5? 10? 
While we wait for that to happen, it's a good idea to improve what we have now 
instead of making ourselves suffer 10 more years.

Original comment by yakits...@gmail.com on 19 Nov 2011 at 9:44

GoogleCodeExporter commented 9 years ago

@Jan, there are 2 totally separate issues:

(1) Working on a subtitle renderer.
(2) Creating a new interface through which *any* subtitle renderer (e.g. 
xy-vsfilter) and *any* video renderer (e.g. madVR or EVR-Custom) can exchange 
data and information.

We're currently talking about (2). The original purpose of the new interface 
was to allow "xy-vsfilter" and "madVR" to talk to each other through a private 
interface. Something similar to "ISubRender" (which you probably know), but we 
want something better than ISubRender. But then we thought it would make sense 
to invite you, too, since you're the "MPC-HC EVR guy". We're hoping you would 
be willing to participate in creating the new interface. And that you will add 
support for it to the MPC-HC VMR/EVR renderers.

It seems that you understood our invitation to join here as a suggestion to 
join the xy-vsfilter project as a developer. That was not our original 
intention, but now that you mention that, I think YuZhuoHuang probably would be 
glad to get help! Of course I can't speak for him, though. He's already done 
both performance and quality tweaks. E.g. he added P010 (10bit) input/output 
support. I'm not sure what his final target is for xy-vsfilter. Maybe you are 
aiming higher than he is? Don't know, maybe the two of you should discuss this 
via email or chat or something.

For now I hope you'll join the discussion about what a new "subtitle renderer 
<-> video renderer" interface should ideally look like. Let me sum up a few 
things, and then you can check whether you agree or disagree:

(a) subtitles should be transported as RGBA bitmaps, no D3D involved
(b) subtitles should not be rendered/blended onto the video images by the 
subtitle renderer, that's the video renderer's job
(c) the video renderer should ask the subtitle renderer for one big RGBA bitmap 
(or for a series of smaller RGBA bitmaps) for every video frame; the subtitle 
renderer should reply with one big (or a series of smaller) RGBA bitmap(s); the 
subtitle renderer can also reply with "same as last frame" to save 
resources/performance
(d) the video renderer can request the subtitles to be rendered directly to the 
target resolution (after upscaling) to improve quality
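To make points (a) to (d) concrete, here is a minimal sketch of the data that could flow across such an interface. All type and field names (SubBitmap, SubRenderRequest, SubRenderReply, IsWellFormed) are hypothetical and not part of any existing header; the time units are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One RGBA subtitle bitmap, placed in target-resolution coordinates (point a).
struct SubBitmap {
    int x = 0, y = 0;            // top-left position on the video renderer's surface
    int width = 0, height = 0;
    std::vector<uint8_t> rgba;   // width * height * 4 bytes, tightly packed
};

// Sent by the video renderer once per video frame (points c and d).
struct SubRenderRequest {
    int64_t start = 0, stop = 0;             // frame start/stop time (units are an assumption)
    int targetWidth = 0, targetHeight = 0;   // render directly at the output resolution
};

// Returned by the subtitle renderer (points a and c). No video pixels are
// involved: blending is left to the video renderer (point b).
struct SubRenderReply {
    bool sameAsLastFrame = false;     // true: reuse the previous bitmaps
    std::vector<SubBitmap> bitmaps;   // one big bitmap or a series of smaller ones
};

// Sanity-check a reply: "same as last frame" carries no bitmaps, and every
// bitmap's buffer holds exactly width * height RGBA pixels.
bool IsWellFormed(const SubRenderReply& reply) {
    if (reply.sameAsLastFrame) return reply.bitmaps.empty();
    for (const SubBitmap& b : reply.bitmaps) {
        if (b.width <= 0 || b.height <= 0) return false;
        if (b.rgba.size() != static_cast<std::size_t>(b.width) * b.height * 4) return false;
    }
    return true;
}
```

Keeping the reply as plain memory (no D3D objects) is what lets any video renderer, D3D or otherwise, decide for itself how to upload and blend the bitmaps.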

If implemented this way, the xy-vsfilter DirectShow filter would have exactly 
one input pin for subtitle data and no further input/output pins. And it would 
automatically work with DXVA etc. with all video renderers which support the new 
interface.

Any comments?

Original comment by mad...@gmail.com on 19 Nov 2011 at 10:19

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

@Jan I have little knowledge of D3D, so it's a bit hard for me to fully understand 
all the technical points you're talking about. But Madshi has summed things up 
perfectly. We can discuss things other than the "subtitle renderer <-> video 
renderer" interface via email or in another thread/issue.

For the interface, I agree with all 4 points Madshi listed. Now, having seen 
Jan's comments, I have a few questions:
"(a) subtitles should be transported as RGBA bitmaps, no D3D involved"
Should the interface leave an opportunity for the subtitle renderer to involve D3D? 
Or would that only bring unnecessary complexity to both sides? 

"(c) the video renderer should ask the subtitle renderer for one big RGBA 
bitmap (or for a series of smaller RGBA bitmaps) for every video frame; the 
subtitle renderer should reply with one big (or a series of smaller) RGBA 
bitmap(s); the subtitle renderer can also reply with "same as last frame" to 
save resources/performance"
Could we do more? When parts of the subtitle renderer's reply are the same as 
in the last frame, e.g. the subtitle renderer replied with a series of bitmaps {A B C} 
for the last frame and {C B D} for the current frame, then either B/C in the first 
series and B/C in the second are guaranteed to point to the same objects, or an 
easy and quick equality comparison method between two bitmaps is provided, so 
that it's easy to detect the differences between two replies. Then the "same as 
last frame" reply option in the interface won't be needed.
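A minimal sketch of how a video renderer could exploit such guarantees, assuming every bitmap carries a 64-bit ID that stays the same while its content is unchanged (an assumption, as is the helper name DiffById). With A=1, B=2, C=3, D=4, the {A B C} to {C B D} example yields "upload D, drop A":

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct FrameDiff {
    std::vector<uint64_t> added;    // new bitmaps that must be uploaded to the GPU
    std::vector<uint64_t> removed;  // cached textures that can be freed
};

// Compare the bitmap IDs of the previous and current reply.
FrameDiff DiffById(const std::vector<uint64_t>& previous,
                   const std::vector<uint64_t>& current) {
    std::unordered_set<uint64_t> prev(previous.begin(), previous.end());
    std::unordered_set<uint64_t> cur(current.begin(), current.end());
    FrameDiff diff;
    for (uint64_t id : current)
        if (!prev.count(id)) diff.added.push_back(id);
    for (uint64_t id : previous)
        if (!cur.count(id)) diff.removed.push_back(id);
    return diff;
}
```

An empty diff is equivalent to "same as last frame", which is why an explicit flag would become redundant under this scheme.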

Original comment by YuZhuoHu...@gmail.com on 20 Nov 2011 at 2:32

GoogleCodeExporter commented 9 years ago

libass is actually pretty good these days. All it really needs is a dshow 
filter and a win32 font picking backend to avoid fontconfig. Why don't you guys 
get crackin' on that instead?

Original comment by kalle.bl...@gmail.com on 20 Nov 2011 at 5:44

GoogleCodeExporter commented 9 years ago

There is no reason why we couldn't have both. Work on xy-VSFilter should still 
be finished so it can replace VSFilter 2.39/2.40 as the de facto 
VSFilter build, but you do make a good point that any new 
subtitle_filter->video_renderer interface we create should be flexible enough 
to be adapted to libass or any other subtitle renderer created in the 
future.

That said, developer interest in porting libass to DirectShow has certainly 
increased over the past year. YuZhuoHuang (xy-VSFilter dev), Gommorah 
(threaded-VSFilter dev), Madshi (madVR dev), Nevcairiel (LAV-Filters dev), and 
Lachs0r (mplayer2 win32 builds) have all expressed interest in porting libass at 
some point in time. If we get you, jfs, and the other Aegisub devs 
on board, the massive undertaking of creating a DirectShow-based libass may 
actually be feasible as a joint effort. So far all the devs mentioned 
have only considered porting libass independently for their own purposes, so 
*someone* would need to put in quite a bit of effort to gather everyone 
together and convince them to collaborate on a porting effort.

Original comment by cyber.sp...@gmail.com on 20 Nov 2011 at 6:28

GoogleCodeExporter commented 9 years ago

> Should the interface leave an opportunity for
> subtitle renderer to involve D3D?

I see no advantage in doing that. The video renderer is in the best position to 
decide if, when, how, in which format and in which thread to upload the 
subtitles to the GPU. E.g. imagine the video renderer uses OpenGL instead of 
D3D.

> We can do more (?). When some of the subtitle renderer's
> reply are the same as last frame, e.g. the subtitle
> renderer replied a series of bitmaps {A B C} for last
> frame and a series of bitmaps {C B D} for current frame,
> B/C in the first series and B/C in the second are guaranteed
> to point to the same objects, or an easy and quick equality
> comparison method between two bitmaps is provided, so that
> it won't be hard to detect difference between two replies.
> Then the reply option "same as last frame" in the interface
> won't be needed.

Hmmmm... Does it often happen in real life that some parts of the subtitles 
stay identical while others change, from one video frame to the next? If it 
does, then yes, we should allow the subtitle renderer to pass on this 
information somehow.

> libass is actually pretty good these days, all it really
> needs is a dshow filter and a win32 font picking backend
> to avoid fontconfig, why don't you guys get crackin' on
> that instead

AFAIK, libass supports ASS subtitles, nothing else. vsfilter supports pretty 
much every subtitle format out there (both text and bitmap based). Furthermore, 
some ASS subtitles depend on bugs in vsfilter to display correctly. As a 
result, my opinion is that we should ideally have both, vsfilter and libass. If 
YuZhuoHuang continues to improve xy-vsfilter, that should be a pretty good 
thing, IMHO. Improving xy-vsfilter and getting it to work with the new interface 
we're discussing should be much easier and quicker than developing a completely new 
subtitle renderer with support for all those funny subtitle formats out there.

In the long run, maybe vsfilter will be replaced by a completely new subtitle 
renderer. But that will take time, and it shouldn't stop us from improving 
xy-vsfilter in the meantime. And the interface we're discussing should help with 
both.

@YuZhuoHuang, I've found this page:

http://code.google.com/p/libass/wiki/IssuesAndDifferences

It says: "\blur is scaled like border width if ScaledBorderAndShadow is on. 
VSFilter does not do any scaling. Likewise, the viewing distance for rotations 
is scaled. The goal is to get the same rendering result independent of 
rendering resolution."

So it seems having xy-vsfilter render at the upscaled target resolution may 
cause problems after all, with blurring and rotations. But maybe you can find a 
way to correct that somehow?

Original comment by mad...@gmail.com on 20 Nov 2011 at 9:00

GoogleCodeExporter commented 9 years ago

"It says: "\blur is scaled like border width if ScaledBorderAndShadow is on. 
VSFilter does not do any scaling. Likewise, the viewing distance for rotations 
is scaled. The goal is to get the same rendering result independent of 
rendering resolution.""

That is something which should not be fixed globally, otherwise it would break 
scripts which depend on this VSFilter behavior like 
http://code.google.com/p/libass/issues/detail?id=6 with \blur. 

What this likely means is that part of this new interface would involve making 
xy-VSFilter aware of the video resolution and aspect ratio in addition to the 
target resolution. First, blur & rotations for Script_Resolution -> Video_Resolution 
would need to be calculated unscaled, and then from there everything for 
Video_Resolution -> Target_Resolution would need to be calculated as scaled 
(simulating resizing but rasterizing at the higher resolution).

Does that sound feasible, YuZhuoHuang? It might almost make sense to just have 
xy-VSFilter pass bitmaps containing only the blur, have the video renderer scale 
them to the target resolution with its default interpolation, and then blend the 
scaled blur bitmaps with the subtitle bitmaps which are already passed at target 
resolution? I suspect getting blur to look identical when rasterized at a higher 
resolution is tricky, and it may force us to interpolate blurs. Thoughts?

Original comment by cyber.sp...@gmail.com on 20 Nov 2011 at 10:21

GoogleCodeExporter commented 9 years ago

> libass is actually pretty good these days, all it really
> needs is a dshow filter and a win32 font picking backend
> to avoid fontconfig, why don't you guys get crackin' on
> that instead

Creating a libass dshow filter that works like VSFilter requires quite a lot of 
work, and it would definitely inherit the basic flaws, in both performance and 
quality, of the way VSFilter works. We wouldn't get a libass filter that works as 
well as libass does in mplayer. But it would be much easier to make a libass 
based subtitle renderer supporting the interface we're now discussing.

> imagine the video renderer uses OpenGL instead of D3D.

Got it.

> Does it often happen in real life that some parts of the 
> subtitles stay identical while others change, from one 
> video frame to the next?

Very often. E.g. moving text or a simple karaoke effect.

> "\blur is scaled like border width if ScaledBorderAndShadow 
> is on. VSFilter does not do any scaling. Likewise, the 
> viewing distance for rotations is scaled. The goal is to get
> the same rendering result independent of rendering resolution."

I think even though there are some tags whose behavior (in VSFilter) depends on 
the resolution, the subtitle renderer can use extra resolution information to 
interpret such tags while outputting at another resolution. So the problem is 
solvable. This extra resolution information can be 
1. either the actual resolution of the video (set by the video renderer in the 
initialization step), in which case the subtitle renderer can act exactly like VSFilter;
2. or the resolution the subtitle renderer reads from the subtitle script, in which 
case the output of the subtitle renderer is totally independent of the video.
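The two options differ only in which height acts as the reference when a resolution-dependent value is converted to target pixels. The sketch below shows that single scaling step for one scalar (e.g. a \blur strength), using vertical scale only; the names are hypothetical, and real blur/rotation handling in VSFilter is considerably more involved than one multiplication.

```cpp
#include <cassert>

enum class ReferenceResolution { Video, Script };

struct Resolutions {
    int videoHeight;   // decoded video height (option 1)
    int scriptHeight;  // resolution read from the script header (option 2)
    int targetHeight;  // height the video renderer asks us to render at
};

// Convert a value expressed in reference-resolution pixels into target pixels.
double ToTargetPixels(double value, const Resolutions& r, ReferenceResolution ref) {
    int refHeight = (ref == ReferenceResolution::Video) ? r.videoHeight : r.scriptHeight;
    assert(refHeight > 0);
    return value * static_cast<double>(r.targetHeight) / refHeight;
}
```

For a 720p video with a 480-line script rendered at 1080 lines, a value of 2 becomes 3 target pixels under option 1 and 4.5 under option 2.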

Original comment by YuZhuoHu...@gmail.com on 20 Nov 2011 at 12:09

GoogleCodeExporter commented 9 years ago

Here's an experimental dshow filter of libass: 
https://github.com/Arnavion/libassDShow

Original comment by astrat...@gmail.com on 20 Nov 2011 at 2:10

GoogleCodeExporter commented 9 years ago

taro, it is nice to see attempts to make a DS filter out of libass, but I don't 
think it's clear how it is useful for the current project. If you think that 
libassDShow also needs to support the interface discussed here, then you 
should probably make that proposal to the libassDShow author.

Original comment by yakits...@gmail.com on 20 Nov 2011 at 3:27

GoogleCodeExporter commented 9 years ago

> (a) subtitles should be transported as RGBA bitmaps, no D3D involved
> (b) subtitles should not be rendered/blended onto the video images by the 
> subtitle renderer, that's the video renderer's job
> (c) the video renderer should ask the subtitle renderer for one big RGBA 
> bitmap (or for a series of smaller RGBA bitmaps) for every video frame; the 
> subtitle renderer should reply with one big (or a series of smaller) RGBA 
> bitmap(s); the subtitle renderer can also reply with "same as last frame" to 
> save resources/performance
> (d) the video renderer can request the subtitles to be rendered directly to 
> the target resolution (after upscaling) to improve quality

Sounds like a plan to me (which I would also implement in LAV Video or wherever 
I end up rendering subs).

The interface seems easy enough to use; producing RGBA from both text and 
bitmap subs is usually an easy task. I'm not sure the "series of smaller 
bitmaps" is really needed; it's not too much to ask the subtitle renderer to 
merge them into one RGBA image. But if you're going for the ultimate 
interface, sure, why not.

Original comment by h.lepp...@gmail.com on 23 Nov 2011 at 9:42

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Nev, you see, a "series of smaller bitmaps" is just how vsfilter works 
internally. Merging the images is not a problem; it's just a slow 
approach.
Of course some other renderers may work differently, or may not be so slow doing 
that, but vsfilter support can't just be dropped.

Original comment by yakits...@gmail.com on 23 Nov 2011 at 10:00

GoogleCodeExporter commented 9 years ago

Libass also works like that; however, I don't think that operation is that 
slow. Something needs to merge them into one big image and 
upload it to the GPU. Where that merge happens is not important, and it could 
easily be done in the sub renderer itself.
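For illustration, here is roughly what "merging into one big image" costs on the CPU: one pass over every pixel of every small bitmap. This minimal sketch assumes premultiplied 8-bit RGBA and standard "source over" compositing; the names Bitmap and MergeBitmaps are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bitmap {
    int x, y, width, height;    // placement on the canvas
    std::vector<uint8_t> rgba;  // premultiplied RGBA, tightly packed
};

// Composite all bitmaps, in order, onto one transparent canvas.
std::vector<uint8_t> MergeBitmaps(const std::vector<Bitmap>& parts, int canvasW, int canvasH) {
    std::vector<uint8_t> canvas(static_cast<std::size_t>(canvasW) * canvasH * 4, 0);
    for (const Bitmap& b : parts) {
        for (int row = 0; row < b.height; ++row) {
            int cy = b.y + row;
            if (cy < 0 || cy >= canvasH) continue;      // clip rows outside the canvas
            for (int col = 0; col < b.width; ++col) {
                int cx = b.x + col;
                if (cx < 0 || cx >= canvasW) continue;  // clip columns outside the canvas
                const uint8_t* s = &b.rgba[(static_cast<std::size_t>(row) * b.width + col) * 4];
                uint8_t* d = &canvas[(static_cast<std::size_t>(cy) * canvasW + cx) * 4];
                int inv = 255 - s[3];  // how much of the destination shows through
                // Premultiplied "over": dst = src + dst * (1 - srcAlpha), rounded.
                for (int c = 0; c < 4; ++c)
                    d[c] = static_cast<uint8_t>(s[c] + (d[c] * inv + 127) / 255);
            }
        }
    }
    return canvas;
}
```

The premultiplied assumption matters: with straight alpha a color channel can exceed alpha and the sum above could overflow.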

Original comment by h.lepp...@gmail.com on 23 Nov 2011 at 10:25

GoogleCodeExporter commented 9 years ago

@astrataro
The libassDShow project is still far from usable. It hardly helps anyone, 
and it would definitely be better and easier for it to support the interface we 
are discussing.

@Nev
Fansubbers often place one subtitle at the top and one at the bottom. For such 
scripts, if the interface allows a series of bitmaps, passing only the dirty 
areas will be possible. Some CPU->GPU transfer can be saved (I guess) 
compared to passing one big bitmap that contains both the top subtitle and the 
bottom subtitle.

> (c) the video renderer should ask the subtitle renderer for one big RGBA 
> bitmap (or for a series of smaller RGBA bitmaps) for every video frame; the 
> subtitle renderer should reply with one big (or a series of smaller) RGBA 
> bitmap(s); the subtitle renderer can also reply with "same as last frame" to 
> save resources/performance

Add one rule: the video renderer can set a maximum number of smaller 
RGBA bitmaps returned for one frame. (So the video renderer has the option to 
force the subtitle renderer to reply with only one big RGBA bitmap.)
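One way a subtitle renderer could honor such a limit while keeping the top/bottom dirty regions separate whenever possible: repeatedly merge the pair of rectangles whose bounding box wastes the fewest extra pixels. This greedy heuristic and the helper names are assumptions; only the rectangles are computed here, and the pixels would then be composited into each merged rectangle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Rect { int left, top, right, bottom; };  // right/bottom exclusive, like a Win32 RECT

long long Area(const Rect& r) {
    return static_cast<long long>(r.right - r.left) * (r.bottom - r.top);
}

Rect Union(const Rect& a, const Rect& b) {
    return {std::min(a.left, b.left), std::min(a.top, b.top),
            std::max(a.right, b.right), std::max(a.bottom, b.bottom)};
}

// Reduce the dirty rectangles to at most maxCount by greedy pairwise merging.
std::vector<Rect> LimitRectCount(std::vector<Rect> rects, std::size_t maxCount) {
    assert(maxCount >= 1);
    while (rects.size() > maxCount) {
        std::size_t bestI = 0, bestJ = 1;
        long long bestWaste = std::numeric_limits<long long>::max();
        for (std::size_t i = 0; i < rects.size(); ++i) {
            for (std::size_t j = i + 1; j < rects.size(); ++j) {
                // Extra (empty) pixels the merged bounding box would have to carry.
                long long waste = Area(Union(rects[i], rects[j])) - Area(rects[i]) - Area(rects[j]);
                if (waste < bestWaste) { bestWaste = waste; bestI = i; bestJ = j; }
            }
        }
        rects[bestI] = Union(rects[bestI], rects[bestJ]);
        rects.erase(rects.begin() + static_cast<std::ptrdiff_t>(bestJ));
    }
    return rects;
}
```

With a limit of 1 this degenerates to a single bounding box, i.e. the "one big RGBA bitmap" case.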

Two more problems:
1) If I want to pre-buffer for future video frames, I'll need their 
presentation times. I'll probably use a fixed framerate and calculate the 
timestamps of future video frames myself. It *works* even if the framerate 
differs from the actual video framerate. When the video renderer asks for a 
subpic, an algorithm similar to nearest neighbor can be used to search the 
pre-buffered sub-picture queue, and the sub-picture nearest to the one the video 
renderer is asking for will be returned. But of course it'd be better to use the 
same framerate as the actual video framerate. So the video renderer should tell 
the subtitle renderer the framerate.

2) If the subtitle renderer works as a directshow filter, then how can it 
autoload when there are only external subtitles but no embedded subtitles?
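For problem 1), predicting future frame times under a fixed framerate is simple arithmetic. This sketch assumes DirectShow REFERENCE_TIME units (100 ns) and a rational framerate fpsNum/fpsDen; PredictFrames is a hypothetical helper, and the prediction only holds for CFR content.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using RefTime = int64_t;                     // 100-nanosecond units, like REFERENCE_TIME
constexpr RefTime kUnitsPerSecond = 10'000'000;

// Predict [start, stop) for the next 'count' frames at fpsNum/fpsDen frames per second.
std::vector<std::pair<RefTime, RefTime>> PredictFrames(RefTime firstStart, int64_t fpsNum,
                                                       int64_t fpsDen, int count) {
    std::vector<std::pair<RefTime, RefTime>> frames;
    for (int i = 0; i < count; ++i) {
        // Derive each boundary from the frame index so rounding errors don't accumulate.
        RefTime start = firstStart + i * kUnitsPerSecond * fpsDen / fpsNum;
        RefTime stop  = firstStart + (i + 1) * kUnitsPerSecond * fpsDen / fpsNum;
        frames.emplace_back(start, stop);
    }
    return frames;
}
```

At 24000/1001 fps the frame duration is not a whole number of 100 ns units, which is why consecutive frames come out 417083 and then 417083 units apart but the third boundary lands at 834166, not 834166 plus accumulated drift.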

Original comment by YuZhuoHu...@gmail.com on 24 Nov 2011 at 12:50

GoogleCodeExporter commented 9 years ago

Can you describe what purpose pre-buffer has exactly? Is it meant to make sure 
that the CPU is never idle? FWIW, madVR already pre-buffers many frames itself. 
Wouldn't that be good enough already? Or does it still make sense in your 
opinion to pre-buffer inside of xy-vsfilter, too?

No idea about 2). I don't know how autoloading works.

Original comment by mad...@gmail.com on 24 Nov 2011 at 1:44

GoogleCodeExporter commented 9 years ago

Ah, I didn't think of things like having two separate dirty regions.
I was thinking more about how libass works. It gives you a long list of 8-bit 
alpha maps, each with a 32-bit color, and it's your job to combine these into a 
finished RGBA image anyway. I'm not against the whole "multiple small RGBA images" 
thing, I just couldn't think of a real use case.

As for 1), that occurred to me as well.
Pre-buffering can be useful, especially if your PC is not super fast. There are 
always stretches without speech, during which you could already render the 
subtitles that are then used in the next dialogue.

2) That's an old problem. IMHO, it should be the player's responsibility to 
detect external subs and load a subtitle renderer for them.

Original comment by h.lepp...@gmail.com on 24 Nov 2011 at 2:09

GoogleCodeExporter commented 9 years ago

The purpose would be to increase performance efficiency within VSFilter beyond 
just increasing CPU utilization. Remember that VSFilter is single-threaded, 
which means slowdowns bring everything to a halt, and when that happens the 
madVR queues have limited usefulness. Even stock VSFilter gets something like a 
10x-20x speed-up with pre-buffering enabled on my Core i5 compared to no 
pre-buffering. If xy-VSFilter could get even a fraction of that speed-up just 
by splitting some pre-buffering code and related operations into another 
thread(s), it would be rather amazing.

"Quote madshi:
IMHO you should change the VSFilter design so that you have one secondary 
thread (or multiple secondary threads) to do all the rendering work. These 
secondary render threads would store their rendering results in an internal 
rendering queue. Your main thread which is calling "CBaseOutputPin::Deliver()" 
would then do nothing but fetch already rendered frames from the internal queue 
and deliver them to the video renderer. The secondary thread(s) would then 
render as fast as they can, until the internal buffer queue is full. When the 
queue is full, your secondary render thread(s) should go to sleep. When 
"CBaseOutputPin::Deliver()" returns, your main thread can delete the delivered 
frame from the queue, and then wakeup the secondary thread to fill up the empty 
spot in the internal queue by rendering the next frame."
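The worker-thread design in the quote maps onto a bounded producer/consumer queue. Below is a minimal sketch with standard C++ threads; RenderAheadQueue is a hypothetical name, the render callback (frame index to rendered result) is a placeholder for the actual subtitle rendering, and a real filter would also need flushing on seeks.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One worker thread renders frames until the queue is full, then sleeps;
// the delivering thread pops finished frames, which wakes the worker again.
template <typename Frame>
class RenderAheadQueue {
public:
    RenderAheadQueue(std::size_t capacity, std::function<Frame(int)> render)
        : capacity_(capacity), render_(std::move(render)), worker_([this] { Run(); }) {}

    ~RenderAheadQueue() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        spaceFree_.notify_all();
        worker_.join();
    }

    // Delivering thread: blocks until the next rendered frame is available.
    Frame Pop() {
        std::unique_lock<std::mutex> lock(m_);
        frameReady_.wait(lock, [this] { return !queue_.empty(); });
        Frame f = std::move(queue_.front());
        queue_.pop_front();
        spaceFree_.notify_one();  // wake the worker to fill the freed slot
        return f;
    }

private:
    void Run() {
        for (int index = 0;; ++index) {
            {
                std::unique_lock<std::mutex> lock(m_);
                spaceFree_.wait(lock, [this] { return stop_ || queue_.size() < capacity_; });
                if (stop_) return;
            }
            Frame f = render_(index);  // the heavy work runs without holding the lock
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(f));
            frameReady_.notify_one();
        }
    }

    std::size_t capacity_;
    std::function<Frame(int)> render_;
    std::mutex m_;
    std::condition_variable spaceFree_, frameReady_;
    std::deque<Frame> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after every other member exists
};
```

Usage: RenderAheadQueue<int> q(3, render) keeps up to three frames rendered ahead, and each q.Pop() returns them in order.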

I don't believe YuZhuoHuang has yet decided what form a pre-buffer in 
xy-VSFilter will take, as it will likely be completely different from stock 
VSFilter's. madshi, has your opinion (quoted above) about the optimal way to 
create an internal VSFilter pre-buffer, in order to remove bottlenecks when 
interacting with madVR's decoder buffer, changed in any way since then?

Original comment by cyber.sp...@gmail.com on 24 Nov 2011 at 2:30

GoogleCodeExporter commented 9 years ago

The main problem with pre-buffering remains: it can only work if you know 
which frames will be coming next.
With CFR material, that's trivial: you just need to be told the frame rate of the 
movie, and you can compute the frames ahead. VFR content is a whole different 
problem. Sadly, the anime folks are also the ones who sometimes use VFR material.

The good old worker thread design that madshi outlines above is still good; 
you can't really go wrong with it. But you do need to know what to actually 
pre-render.

Original comment by h.lepp...@gmail.com on 24 Nov 2011 at 2:35

GoogleCodeExporter commented 9 years ago

> Can you describe what purpose pre-buffer has exactly? 
> Is it meant to make sure that the CPU is never idle? 
> FWIW, madVR already pre-buffers many frames itself. 
> Wouldn't that be good enough already? Or does it still 
> make sense in your opinion to pre-buffer inside of 
> xy-vsfilter, too?

I know madVR has a buffer queue, and a subtitle renderer for madVR would indeed 
not need to pre-buffer. But a common video renderer may not have 
such a mechanism. The main purpose of pre-buffering (if I do it) is to 
prevent a sudden heavy script from causing *lag*, not to make full use of the 
CPU. I'd rather reduce CPU usage than increase it to make subtitle 
rendering faster. After thinking about your question, I have a feeling that this 
goal can be achieved in another way, instead of pre-buffering subpics. Anyway, 
maybe someone else interested in implementing a subtitle renderer that 
supports this interface would like to pre-buffer?

> No idea about 2). I don't know how autoloading works.

For VSFilter, since it has a video input pin and an output pin, it can always 
connect to the filter graph and check whether there are any subtitles 
(internal/external) to decide whether it should autoload. Now, with the video 
input pin and output pin removed, when there are no internal subtitles the 
splitter would not have any subtitle output pin, and the subtitle renderer could 
not connect to the graph.

Original comment by YuZhuoHu...@gmail.com on 24 Nov 2011 at 2:40

GoogleCodeExporter commented 9 years ago

Why not just keep an Input & Output Pin to pass decoded video through to the 
Video Renderer untouched (while subtitles would still use this new interface 
via callback)? Wouldn't that solve the pre-buffering, auto-loading, VFR, and 
various other problems?

Original comment by cyber.sp...@gmail.com on 24 Nov 2011 at 2:54

GoogleCodeExporter commented 9 years ago

> Why not just leave an Input & Output Pin to pass-through decoded video 

Will it break DXVA?

Original comment by YuZhuoHu...@gmail.com on 24 Nov 2011 at 3:29

GoogleCodeExporter commented 9 years ago

Nothing can sit between the decoder and the renderer in DXVA.

It's sadly also not that trivial to just have video pass-through, because the 
renderer dictates how the video frame should be set up (stride), so either you 
need to support adjusting the image stride, or somehow forward such 
requirements to the decoder itself. Sadly, neither option is trivial.
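The "adjust the image stride" half amounts to a row-by-row copy whenever the decoder's pitch differs from the pitch the renderer's buffer uses. A minimal sketch for a single plane follows (CopyPlane is a hypothetical helper; planar formats such as NV12 need one call per plane). It also shows the cost: a full extra frame copy for every frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy 'rows' rows of 'rowBytes' visible bytes between buffers with different pitches.
void CopyPlane(uint8_t* dst, std::ptrdiff_t dstPitch,
               const uint8_t* src, std::ptrdiff_t srcPitch,
               std::size_t rowBytes, int rows) {
    for (int y = 0; y < rows; ++y)
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
}
```

Padding bytes beyond rowBytes in each destination row are left untouched.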

Original comment by h.lepp...@gmail.com on 24 Nov 2011 at 3:39

GoogleCodeExporter commented 9 years ago

Then how about combining a portion of the dummy pin method with the callback 
method?

Add a subtitle input pin to the Video Renderer and have it handle autoloading?
+
Add a dummy video output pin from the Video Renderer to a subtitle renderer, to 
inform it of timestamp information making pre-buffering and VFR possible?
+
This may also be a good time to make use of the extensible Open Media Format 
http://sourceforge.net/projects/openmediaformat/ to pass static information to 
the subtitle filter via the dummy output pin?

Though, this method would only make sense if the frame-rate/timestamp and 
auto-loading issues couldn't be more easily solved by other means.

Original comment by cyber.sp...@gmail.com on 24 Nov 2011 at 10:47

Changed title: Custom interface for rendering subtitles on an RGBA texture(s) | (VSFilterForMadVR)

GoogleCodeExporter commented 9 years ago

> Nothing can sit between the decoder and the renderer in DXVA.
> 
> Its sadly also not that trivial to just have video pass-through, 
> because the renderer dictates how the video frame should be setup 
> (stride), so either you need to support adjusting the image stride,
>  or somehow forward such requirements to the decoder itself. Both 
> options are sadly not trivial.

Got it.

> 2).. Thats an old problem. IMHO, it should be the players 
> responsibility to detect external subs and load a subtitle renderer 
> for them.

Hmmmm... I agree that someone else (I don't mind whether it's the player or 
madVR) should load the subtitle renderer first. But isn't it better to leave the 
detection work to the subtitle renderer, since it knows what it can deal with?

Original comment by YuZhuoHu...@gmail.com on 25 Nov 2011 at 12:08

GoogleCodeExporter commented 9 years ago

For compatibility's sake, I think it would be best to avoid offloading any 
responsibility for supporting the new interface to the player, if at all 
possible.

If madVR adds support for this new interface, it should just work in any player 
that supports madVR without any additional coding hurdles.

Original comment by cyber.sp...@gmail.com on 25 Nov 2011 at 2:30

GoogleCodeExporter commented 9 years ago

(1) Pre-buffering:

As Hendrik said, the main problem is that if you want to pre-buffer, you need 
to know which start/stop times future video frames will have. What happens if 
xy-vsfilter prebuffers for a specific future frame start/stop time and then the 
video renderer unexpectedly asks for subtitles for a video frame that is right 
between pre-buffered start/stop times? I'm not sure how this could be handled.

Anyway, is there any need (or any use) for adding explicit pre-buffering 
support to the interface? As far as I can see, if xy-vsfilter prebuffers 
internally, the video renderer doesn't even have to know that prebuffering is 
used, or does it? Of course we could add some prebuffer control to the 
interface, but I'm not sure what purpose that would have exactly? I mean what 
could the video renderer do? Maybe it could turn prebuffering on/off, but 
that's all that comes to my mind.

Any more thoughts on prebuffering, anyone?

(2) auto-loading

Maybe this could be a task performed by LAV Splitter? I mean, if LAV Splitter 
detects external subtitles, it could just load them and behave as if they were 
part of the video file, too. Same with external audio tracks, btw. I've been 
wishing for a splitter which can auto load external audio and subtitle tracks 
for a long time. Ideally I'd like to store all my audio/video tracks demuxed 
and let the splitter pick them up automatically. But well, that's a different 
topic and Hendrik and I had a short discussion about this some months ago, IIRC.

Please don't require madVR to have a dummy output pin. Instead madVR could just 
load and add xy-vsfilter to the graph manually. That would be *MUCH* easier and 
should have the same effect. But still, I'd find it nicer to have the splitter 
do the work of making external audio/subtitle tracks available.

BTW, is there any way to get email notifications if someone adds a comment 
here? I've searched but didn't find anything.

Original comment by mad...@gmail.com on 26 Nov 2011 at 11:17

GoogleCodeExporter commented 9 years ago

> BTW, is there any way to get email notifications if someone adds a comment 
> here? I've searched but didn't find anything.

"Star" the issue (click on the Star next to its name on the top)

Re: Auto-loading
It is possible for the splitter to try to detect external subs; however, it's 
still a functional difference from how it works now, which you seemed to want to 
avoid in general? (i.e. it wouldn't work with any other splitter)

Original comment by h.lepp...@gmail.com on 26 Nov 2011 at 11:22

GoogleCodeExporter commented 9 years ago

Auto-loading: I see 2 viable approaches:

(1) The new interface we're discussing makes sense only if the video renderer 
supports it. So it would be no problem to require every video renderer which 
supports the new interface to manually load xy-vsfilter. Ok, where to find the 
xy-vsfilter dll file? There are various ways how we could solve that. E.g. we 
could define a registry key which lists the user's choice for the auto-loaded 
subtitle renderer. Or alternatively the video renderer could just remember the 
subtitle renderer which was used "last time". And if no subtitle renderer is 
found in a graph, the video renderer could then manually load the "last time" 
subtitle renderer.

(2) LAV Splitter supporting external subtitle tracks. Yeah, you're right, this 
would obviously not work when using other splitters. It would be good enough 
for my needs, though, so I'd be fine with it.

Maybe (1) is better? I still generally do like (2), though.

---------

I've created a first detailed interface suggestion and uploaded it here:

http://madshi.net/SubRenderIntf.h

Comments very welcome. I'm not sure about a couple of things. Here are some 
comments:

(a) I wasn't sure who should initiate the connection (sub renderer or video 
renderer). I decided on the video renderer because the video renderer is 
usually the last filter in the chain which gets a pin connection. Only at that 
point in time the video renderer has enough information to establish the 
connection.

(b) I've used strings for the "option" parameter in all the ISubRenderOptions 
methods. I know it's a matter of taste. If you guys prefer a DWORD enum or a 
GUID instead that'd be fine with me, too.

(c) I'm not sure about interfaces and reference counts. Please double check the 
comments I've added to "ISubRenderProvider.Connect" and 
"ISubRenderServices.RenderFrame". Does it make sense to you this way? Or should 
we do the reference counting differently? I'm not really sure...

(d) I've used an extra interface for every rendered subtitle frame. Not sure if 
that makes sense. Maybe it makes things just more complicated than necessary. 
We could also return the bitmaps as a simple array via 
"ISubRenderServices.RenderFrame", if you prefer it that way. Thoughts?

Just a first suggestion. Any comments / change suggestions welcome!

Original comment by mad...@gmail.com on 26 Nov 2011 at 1:16

GoogleCodeExporter commented 9 years ago

> What happens if xy-vsfilter prebuffers for a specific future 
> frame start/stop time and then the video renderer unexpectedly 
> asks for subtitles for a video frame that is right between 
> pre-buffered start/stop times? 

If subpics for start/stop times [t0,t2) and [t2,t4) have been prebuffered, but 
subpic [t1,t3) is asked for, the subpic whose start/stop period includes 
(t1+t3)/2 will be returned, e.g. if t2 <= (t1+t3)/2 < t4, subpic for [t2,t4) 
will be returned. So if the subtitle renderer works at a framerate (very) 
different from the actual playback, animated effects, e.g. moving/rotation/fading 
in/fading out, won't be smooth. 
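The lookup rule described above fits in a few lines. In this sketch SubPic and FindByMidpoint are hypothetical names, `id` stands in for the rendered bitmaps, and times are assumed to be non-negative 100 ns units.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SubPic {
    int64_t start, stop;  // [start, stop) of the prebuffered subpic
    int id;               // stand-in for the rendered bitmaps
};

// Return the prebuffered subpic whose period contains the midpoint of the
// requested [reqStart, reqStop), or nullptr if none does.
const SubPic* FindByMidpoint(const std::vector<SubPic>& queue, int64_t reqStart, int64_t reqStop) {
    int64_t mid = reqStart + (reqStop - reqStart) / 2;  // (t1+t3)/2 without risking overflow
    for (const SubPic& p : queue)
        if (p.start <= mid && mid < p.stop) return &p;
    return nullptr;
}
```

With subpics for [0,400) and [400,800), a request for [200,600) has its midpoint at 400 and therefore gets the second one; a nullptr result means the request fell outside the prebuffered range and has to be rendered on demand.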

> the video renderer doesn't even have to know that prebuffering 
> is used, or does it?

No, it doesn't. I don't think the video renderer needs to worry about it.

Auto-loading:

> (1) The new interface we're discussing makes sense only if the video 
> renderer supports it. So it would be no problem to require every 
> video renderer which supports the new interface to manually load 
> xy-vsfilter.

> (2) LAV Splitter supporting external subtitle tracks.

I think (1) is better, for compatibility.

Detailed interface:
----------
> who should initiate the connection

I vote for the video renderer.
----------
In interface ISubRenderOptions:
I want the video file name so I can search for the corresponding external 
subtitles. Should I get it from ISubRenderOptions?
----------
In ISubRenderFrame:

> // The ID can stay identical if only the position (x, y) changes.
> ...
> STDMETHOD(GetBitmap)(int index, ULONGLONG *id, RECT *placement, LPVOID *pixels, int *pitch);

1) I'm not sure if random access is necessary, but I can go with it.
2) The ID can stay identical if only the *placement* changes?
3) And the callee should guarantee not to modify pixels, if using
    LPCVOID *pixels
instead of
    LPVOID *pixels
makes sense?
----------
In ISubRenderFrame:
> STDMETHOD(GetCombinedBitmap)(RECT *placement, LPVOID* pixels, int *pitch);

I'd prefer a 
    STDMETHOD(SetMaxBitmapCountPerFrame)(DWORD count) = 0;
in ISubRenderServices. Consumers which don't want a series (or a long series) 
of smaller bitmaps can set the upper limit to whatever they want. And this 
setting can be expected to remain unchanged for a long time once it is set. 
Knowing the setting before RenderFrame calls may help me decide what to cache or 
prebuffer.
----------

Original comment by YuZhuoHu...@gmail.com on 27 Nov 2011 at 7:16

GoogleCodeExporter commented 9 years ago

> who should initiate the connection

I vote for the subtitle renderer. It's easier for the sub renderer to support 
multiple interfaces (i.e. madVR's new interface, EVR's old interface, or falling 
back to drawing onto the plain image) if it doesn't have to "wait" to see whether 
a renderer offers an interface. Instead, I can be sure whether there is an 
interface or not.

-------------------

What really bugs me about the interface is the way memory management is done 
for the options.
Handing out memory pointers with a rule that they should stay valid is rather 
obscure, IMHO. Instead, I would follow the MS interfaces and just let the 
callee allocate the data using a predetermined function (e.g. CoTaskMemAlloc), 
and make the caller responsible for freeing it with the matching function (e.g. 
CoTaskMemFree). At least for the options I would prefer it this way.
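A minimal sketch of that "callee allocates, caller frees" convention for a string option. To keep it portable, TaskMemAlloc/TaskMemFree are stand-ins built on malloc/free for CoTaskMemAlloc/CoTaskMemFree; GetStringOption, the option name "videoFileName" and the returned value are hypothetical and not taken from SubRenderIntf.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Portable stand-ins for CoTaskMemAlloc / CoTaskMemFree.
void* TaskMemAlloc(std::size_t bytes) { return std::malloc(bytes); }
void TaskMemFree(void* p) { std::free(p); }

// Callee side: every call hands out a freshly allocated copy, so nothing inside
// the callee has to "stay valid". Returns false for unknown options or on
// allocation failure, leaving *value as nullptr.
bool GetStringOption(const char* option, char** value) {
    static const char kVideoFile[] = "example.mkv";  // example data only
    *value = nullptr;
    if (std::strcmp(option, "videoFileName") != 0) return false;
    *value = static_cast<char*>(TaskMemAlloc(sizeof(kVideoFile)));
    if (*value == nullptr) return false;
    std::memcpy(*value, kVideoFile, sizeof(kVideoFile));
    return true;
}
```

On the caller side: call GetStringOption("videoFileName", &name), use the string, then release it with TaskMemFree(name). Ownership is unambiguous because the pointer never refers to the callee's internal storage.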

For ISubRenderFrame, I guess it's OK-ish to hand out fixed pointers, because 
it's an object which actually holds the subtitle data, and those functions are 
just "getters" to expose the internal data; you would have freshly allocated 
them already anyway. I would, however, adjust the comment and instead say 
something like this:

// The memory pointed to by the "pixels" variable is only valid
// until the next call of GetBitmap, GetCombinedBitmap, or Release

Otherwise, I guess it's OK.

Original comment by h.lepp...@gmail.com on 27 Nov 2011 at 7:35

GoogleCodeExporter commented 9 years ago

Fix typo:
> 3) And the callee should guarantee not to modify pixels, if using
Should be
"3) And the caller should guarantee not to modify pixels, if using"

Original comment by YuZhuoHu...@gmail.com on 27 Nov 2011 at 8:14