ebu / ebu-tt-live-toolkit

Toolkit for supporting the EBU-TT Live specification
http://ebu.github.io/ebu-tt-live-toolkit/
BSD 3-Clause "New" or "Revised" License

[EBUTTDEncoder] Duplicate styles and regions being introduced #466

Open nigelmegitt opened 6 years ago

nigelmegitt commented 6 years ago

The deduplicator removes references to duplicate styles and regions in all documents, but doesn't actually remove the duplicate style and region elements themselves.

Running ebu_tt_live/examples/config/sproducer_resequencer_deduplicator_direct_ebuttd_encoder_fs.json (at b0be0e5) shows this effect. The first document has only one style element and one region element, but every subsequent document has three of each, and the unused ones are not removed from the styling and layout elements.
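For reference, the behaviour this issue expects (dropping style and region elements that nothing references any more) can be sketched with Python's standard `xml.etree.ElementTree`. `prune_unreferenced` is a hypothetical helper, not the toolkit's deduplicator, and it assumes references come only from the `style` and `region` attributes in `body`, plus style chains in `head`:

```python
import xml.etree.ElementTree as ET

# TTML namespace (used by EBU-TT) and xml:id as ElementTree names it.
NS = {"tt": "http://www.w3.org/ns/ttml"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _refs(el, attr):
    # `style` holds space-separated IDREFS; `region` holds a single IDREF.
    return el.get(attr, "").split()


def prune_unreferenced(doc):
    """Hypothetical helper: drop unreferenced <style>/<region> elements in place."""
    styling = doc.find("tt:head/tt:styling", NS)
    layout = doc.find("tt:head/tt:layout", NS)
    styles = {s.get(XML_ID): s for s in styling.iterfind("tt:style", NS)} if styling is not None else {}
    regions = {r.get(XML_ID): r for r in layout.iterfind("tt:region", NS)} if layout is not None else {}

    # Collect every style and region referenced from the body.
    used_styles, used_regions = set(), set()
    body = doc.find("tt:body", NS)
    if body is not None:
        for el in body.iter():
            used_styles.update(_refs(el, "style"))
            used_regions.update(_refs(el, "region"))

    # Styles referenced by a used region count as used too.
    for rid in used_regions & set(regions):
        used_styles.update(_refs(regions[rid], "style"))

    # Follow style-to-style references (chained referential styling).
    pending = list(used_styles)
    while pending:
        sid = pending.pop()
        if sid not in styles:
            continue
        for ref in _refs(styles[sid], "style"):
            if ref not in used_styles:
                used_styles.add(ref)
                pending.append(ref)

    # Remove whatever is left unreferenced.
    for sid, el in styles.items():
        if sid not in used_styles:
            styling.remove(el)
    for rid, el in regions.items():
        if rid not in used_regions:
            layout.remove(el)
    return doc


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <head>
    <styling>
      <style xml:id="s1" style="s3"/>
      <style xml:id="s2"/>
      <style xml:id="s3"/>
    </styling>
    <layout>
      <region xml:id="r1"/>
      <region xml:id="r2"/>
    </layout>
  </head>
  <body><div><p region="r1" style="s1">Hello</p></div></body>
</tt>"""

doc = prune_unreferenced(ET.fromstring(SAMPLE))
print([e.get(XML_ID) for e in doc.iterfind("tt:head/tt:styling/tt:style", NS)])  # ['s1', 's3']
print([e.get(XML_ID) for e in doc.iterfind("tt:head/tt:layout/tt:region", NS)])  # ['r1']
```

Here `s3` survives only because `s1` chains to it, while `s2` and `r2` are dropped.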

nigelmegitt commented 6 years ago

Seen this again today. Definitely needs some attention.

nigelmegitt commented 6 years ago

On closer inspection, unreferenced styles and regions from previous documents are sometimes being inserted into the current document's styling and layout elements. The references to styles and regions all appear to be updated correctly. The behaviour does not seem to be consistent.

nigelmegitt commented 6 years ago

I think I've ruled out this being a deduplicator problem: I generated some test content using only the filesystem, running each component separately, and the deduplicator worked just fine. So this is probably some interaction with the direct carriage mechanism, where an object is manipulated later by a downstream (or upstream) node that hasn't "let go" of it cleanly. I wonder if it could even be a thread safety issue, but I haven't seen any mutexes anywhere, so I don't know whether Python needs them...

nigelmegitt commented 6 years ago

A bit of investigation suggests this could be related to the EBUTT3Splicer, which appears to merge things directly into objects that are bound elements. We need to check that all the objects get copied into a new object and cannot be modified in place, since an accidental left-over tt element somewhere could end up being merged with other documents even after they have apparently been emitted (on the direct carriage mechanism, they are really still around).
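The in-place merge hazard suspected here can be shown in a few lines. `Doc`, `splice_in_place` and `splice_copying` are hypothetical stand-ins, not the EBUTT3Splicer API: a "document" is reduced to a list of style IDs, and the point is only that merging into an object someone else still holds changes what they see, whereas merging into a deep copy does not.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for a bound tt element: just its style IDs.
    styles: list = field(default_factory=list)


def splice_in_place(base, incoming):
    # Merges straight into `base`; anyone still holding `base` sees the change.
    for sid in incoming.styles:
        if sid not in base.styles:
            base.styles.append(sid)
    return base


def splice_copying(base, incoming):
    # Merges into a deep copy, so `base` (possibly already emitted) is untouched.
    result = copy.deepcopy(base)
    for sid in incoming.styles:
        if sid not in result.styles:
            result.styles.append(sid)
    return result


emitted = Doc(["s1"])  # already handed downstream
splice_in_place(emitted, Doc(["s2", "s3"]))
print(emitted.styles)  # ['s1', 's2', 's3']: the emitted document changed afterwards

emitted = Doc(["s1"])
merged = splice_copying(emitted, Doc(["s2", "s3"]))
print(emitted.styles, merged.styles)  # ['s1'] ['s1', 's2', 's3']
```

Copying costs some memory per document, but it makes each emitted document immutable from the splicer's point of view.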

nigelmegitt commented 6 years ago

OK, I've been looking at this more, and I think I was wrong again. Making the deduplicator print its input and output documents shows that it is working correctly. Chaining it via the direct carriage mechanism to an ebuttd-encoder and printing the input documents there shows that the encoder is receiving the correct documents. Somehow, though, the output documents it writes have the duplicate styles added back in! Maddening... I think I've ruled the direct carriage mechanism out, though.

nigelmegitt commented 6 years ago

Sigh, this is really not a happy situation. It's definitely an effect that happens only when using direct carriage mechanism, but it is not at all obvious what the cause is.

If I create one config script that goes

(ws)->Delay->Resequence->Deduplicate->(fs)

and another that goes

(fs)->EBUTTDEncode->(fs)

then it all works fine, so that's at least a workaround. (You have to tell the second script what the media begin time is, otherwise you can end up with negative times and errors.)

I should test whether the same effect is observed when using a websocket to connect the Deduplicator to the TTD Encoder; I'm guessing it won't be.

nigelmegitt commented 6 years ago

Confirmed: the unwanted behaviour is avoided (worked around) by using a config that uses a websocket for the last hop to the EBU-TT-D Encoder:

(ws)->Delay->Resequence->Deduplicate->(ws)->EBUTTDEncode->(fs)

works fine.
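One plausible reason the websocket (or filesystem) hop avoids the problem is that each such hop serializes the document and the receiver parses a fresh object, whereas direct carriage may hand over the same object. A minimal sketch of that difference, assuming (this is not confirmed in the thread) that direct carriage passes object references; `direct_carriage` and `serialized_carriage` are hypothetical names:

```python
import xml.etree.ElementTree as ET


def direct_carriage(doc):
    # Hypothetical: the downstream node receives the very same object.
    return doc


def serialized_carriage(doc):
    # Hypothetical: like a websocket or file hop, serialize then re-parse,
    # so the downstream node gets an independent copy.
    return ET.fromstring(ET.tostring(doc))


upstream = ET.fromstring('<styling><style id="s1"/></styling>')
via_direct = direct_carriage(upstream)
via_socket = serialized_carriage(upstream)

# The upstream node touches its document after it has been "emitted".
ET.SubElement(upstream, "style", id="s2")

print(len(via_direct), len(via_socket))  # 2 1
```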