byroot / pysrt

Python parser for SubRip (srt) files
GNU General Public License v3.0

Phantom pointers when assigning fields #65

Open meder411 opened 7 years ago

meder411 commented 7 years ago

I am trying to split subtitles where two speakers show up in the same frame. In my case this is indicated by newlines and hyphens ('\n-'). I use this code snippet to split such subtitles into multiple subtitles:

# Split any multi-speaker subtitles (denoted by '\n-') into multiple single-speaker subtitles.
# Assumes `import pysrt` and that `subs` is a SubRipFile, e.g. from pysrt.open(...).
for i in reversed(range(len(subs))):
    if '\n-' in subs[i].text:
        # Split the subtitle at the hyphen and strip the leading hyphen from the first line
        lines = [line[1:] if line.startswith('-') else line for line in subs[i].text.split('\n-')]
        # Total duration in milliseconds; ordinal also covers the minutes and hours fields
        length_milli = subs[i].end.ordinal - subs[i].start.ordinal
        interval_milli = length_milli // len(lines)
        dummy = pysrt.SubRipItem(0, start=subs[i].start, end=subs[i].end, text="") # Use this just to get the right formatting for the time
        dummy.shift(milliseconds=interval_milli) # Shift the dummy so its start time is now the end time we want
        for j in range(len(lines)):
            new_sub = pysrt.SubRipItem(0, start=subs[i].start, end=dummy.start, text=lines[j])
            new_sub.shift(milliseconds=j * interval_milli)
            subs.append(new_sub)
        del subs[i]
subs.clean_indexes()

The basic gist is that I use a dummy object to format the time so that I can take advantage of shifting. For example, a 3-phrase frame lasting 3 seconds is split 3 ways, so each new frame is 1 second long.
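
The even split described here can be checked with plain integer arithmetic, independent of pysrt. This is only a minimal sketch, assuming times are expressed as total milliseconds; `split_interval` is a hypothetical helper and not part of pysrt. Unlike the snippet above, it gives any rounding remainder to the last interval.

```python
def split_interval(start_ms, end_ms, parts):
    """Split [start_ms, end_ms) into `parts` consecutive intervals of equal length.

    Hypothetical helper for illustration only; times are total milliseconds.
    Any remainder from the integer division is given to the last interval.
    """
    if parts <= 0:
        raise ValueError("parts must be positive")
    step = (end_ms - start_ms) // parts
    intervals = []
    for j in range(parts):
        seg_start = start_ms + j * step
        # The last segment ends exactly at end_ms so no time is lost to rounding
        seg_end = end_ms if j == parts - 1 else seg_start + step
        intervals.append((seg_start, seg_end))
    return intervals


# A 3-phrase frame lasting 3 seconds, starting at 10 s
print(split_interval(10000, 13000, 3))
# prints [(10000, 11000), (11000, 12000), (12000, 13000)]
```

Working with plain integers like this avoids shifting any shared time object; the resulting values could then be turned into new time objects for each subtitle.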

When I create the dummy as above using start=sub.start and end=sub.end and then shift the dummy, it also shifts the original subtitle. I suspect this was not the intended behavior.

I found that casting sub.start and sub.end to strings in the assignment (e.g. start=str(sub.start)) solved the issue. It appears that without the cast, however, I am actually assigning a reference or pointer of some kind rather than the value of the string.

byroot commented 7 years ago

Indeed. Your problem is that SubRipItem.start is a SubRipTime instance which is mutable.

That is why casting to a string solves it: it ends up making a copy of the instance.
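
The mechanism can be illustrated without pysrt. The sketch below uses hypothetical stand-in classes (`Time` and `Item` are not pysrt's actual classes) and assumes the constructor keeps an existing time instance as-is but parses a string into a new instance, which is what the behaviour described in this thread suggests.

```python
class Time(object):
    """Hypothetical stand-in for a mutable time value (not pysrt's actual class)."""

    def __init__(self, ms=0):
        self.ms = ms

    @classmethod
    def coerce(cls, value):
        # An existing instance is returned as-is (shared, not copied);
        # anything else, such as a string, is parsed into a fresh instance.
        if isinstance(value, cls):
            return value
        return cls(int(value))

    def shift(self, ms):
        self.ms += ms  # mutates this object in place

    def __str__(self):
        return str(self.ms)


class Item(object):
    """Hypothetical stand-in for a subtitle item holding a start time."""

    def __init__(self, start):
        self.start = Time.coerce(start)


original = Item(Time(1000))

aliased = Item(original.start)      # shares the same Time object
aliased.start.shift(500)
print(original.start.ms)            # prints 1500: the original moved too

copied = Item(str(original.start))  # the string round trip builds a new Time
copied.start.shift(500)
print(original.start.ms)            # prints 1500: the original is unaffected
print(aliased.start is original.start, copied.start is original.start)
# prints True False
```

Under the same assumption, any approach that produces a fresh instance (for example `copy.copy` from the standard library) should break the sharing in the same way the string cast does.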

To be honest I kinda regret making those mutable, but it's quite a breaking change so I'm not sure if I should fix it.