byroot / pysrt

Python parser for SubRip (srt) files
GNU General Public License v3.0
446 stars 67 forks source link

Add parent, previous and next properties to SubRipItem #52

Closed felagund closed 10 years ago

felagund commented 10 years ago

This works for me so far. However, it is for sure incomplete:

1) No testcases. 2) When SubRipItem is inserted into another SubRipFile or is removed from its original, the parent property does not change.

I also noticed that SubRipFile accepts input as a list, so one can add 1 or "bla" to it even though it does not make much sense. Maybe it would be worthwhile to override insert() append() and extend() (and remove()) methods so that they would update the parent property of added elements and would check if we are operating with SubRipItem instances and raise an error if not? What do you think?

byroot commented 10 years ago

To be honest I voluntarily avoided this feature because:

So unless you have a very good use case for that, I'll close this PR.

felagund commented 10 years ago

Well, my use case is easier code to write. Basically, I search for subtitles of certain properties (like too high cps) and want to know the adjacent subtitles for example. And I do this in over a hundred files, so my workflow is often creating a custom list os subtitles that I am interested in and then working with that list. Then it is impossible to know subtitles's neighbours. I could use list[list.index(item)+1] all the time and put some dictionarries into my custom lists, but this just makes it easier. But I am doing research on the subtitles so I admit that is not a very typical use case.

For a more typical usecase - there should be a minimal gap between subtitles, for ensuring that, it would be handy to know which subtitle is preceeding the current subtitle. But this requires that the information would be always up to date.

byroot commented 10 years ago

I see.

I'd recommend you to simply do it in your loop:


for sub, index in enumerate(sub_file):
  previous_sub = sub_file[index-1]
  next_sub = sub_file[index-1]

With of course bound checking so you don't trigger IndexError.

WOuld that be good enough for you?

felagund commented 10 years ago

Well, I will definitely be using the version that I requsted a pull for as I never alter the subtitles, just examine them, so I do not need to worry about anything.

I understand if you do not want to include it but I think it makes life easier for the users of the library, at least it does for me.

byroot commented 10 years ago

I'll think a bit more about it. But I'd really like to avoid SubRidItem to have a reference over a SubRipFile, and it would probably be a big backward compatibility change.

felagund commented 10 years ago

Ok:-). I consider the parent property to be only a means to an end. While useful, I implemented it mainly because previous and next would not work without it.

coveralls commented 10 years ago

Coverage Status

Coverage increased (+0.29%) when pulling 937919b7e91a1b028914e84eb7f9804035034ab7 on felagund:master into 99b3066d92de65f0ff1047932078b3232b0d49cb on byroot:master.

coveralls commented 10 years ago

Coverage Status

Coverage increased (+0.29%) when pulling 937919b7e91a1b028914e84eb7f9804035034ab7 on felagund:master into 99b3066d92de65f0ff1047932078b3232b0d49cb on byroot:master.

felagund commented 10 years ago
i = self.index

could be used instead of

index = self.parent.index(self) - 1

And would be noticeably faster (minutes versus several seconds when crawling over 100 000 individual subtitles (not subtitle files)).

However, it relies on index always being right (which is related to https://github.com/byroot/pysrt/issues/53 ).

byroot commented 10 years ago

The more I think about it, the more I think for use cases like those you should simply subclass SubRipFile so you can override the constructor to set previous_sub and next_sub properties.

Since IIRC you are doing read only operations, you'll never have to update theses references.

Alternatively I can provide an alternative iteration method that would yield previous and next sub so you could do:

for (previous_sub, current_sub, next_sub) in subtitles.walk():
   ...

What do you think?

felagund commented 10 years ago

Hm, the iteration method does not seem to be so useful. When I iterate over subtitles, I can use index to access the neighbour using enumerate(). The use case is more for working with individual subtitles picked out by some property and then it is useful to know its neighbours and parent. I admit that this is a convenience property, but that is what libraries are for:-). However, I can live without this:-).

byroot commented 10 years ago

Ok. Really sorry, ultimately I would like to provide that functionality, I just don't know how to implement it correctly.

felagund commented 10 years ago

Hm, it does sound impossible. One would just need to subclass append(), remove(), extend(), insert() and pop() methods of SubRipFile making sure its children have correctly set its parent property. Then one would probably provide two properties for each next() and previous() (so there would be 4 altogether) for SubRipItem: one would be slow but safe (probably calling clean_indexes on a deepcopy of its parent) and one fast but not safe (assuming its parent is in correct order).

However, this could work only if one disallowed one SubRipItem instance belonging to two or more SubRipFile instances (so slice() would need to use deepcopy() and append(), extend() and insert() would test if SubRipItem has a parent and if it did would use deepcopy() as well). But I am not sure it is a good idea to allow this in the first place. But I may be well missing something.

byroot commented 10 years ago

But I am not sure it is a good idea to allow this in the first place

Probably not, but I don't think it's worth it to break backward compatibility just for that...

felagund commented 10 years ago

I understand the importance of compatibility, but is there anyone who actually relies on that functionality (here I am at a loss about use case)?