divijbindlish / parse-torrent-name

Extract media information from a filename
MIT License

Question about the regex if you're willing. #12

Open poiboy9000 opened 7 years ago

poiboy9000 commented 7 years ago

Hey!

I realize this repo is pretty old and basically complete at this point, but I had a question about the regex. I'm a student looking through your code, pretty much for some regex practice/understanding. My question is about the episode field in patterns: ('episode', '([ex]([0-9]{2})(?:[^0-9]|$))'). The first part is very straightforward, but I don't get the purpose of the non-capturing group. It seems to match any non-digit character or the end of the line. What purpose does that serve? I've been racking my brain over it. It seems like it must be there to exclude false positives on episode numbers, but I'm failing to come up with an example.

Thank you for your time

ferret-guy commented 7 years ago

It's to ensure that we only match episode numbers that are two digits long, so for example in this string:

Fear.The.Walking.Dead.S03E04.1080p.WEB-DL.6CH.x265.HEVC-PSA

If we didn't have the non-capturing group, we would also match the x26 in the x265 codec information. The {} quantifier only limits how many digits are matched; it does not guarantee that no more digits follow.
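To make that concrete, here is a minimal sketch comparing the two variants on the filename above. It assumes the episode pattern has the shape `[ex]` followed by `[0-9]{2}` and then the trailing `(?:[^0-9]|$)` group, and that matching is case-insensitive; the names `WITH_GUARD` and `WITHOUT_GUARD` are made up for this illustration and are not taken from the library.

```python
import re

# Assumed shape of the episode pattern: [ex], two digits, then either a
# non-digit character or the end of the string (the non-capturing group).
WITH_GUARD = re.compile(r'([ex]([0-9]{2})(?:[^0-9]|$))', re.IGNORECASE)

# Same pattern with the trailing non-capturing group removed (illustration only).
WITHOUT_GUARD = re.compile(r'([ex]([0-9]{2}))', re.IGNORECASE)

name = "Fear.The.Walking.Dead.S03E04.1080p.WEB-DL.6CH.x265.HEVC-PSA"

# Group 2 holds the two captured digits in both variants.
with_guard = [m.group(2) for m in WITH_GUARD.finditer(name)]
without_guard = [m.group(2) for m in WITHOUT_GUARD.finditer(name)]

print(with_guard)     # ['04']
print(without_guard)  # ['04', '26']  <- the '26' comes from 'x265'

# The '$' alternative covers an episode number at the very end of the name,
# where there is no following character for [^0-9] to consume.
at_end = [m.group(2) for m in WITH_GUARD.finditer("Show.S01E02")]
print(at_end)         # ['02']
```

Without the trailing group, the `x26` in `x265` looks exactly like an episode marker; with it, the `5` after `26` causes that candidate to be rejected.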

You can play around with it here.