JohnDoee / autotorrent

Matches torrents with files and gets them seeded
MIT License
269 stars, 34 forks

Request to add new mode, or perhaps modify "Normal" mode #41

Open OnAnOpenField opened 4 years ago

OnAnOpenField commented 4 years ago

I was reading the working modes on the main readme and realized that there is a particular use case that would be very useful to support. Some trackers, for some reason, use a slightly altered root folder name and/or filenames, but the content otherwise hash checks as an exact match.

The strings are only a few characters off. For example, tracker A's torrent will download as 'hello.there.2005.1080p.WEB-DL.x264.AAC-BoldOne.mkv' and tracker B's will download as 'hello.there.2005.1080p.WEBDL.x264.AAC-BoldOne.mkv'.

The dash is missing from WEB-DL in tracker B's download. I understand the hash_size working mode could handle this, but since these strings differ by only a small edit distance, the check could be worked into the Normal mode in order to avoid hash checking the files.

Thank you for reading.

JohnDoee commented 4 years ago

The original reason for not using edit distance was that "Episode 2" is 1 edit from "Episode 3" - if it all came down to space characters, then we might just strip them.
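To make the concern concrete, here is a minimal sketch of a plain Levenshtein edit distance. It is not code from autotorrent; the function name `edit_distance` is a hypothetical helper for illustration. It shows that the two release names from the request and the two episode names both sit exactly one edit apart, so a distance threshold alone cannot tell them apart.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Same release, one dash missing.
same_release = edit_distance(
    "hello.there.2005.1080p.WEB-DL.x264.AAC-BoldOne.mkv",
    "hello.there.2005.1080p.WEBDL.x264.AAC-BoldOne.mkv",
)
# Different episodes, one digit changed.
different_episodes = edit_distance("Episode 2", "Episode 3")

print(same_release, different_episodes)  # prints: 1 1
```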

The current algorithm is https://github.com/JohnDoee/autotorrent/blob/develop/autotorrent/db.py#L284-L288 - your suggestion might as well just be stripping all spacers, which turns both names into the same string:

hello.there.2005.1080p.WEB-DL.x264.AAC-BoldOne.mkv -> hellothere20051080pwebdlx264aacboldone.mkv
hello.there.2005.1080p.WEBDL.x264.AAC-BoldOne.mkv -> hellothere20051080pwebdlx264aacboldone.mkv
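The spacer-stripping idea could be sketched roughly as follows. This is not the code at the linked lines of db.py; `normalize_filename` is a hypothetical helper, and which characters count as "spacers" (here whitespace, dots, underscores and dashes) is an assumption. The extension is kept as-is apart from lowercasing, matching the example above.

```python
import os
import re

# Assumed spacer set: whitespace, dot, underscore, dash.
SPACERS = re.compile(r"[\s._\-]+")


def normalize_filename(name: str) -> str:
    """Lowercase a filename and strip spacer characters from its stem."""
    stem, extension = os.path.splitext(name)
    return SPACERS.sub("", stem.lower()) + extension.lower()


tracker_a = normalize_filename("hello.there.2005.1080p.WEB-DL.x264.AAC-BoldOne.mkv")
tracker_b = normalize_filename("hello.there.2005.1080p.WEBDL.x264.AAC-BoldOne.mkv")

print(tracker_a)               # hellothere20051080pwebdlx264aacboldone.mkv
print(tracker_a == tracker_b)  # True

# Names that differ in a real character still stay distinct.
print(normalize_filename("Episode 2.mkv") == normalize_filename("Episode 3.mkv"))  # False
```

Unlike an edit-distance threshold, this comparison treats "Episode 2" and "Episode 3" as different while still matching names that differ only in spacers or letter case.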