cross-seed / cross-seed

Fully-automatic cross-seeding with Torznab
https://cross-seed.org
Apache License 2.0

Missing matches on announce where announce name is different #606

Closed skifavp closed 2 weeks ago

skifavp commented 6 months ago

Tracker A announces as: TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb. It will match, via search on complete downloads or IRC announce, with trackers that use the same naming method in their announces. However, some sites announce as: Showname S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb, and that won't match. So far I have tried toggling different settings between true/false. The actual filename is of course the same; on one site the .torrent name is the way it is supposed to be, but the IRC announce name is not. Any way to improve it?

zakkarry commented 6 months ago

There's a threshold and distance variable we use for reverse lookups, and depending on the differences this will either match or not. Because releases can be named differently, if we loosen the restrictions to allow more (not all) releases like this to match, other releases that differ by the same amount would also match, causing erroneous snatches of torrents and potential mismatches.
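To illustrate the trade-off, here is a minimal sketch using plain Levenshtein distance. The metric, the function names, and the `OTHER_EPISODE` name are assumptions made for illustration; the thread does not show cross-seed's actual threshold logic.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def within_distance(a: str, b: str, max_distance: int) -> bool:
    """Hypothetical matcher: accept when the edit distance is small enough."""
    return levenshtein(a, b) <= max_distance


# The two announce names from the original report.
ANNOUNCE_A = "TV.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
ANNOUNCE_B = "Showname S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb"
# Hypothetical: the same release name, but for the next episode.
OTHER_EPISODE = "TV.Show.S01E02.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"

# The smallest threshold that would accept the renamed announce.
needed = levenshtein(ANNOUNCE_A, ANNOUNCE_B)
```

Any `max_distance` large enough to accept `ANNOUNCE_B` also accepts `OTHER_EPISODE`, which differs from `ANNOUNCE_A` by a single character. That is the kind of mismatch a looser threshold would let through.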

It's pretty difficult to tune this so that it is effective but not too loose. I have plans to look into tightening (not loosening) the reverse lookup matching. However, it may be possible to integrate some sort of parsing logic to match in situations like the one you describe, where we look up group, season/episode, and title, and match accordingly.
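A rough sketch of what such parsing logic could look like, assuming scene-style names with an SxxEyy marker and a trailing "-GROUP". The regexes, the `ParsedRelease` type, and the "Some Show" names are hypothetical, not cross-seed code.

```python
import re
from typing import NamedTuple, Optional


class ParsedRelease(NamedTuple):
    title: str            # lowercased, separators removed
    season: int
    episode: int
    group: Optional[str]  # lowercased release group, if present


SEASON_EPISODE = re.compile(r"S(\d{1,2})E(\d{1,3})", re.IGNORECASE)
GROUP = re.compile(r"-([A-Za-z0-9]+)$")


def parse_release(name: str) -> Optional[ParsedRelease]:
    """Extract title, season/episode, and group; None if no SxxEyy marker."""
    marker = SEASON_EPISODE.search(name)
    if marker is None:
        return None
    # Everything before the SxxEyy marker is treated as the title.
    title = re.sub(r"[^a-z0-9]", "", name[:marker.start()].lower())
    group = GROUP.search(name)
    return ParsedRelease(
        title,
        int(marker.group(1)),
        int(marker.group(2)),
        group.group(1).lower() if group else None,
    )


def same_release(a: str, b: str) -> bool:
    """Match only when title, season, episode, and group all agree."""
    pa, pb = parse_release(a), parse_release(b)
    return pa is not None and pa == pb


# Hypothetical names modelled on the announce formats in the report.
FULL = "Some.Show.S01E01.Episode.Name.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
SHORT = "Some Show S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb"
NEXT_EPISODE = "Some Show S01E02 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb"
```

Because the comparison uses parsed fields rather than raw string distance, `same_release(FULL, SHORT)` holds even though the episode title is missing from `SHORT`, while `NEXT_EPISODE` is rejected. A real implementation would also need to compare resolution and source, and handle names without an SxxEyy marker.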

It's something I'm personally aware of, but haven't really worked much on.

MaddyTP commented 5 months ago

What if a library similar to 'guessit' were used to parse/sanitize filenames to improve matching?

https://guessit.readthedocs.io/en/latest/

GGBot uses guessit with excellent results when checking for duplicates. Obviously this is a Python package, but the method could probably be reverse engineered.

zakkarry commented 5 months ago

The problem is that trackers essentially obfuscate the real torrent name, for essentially no reason.

This isn't generally an issue with searches, only with reverse lookups from RSS and announce.

ninboy commented 4 months ago

Some unsolicited advice: maybe add an option for fuzzy search that ignores special characters like ".", "+", "&", "-" and "_", so it always compares a "sanitized" name. That would also increase matches against trackers that replace DD+5.1 with DD5.1. Special cases could be the "+", which is sometimes replaced by "P" (DDP5.1), or the "&", which is sometimes replaced by "and".
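A minimal sketch of that idea, assuming the only special cases are "&" becoming "and" and "DDP" standing in for "DD+". The `sanitize` function and the sample names are hypothetical.

```python
import re


def sanitize(name: str) -> str:
    """Reduce a release name to lowercase letters and digits only."""
    s = name.lower()
    s = s.replace("&", "and")               # "&" is sometimes spelled out
    s = re.sub(r"ddp(?=\s*\d)", "dd", s)    # "DDP5.1" is another form of "DD+5.1"
    return re.sub(r"[^a-z0-9]", "", s)      # drop ".", "+", "-", "_", spaces, ...


# Hypothetical names that differ only in separators and the DD+/DDP spelling.
DOTTED = "Some.Show.S01E01.1080p.AMZN.WEB-DL.DDP5.1.H.264-NTb"
SPACED = "Some Show S01E01 1080p AMZN WEB-DL DD+ 5.1 H.264-NTb"
```

With this, `sanitize(DOTTED) == sanitize(SPACED)`. Sanitizing alone does not help when a whole token, such as an episode title, is missing from one of the names.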

zakkarry commented 4 months ago

Our fuzzy matching is loose enough that the separators used are not an issue in almost any case that would occur regularly.

Generally what we see is groups removing episode titles or something else significant with RSS or IRC announcements.

ppkhoa commented 3 months ago

I have a few examples here showing that tracker announces do not contain the filename at all and therefore will not match:

Actual filename from open trackers/release group's RSS feed: [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv

1st tracker announce: Boukyaku Battery - 07 (2024) [SubsPlease] [WEBRip] [HD 1080p]

2nd tracker announce: [SubsPlease] Boukyaku Battery - 07 [Web][MKV][h264][1080p][AAC 2.0][Softsubs (SubsPlease)][Episode 7]

However, cross-seed later matched those via the RSS feed. I assume this is because cross-seed grabbed the torrent via the download link included in the RSS feed and checked the filename/hash. Can we do the same for announces? (The torrent download link is already included in the payload anyway.)

zakkarry commented 3 months ago

Snatching every torrent that is sent via announce is not really something we would want to do.

Snatches are preceded by quite a bit of filtering and verification, because most trackers do not appreciate torrent files being snatched without being downloaded/seeded.

ppkhoa commented 3 months ago

Maybe have an option to run a search using the name from announces?

Relevant log entries to compare between announces and RSS match:

Announces:

2024-05-21 16:11:14 verbose: [server] POST /api/announce
2024-05-21 16:11:14 verbose: [server] Received announce from Tracker2: [SubsPlease] Oblivion Battery - 07  [2024][Web][MKV][h264][1080p][AAC 2.0][Softsubs (SubsPlease)]
2024-05-21 16:11:16 verbose: [decide] [SubsPlease] Boukyaku Battery - 02 (1080p) [E954FB4E].mkv - no match for Tracker2 torrent [SubsPlease] Oblivion Battery - 07  [2024][Web][MKV][h264][1080p][AAC 2.0][Softsubs (SubsPlease)] - its size does not match - (NaN bytes)  <---------- Does not match the correct name, wrong episode

Search results:

2024-05-21 16:19:11 info: [torznab] Searching 10 indexers for [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv
[a bunch of RSS feeds URLs]
2024-05-21 16:20:08 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:08 verbose: [rtorrent] Calling method load.start with params [ '',
  '/home/ppkhoa/watch/[SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv.tmp.1716308408900.torrent',
  'd.directory_base.set="/home/ppkhoa/cross-seed/xseed/Tracker2"',
  'd.custom1.set="cross-seed"',
  'd.custom.set=addtime,1716308409' ]
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - injected
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - exists
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - exists
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker2 by MATCH - exists
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 verbose: [rtorrent] Calling method load.start with params [ '',
  '/home/ppkhoa/watch/[SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv.tmp.1716308409651.torrent',
  'd.directory_base.set="/home/ppkhoa/cross-seed/xseed/Tracker1"',
  'd.custom1.set="cross-seed"',
  'd.custom.set=addtime,1716308410' ]
2024-05-21 16:20:09 verbose: [rtorrent] Calling method download_list with params []
2024-05-21 16:20:09 info: Found [SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv on Tracker1 by MATCH - injected
2024-05-21 16:20:09 info: [server] Found 5 torrents for {
  path: '/home/ppkhoa/files/[SubsPlease] Boukyaku Battery - 07 (1080p) [8DFEE2F1].mkv'
}

EDIT: Tracker2 announces 4 times in a row for the same torrent since anime has Japanese names, romanization of Japanese names, English names, and another with filename in torrent name

zakkarry commented 3 months ago

If you want to search, use the webhook instead of the announce endpoint.

ppkhoa commented 3 months ago

My point is, search could find the release, but announce/RSS never matched them.

What I'm doing is putting a sleep 120 in the curl webhook script (which rtorrent calls when a download finishes), so that the search with name/infoHash from rtorrent runs after other trackers have had some time to make the release available for searching. If they are slower than that, I either have to get the torrent manually or run the search manually (search via webhook with the name). Otherwise, the release will never get found and cross-seeded.
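The delayed search can be sketched roughly as follows. The `/api/webhook` path, the form-encoded `infoHash` field, and the `http://localhost:2468` base URL are assumptions not stated in this thread, and any API key handling is left out.

```python
import time
import urllib.parse
import urllib.request


def build_webhook_request(base_url: str, info_hash: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking cross-seed to search for one torrent."""
    # Assumed endpoint and field name; check them against your cross-seed version.
    body = urllib.parse.urlencode({"infoHash": info_hash}).encode("ascii")
    return urllib.request.Request(f"{base_url}/api/webhook", data=body, method="POST")


def search_after_delay(base_url: str, info_hash: str, delay_seconds: int = 120) -> int:
    """Wait so other trackers can list the release, then trigger the search."""
    time.sleep(delay_seconds)
    with urllib.request.urlopen(build_webhook_request(base_url, info_hash)) as response:
        return response.status
```

A download-finished hook would call something like `search_after_delay("http://localhost:2468", info_hash)`. The fixed delay is the weak point: a tracker that takes longer than `delay_seconds` is still missed.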

zakkarry commented 3 months ago

You can schedule searches to run however often.

As I said, snatching every torrent given to the announce endpoint is not going to happen.

https://www.cross-seed.org/docs/basics/options#searchcadence

zakkarry commented 3 months ago

Furthermore, if your torrents don't match the torrent name or file name, then your tracker is changing them, and this issue should be discussed with the tracker.

ppkhoa commented 3 months ago

I believe I figured this one out, at least with autobrr. I came across this issue in the autobrr repo: https://github.com/autobrr/autobrr/issues/1197. According to the response there, you can set Max size on the filter (I set mine to 50GiB, as most media falls within that range) and autobrr will use the trackers' API to get the file size, even though the IRC announce does not contain file size info. If your cross-seed config has matchMode set to risky, it will check the file size and match accordingly, even though the torrent name is different.

I have been testing it with AB and AnT for anime content (these 2 sites have torrent names almost totally different from public trackers, as they have their own naming schemes), and it has been working pretty well so far.

You will need to adjust the webhook payload from autobrr to cross-seed /api/announce a little to include size information. Just add size from the example in autobrr documentation, i.e.:

{
  "name": "{{ .TorrentName }}",
  "guid": "{{ .TorrentUrl }}",
  "link": "{{ .TorrentUrl }}",
  "size": "{{.Size}}",
  "tracker": "{{ .IndexerName | js}}"
}
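For anyone sending announces from a script instead of autobrr, a minimal sketch of building the same payload. The field names come from the template above and `/api/announce` from the logs earlier in this thread; the function name, the base URL, and sending the size as a string (mirroring the template) are assumptions.

```python
import json
import urllib.request


def build_announce_request(base_url: str, name: str, torrent_url: str,
                           size_bytes: int, tracker: str) -> urllib.request.Request:
    """Build (but do not send) an announce that includes the torrent size."""
    payload = {
        "name": name,
        "guid": torrent_url,
        "link": torrent_url,
        "size": str(size_bytes),  # the template above also sends size as a string
        "tracker": tracker,
    }
    return urllib.request.Request(
        f"{base_url}/api/announce",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Without the `size` field, the announce log earlier in the thread shows the size comparison running against NaN bytes.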

ShanaryS commented 2 weeks ago

Considering this solved with #725

The parsing is much more robust and won't match incorrect episodes or seasons.