bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License

TMDB search result similarity check #335

Open rraymondgh opened 1 month ago

rraymondgh commented 1 month ago

Is your feature request related to a problem? Please describe

I have found the following with TMDB search

Describe the solution you'd like

Cleaning

Similarity

Use github.com/hbollon/go-edlib

Apply similarity targets to min, median and max of these measures.

This reduces the false positives and false negatives produced by the current check, which uses a fixed Levenshtein distance threshold of 5.
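A minimal sketch of the idea, using only the standard library so it runs on its own. It assumes three measures (normalised Levenshtein, word-level Jaccard, bigram Sorensen-Dice) as stand-ins for whichever go-edlib measures are actually chosen. The `Summary` type, the function names and the target values are hypothetical and not taken from bitmagnet.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// levenshtein returns the edit distance between a and b, counted in runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // substitution (free when the runes match)
			if ra[i-1] != rb[j-1] {
				best++
			}
			if prev[j]+1 < best { // deletion
				best = prev[j] + 1
			}
			if cur[j-1]+1 < best { // insertion
				best = cur[j-1] + 1
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(rb)]
}

// levSim normalises the distance to [0, 1] by the longer string's length.
func levSim(a, b string) float64 {
	n := len([]rune(a))
	if m := len([]rune(b)); m > n {
		n = m
	}
	if n == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(n)
}

// setSim scores two sets: Jaccard when dice is false, Sorensen-Dice otherwise.
func setSim(x, y map[string]bool, dice bool) float64 {
	inter := 0
	for k := range x {
		if y[k] {
			inter++
		}
	}
	if len(x)+len(y) == 0 {
		return 1 // simplification: two empty sets are treated as identical
	}
	if dice {
		return 2 * float64(inter) / float64(len(x)+len(y))
	}
	return float64(inter) / float64(len(x)+len(y)-inter)
}

// words returns the set of whitespace-separated tokens in s.
func words(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(s) {
		set[w] = true
	}
	return set
}

// bigrams returns the set of adjacent rune pairs in s.
func bigrams(s string) map[string]bool {
	r, set := []rune(s), map[string]bool{}
	for i := 0; i+1 < len(r); i++ {
		set[string(r[i:i+2])] = true
	}
	return set
}

// Summary holds the min, median and max of the similarity scores.
type Summary struct{ Min, Median, Max float64 }

// Summarize scores an already-cleaned query against a candidate title.
func Summarize(a, b string) Summary {
	s := []float64{
		levSim(a, b),
		setSim(words(a), words(b), false),
		setSim(bigrams(a), bigrams(b), true),
	}
	sort.Float64s(s)
	return Summary{Min: s[0], Median: s[1], Max: s[2]}
}

// Passes reports whether every statistic meets its target.
func (s Summary) Passes(t Summary) bool {
	return s.Min >= t.Min && s.Median >= t.Median && s.Max >= t.Max
}

func main() {
	targets := Summary{Min: 0.3, Median: 0.6, Max: 0.8} // hypothetical values
	for _, title := range []string{"alien", "aliens"} {
		s := Summarize("alien", title)
		fmt.Printf("%-7s dist=%d %+v pass=%v\n", title, levenshtein("alien", title), s, s.Passes(targets))
	}
}
```

In this sketch "alien" vs "aliens" has an edit distance of 1, so a distance threshold of 5 alone would accept it, while the word-level measure pulls the minimum to 0 and the target on the minimum rejects it.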

false negatives distance > 5

image

false positives distance < 5

image

interaction of measures

image

proposed solution

Change bitmagnet so that it can be configured to trust a proxy. The proxy has to have these similarity checks built in, and it returns a single result in the array only if that result passes the similarity checks outlined above.

type Config struct {
    Enabled         bool
    BaseUrl         string
    ApiKey          string
    RateLimit       time.Duration
    RateLimitBurst  int
    SimilarityCheck bool
}

func NewDefaultConfig() Config {
    return Config{
        Enabled:         true,
        BaseUrl:         "https://api.themoviedb.org/3",
        ApiKey:          defaultTmdbApiKey,
        RateLimit:       defaultRateLimit,
        RateLimitBurst:  defaultRateLimitBurst,
        SimilarityCheck: true,
    }
}

levenshteinCheck() is applied only when SimilarityCheck is true; setting it to false leaves the similarity decision to the trusted proxy.
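How the flag might gate the check can be sketched as follows. This is an illustration only: `result`, `pickResult` and the `check` callback are hypothetical names, with `check` standing in for levenshteinCheck(), whose real signature is not shown in this issue. `Config` is reduced to the one field that matters here.

```go
package main

import "fmt"

// Config is reduced to the field under discussion.
type Config struct {
	SimilarityCheck bool
}

// result is a hypothetical stand-in for a TMDB search hit.
type result struct {
	ID    int
	Title string
}

// pickResult chooses a search hit for the query (hypothetical helper).
// check stands in for levenshteinCheck(); it is consulted only when
// cfg.SimilarityCheck is true.
func pickResult(cfg Config, query string, results []result, check func(query, title string) bool) (result, bool) {
	if len(results) == 0 {
		return result{}, false
	}
	if !cfg.SimilarityCheck {
		// Trusted proxy: it has already filtered the results,
		// so the first (and only) entry is accepted as-is.
		return results[0], true
	}
	for _, r := range results {
		if check(query, r.Title) {
			return r, true
		}
	}
	return result{}, false
}

func main() {
	hits := []result{{ID: 1, Title: "Aliens"}}
	exact := func(query, title string) bool { return query == title }

	_, ok := pickResult(Config{SimilarityCheck: true}, "Alien", hits, exact)
	fmt.Println("local check enabled, accepted:", ok)

	_, ok = pickResult(Config{SimilarityCheck: false}, "Alien", hits, exact)
	fmt.Println("trusted proxy, accepted:", ok)
}
```

Passing the check in as a function keeps the sketch independent of how the local similarity check is actually implemented.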