LizardByte / Themerr-plex

Plugin for Plex Media Server that adds theme songs to movies using ThemerrDB.
https://app.lizardbyte.dev/ThemerrDB
GNU Affero General Public License v3.0

Chinese collections cannot fetch theme songs. #389

Closed x1ao4 closed 7 months ago

x1ao4 commented 8 months ago

Describe the Bug

When using the original version of tmdb_helper.py, collections with Chinese titles cannot retrieve the corresponding TMDB data. Here's an example from the log:

2024-03-12 02:09:49,141 (70000a6bf000) :  DEBUG (plex_api_helper:454) - Getting database info for item: 黑夜传说(系列)
2024-03-12 02:09:49,145 (70000a6bf000) :  DEBUG (networking:139) - Fetching 'http://127.0.0.1:32400/services/tmdb?uri=/search/collection?query=%5Cu9ed1%5Cu591c%5Cu4f20%5Cu8bf4%5Cuff08%5Cu7cfb%5Cu5217%5Cuff09' from the HTTP cache
2024-03-12 02:09:49,150 (70000a6bf000) :  DEBUG (tmdb_helper:117) - TMDB data: {'total_results': 0, 'total_pages': 1, 'page': 1, 'results': []}
2024-03-12 02:09:49,150 (70000a6bf000) :  DEBUG (plex_api_helper:562) - Database info for item: 黑夜传说(系列), database_info: ('movie_collections', 'themoviedb', 'tv.plex.agents.movie', None)

After testing, I found that the issue lies with URL encoding. Collections are matched on TMDB by searching for their titles, and the original script encodes the title with the String.Quote function. Judging by the log, this percent-encodes the Unicode escape sequences of the title (\uXXXX, with the backslash becoming %5C) rather than its UTF-8 bytes. For example, "黑夜传说(系列)" would be encoded as:

%5Cu9ed1%5Cu591c%5Cu4f20%5Cu8bf4%5Cuff08%5Cu7cfb%5Cu5217%5Cuff09

I made some modifications to the script. Now, when the title contains Chinese characters, it is first encoded as UTF-8 and then percent-encoded with the urllib.quote function. Titles in other languages still go through the String.Quote function. For example, "黑夜传说(系列)" would now be encoded as:

%E9%BB%91%E5%A4%9C%E4%BC%A0%E8%AF%B4%EF%BC%88%E7%B3%BB%E5%88%97%EF%BC%89
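
The difference between the two encodings can be reproduced outside Plex. This is a minimal sketch in Python 3 (the plugin itself runs on Python 2, where the function is `urllib.quote`); it only shows how the two strings above can arise and assumes, without having checked the framework source, that `String.Quote` behaves like the second call.

```python
from urllib.parse import quote

# The title "黑夜传说(系列)", written with escapes so the fullwidth
# parentheses (U+FF08, U+FF09) survive copy and paste.
title = "\u9ed1\u591c\u4f20\u8bf4\uff08\u7cfb\u5217\uff09"

# Percent-encode the UTF-8 bytes of the title.
utf8_quoted = quote(title.encode("utf-8"))

# Percent-encode the backslash-escaped form instead; each "\" becomes %5C.
escaped_quoted = quote(title.encode("unicode_escape"))

print(utf8_quoted)
print(escaped_quoted)
```

The first line printed matches the query in the working request, the second matches the query that returned no results.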

The modified tmdb_helper.py is as follows:

# -*- coding: utf-8 -*-

# plex debugging
try:
    import plexhints  # noqa: F401
except ImportError:
    pass
else:  # the code is running outside of Plex
    from plexhints.constant_kit import CACHE_1DAY  # constant kit
    from plexhints.log_kit import Log  # log kit
    from plexhints.parse_kit import JSON  # parse kit
    from plexhints.util_kit import String  # util kit

# imports from Libraries\Shared
from typing import Optional, Union
import urllib
import urlparse

# url borrowed from TheMovieDB.bundle
tmdb_base_url = 'http://127.0.0.1:32400/services/tmdb?uri='

def get_tmdb_id_from_external_id(external_id, database, item_type):
    # type: (Union[int, str], str, str) -> Optional[int]
    """
    Convert IMDB ID to TMDB ID.

    Use the builtin Plex tmdb api service to search for a movie by IMDB ID.

    Parameters
    ----------
    external_id : Union[int, str]
        External ID to convert.
    database : str
        Database to search. Must be one of 'imdb' or 'tvdb'.
    item_type : str
        Item type to search. Must be one of 'movie' or 'tv'.

    Returns
    -------
    Optional[int]
        Return TMDB ID if found, otherwise None.

    Examples
    --------
    >>> get_tmdb_id_from_external_id(external_id='tt1254207', database='imdb', item_type='movie')
    10378
    >>> get_tmdb_id_from_external_id(external_id='268592', database='tvdb', item_type='tv')
    48866
    """
    if database.lower() not in ['imdb', 'tvdb']:
        Log.Exception('Invalid database: {}'.format(database))
        return
    if item_type.lower() not in ['movie', 'tv']:
        Log.Exception('Invalid item type: {}'.format(item_type))
        return

    # according to https://www.themoviedb.org/talk/5f6a0500688cd000351c1712 we can search by external id
    # https://api.themoviedb.org/3/find/tt0458290?api_key=###&external_source=imdb_id
    find_url_suffix = 'find/{}?external_source={}_id'

    url = '{}/{}'.format(
        tmdb_base_url,
        find_url_suffix.format(String.Quote(s=str(external_id), usePlus=True), database.lower())
    )
    try:
        tmdb_data = JSON.ObjectFromURL(
            url=url, sleep=2.0, headers=dict(Accept='application/json'), cacheTime=CACHE_1DAY, errors='strict')
    except Exception as e:
        Log.Debug('Error converting external ID to TMDB ID: {}'.format(e))
    else:
        Log.Debug('TMDB data: {}'.format(tmdb_data))
        try:
            # this is already an integer, but let's force it
            tmdb_id = int(tmdb_data['{}_results'.format(item_type.lower())][0]['id'])
        except (IndexError, KeyError, ValueError):
            Log.Debug('Error converting external ID to TMDB ID: {}'.format(tmdb_data))
        else:
            return tmdb_id

def get_tmdb_id_from_collection(search_query):
    # type: (str) -> Optional[int]
    """
    Search for a collection by name.

    Use the builtin Plex tmdb api service to search for a tmdb collection by name.

    Parameters
    ----------
    search_query : str
        Name of collection to search for.

    Returns
    -------
    Optional[int]
        Return collection ID if found, otherwise None.

    Examples
    --------
    >>> get_tmdb_id_from_collection(search_query='James Bond Collection')
    645
    >>> get_tmdb_id_from_collection(search_query='James Bond')
    645
    """
    # /search/collection?query=James%20Bond%20Collection&include_adult=false&language=en-US&page=1"
    query_url = 'search/collection?query={}'

    # Plex returns 500 error if spaces are in collection query, same with `_`, `+`, and `%20`... so use `-`
    if any(u'\u4e00' <= ch <= u'\u9fff' for ch in search_query):
        search_query = urllib.quote(search_query.encode('utf8'))
    else:
        search_query = String.Quote(s=search_query.replace(' ', '-'), usePlus=True)

    url = '{}/{}'.format(tmdb_base_url, query_url.format(search_query))
    try:
        tmdb_data = JSON.ObjectFromURL(
            url=url, sleep=2.0, headers=dict(Accept='application/json'), cacheTime=CACHE_1DAY, errors='strict')
    except Exception as e:
        Log.Debug('Error searching for collection {}: {}'.format(search_query, e))
    else:
        collection_id = None
        Log.Debug('TMDB data: {}'.format(tmdb_data))

        end_string = 'Collection'  # collection names on themoviedb end with 'Collection'
        try:
            for result in tmdb_data['results']:
                if result['name'].lower() == search_query.lower() or \
                        '{} {}'.format(search_query.lower(), end_string).lower() == result['name'].lower():
                    collection_id = int(result['id'])
        except (IndexError, KeyError, ValueError):
            Log.Debug('Error searching for collection {}: {}'.format(search_query, tmdb_data))
        else:
            return collection_id

After the modification, TMDB data for collections with Chinese titles can now be retrieved. For example:

2024-03-16 04:28:12,457 (7000034c9000) :  DEBUG (plex_api_helper:454) - Getting database info for item: 黑夜传说(系列)
2024-03-16 04:28:12,465 (7000034c9000) :  DEBUG (networking:139) - Fetching 'http://127.0.0.1:32400/services/tmdb?uri=/search/collection?query=%E9%BB%91%E5%A4%9C%E4%BC%A0%E8%AF%B4%EF%BC%88%E7%B3%BB%E5%88%97%EF%BC%89' from the HTTP cache
2024-03-16 04:28:12,474 (7000034c9000) :  DEBUG (tmdb_helper:123) - TMDB data: {'total_results': 1, 'total_pages': 1, 'page': 1, 'results': [{'poster_path': '/aK8qq0X0pZbf5ncE3JLQ27hdC4F.jpg', 'name': 'Underworld Collection', 'overview': u'A centuries-old war is waged between vampires and lycans (short for lycanthrope, from the Greek \'luk\' [wolf] + \'\xe1nthr\u014dpos\' [human], or werewolf). It begins with a fifth-century man, the sole survivor of a plague that wipes out his village. Somehow, his body turns the disease to his benefit, and afterward, he ceases to age. After living secretly, he moves often to prevent his immunity to the ravages of time from being discovered, even marrying on occasion. After he fathers twin sons who also inherit his gift, they learn its ultimate price after they become the first vampire and lycan after one suffers a bite from a bat and the other from a wolf. Once they learn they can "turn" normal humans into others like them by inflicting their bites on them, they terrorize the nearby countryside, taking from them either victims or tribute. The vampires ultimately enslave the Lycans until love leads to the Lycans\' escape, igniting a bitter war that seems destined to have no end.', 'original_name': 'Underworld Collection', 'backdrop_path': '/2gSaXagD9ZCuBHOsXF4tvtW7Djd.jpg', 'adult': False, 'id': 2326, 'original_language': 'en'}]}
2024-03-16 04:28:12,474 (7000034c9000) :  DEBUG (plex_api_helper:562) - Database info for item: 黑夜传说(系列), database_info: ('movie_collections', 'themoviedb', 'tv.plex.agents.movie', None)
2024-03-16 04:28:12,474 (7000034c9000) :  DEBUG (plex_api_helper:114) - item title: 黑夜传说(系列)

However, collections with Chinese titles still cannot fetch theme songs, while collections with English titles can. For example, I have one collection titled "James Bond Collection" and another titled "詹姆斯·邦德(系列)" in two separate libraries. The collection with the English title successfully fetched the theme song, but the one with the Chinese title did not.

2024-03-16 04:28:57,203 (700004cdb000) :  DEBUG (plex_api_helper:454) - Getting database info for item: James Bond Collection
2024-03-16 04:28:57,208 (700004cdb000) :  DEBUG (networking:139) - Fetching 'http://127.0.0.1:32400/services/tmdb?uri=/search/collection?query=James-Bond-Collection' from the HTTP cache
2024-03-16 04:28:57,213 (700004cdb000) :  DEBUG (tmdb_helper:123) - TMDB data: {'total_results': 1, 'total_pages': 1, 'page': 1, 'results': [{'poster_path': '/ofwSiqOFShhunAIYYdSMHMJQSx2.jpg', 'name': 'James Bond Collection', 'overview': 'The James Bond film series is a British series of spy films based on the fictional character of MI6 agent James Bond, codename "007". With all of the action, adventure, gadgetry & film scores that Bond is famous for.', 'original_name': 'James Bond Collection', 'backdrop_path': '/dOSECZImeyZldoq0ObieBE0lwie.jpg', 'adult': False, 'id': 645, 'original_language': 'en'}]}
2024-03-16 04:28:57,214 (700004cdb000) :  DEBUG (plex_api_helper:562) - Database info for item: James Bond Collection, database_info: ('movie_collections', 'themoviedb', 'tv.plex.agents.movie', None)

2024-03-16 04:28:31,560 (7000034c9000) :  DEBUG (plex_api_helper:454) - Getting database info for item: 詹姆斯·邦德(系列)
2024-03-16 04:28:31,571 (7000034c9000) :  DEBUG (networking:139) - Fetching 'http://127.0.0.1:32400/services/tmdb?uri=/search/collection?query=%E8%A9%B9%E5%A7%86%E6%96%AF%C2%B7%E9%82%A6%E5%BE%B7%EF%BC%88%E7%B3%BB%E5%88%97%EF%BC%89' from the HTTP cache
2024-03-16 04:28:31,575 (7000034c9000) :  DEBUG (tmdb_helper:123) - TMDB data: {'total_results': 1, 'total_pages': 1, 'page': 1, 'results': [{'poster_path': '/ofwSiqOFShhunAIYYdSMHMJQSx2.jpg', 'name': 'James Bond Collection', 'overview': 'The James Bond film series is a British series of spy films based on the fictional character of MI6 agent James Bond, codename "007". With all of the action, adventure, gadgetry & film scores that Bond is famous for.', 'original_name': 'James Bond Collection', 'backdrop_path': '/dOSECZImeyZldoq0ObieBE0lwie.jpg', 'adult': False, 'id': 645, 'original_language': 'en'}]}
2024-03-16 04:28:31,576 (7000034c9000) :  DEBUG (plex_api_helper:562) - Database info for item: 詹姆斯·邦德(系列), database_info: ('movie_collections', 'themoviedb', 'tv.plex.agents.movie', None)

I'm not sure if there are notifications in the logs for fetching theme songs for collections because I haven't seen any "data found for collection" notifications while monitoring the logs. It might be challenging to filter logs based on this. However, in the WebUI, I noticed that only collections in the English library fetched theme songs, while collections in the Chinese library did not. I hope we can find the reason for this discrepancy.


Furthermore, even though we've retrieved data from TMDB, the collection IDs in the logs still show as None. Is this normal? Also, in the WebUI, it displays as "No known ID," despite some collections having successfully matched data from TMDB.

I'm puzzled as well. Collections with Chinese titles and collections with English titles are both matched on TMDB, so it's unclear why only the ones with English titles fetch theme songs. Something specific in the retrieval process may be causing the discrepancy, and it is worth investigating to find the root cause.

How does Themerr search and match collections in ThemerrDB? Is the issue possibly occurring here?

Expected Behavior

No response

Additional Context

No response

ReenigneArcher commented 8 months ago

Regarding the modification to the quote, I think we'd need something more robust than that. It's unlikely that Chinese characters are the only ones with issues.

This is the function we're currently using. https://github.com/squaresmile/Plex-Plug-Ins/blob/fc4ab34d4cb995668abd84b304b57c5bf13cb69d/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/utilkit.py#L229

If we're going to move to a non-framework method, we should probably use urllib3 instead of urllib.

Regarding why it doesn't work after you adjust the quoting, the reason is the results are returning in English, but your query is in Chinese.

            for result in tmdb_data['results']:
                if result['name'].lower() == search_query.lower() or \
                        '{} {}'.format(search_query.lower(), end_string).lower() == result['name'].lower():
                    collection_id = int(result['id'])

Your result['name'].lower() is james bond collection, but your search_query is 詹姆斯·邦德 (not sure what the lowercase version of that is, or if you even have lowercase letters in Chinese). So the comparison never succeeds, and ultimately the collection_id is not being set.
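
To make the mismatch concrete, here is a simplified Python 3 sketch of the comparison quoted above. The helper name `find_exact_match` is made up for illustration, and unlike the plugin code it returns on the first match.

```python
def find_exact_match(results, search_query, end_string="Collection"):
    """Return the id of the first result whose name equals the query,
    with or without the trailing end_string; otherwise return None."""
    wanted = {
        search_query.lower(),
        "{} {}".format(search_query, end_string).lower(),
    }
    for result in results:
        if result["name"].lower() in wanted:
            return int(result["id"])
    return None


# Trimmed-down version of the TMDB response shown in the logs.
results = [{"id": 645, "name": "James Bond Collection"}]

english_id = find_exact_match(results, "James Bond")
# "詹姆斯·邦德(系列)" written with escapes (fullwidth parentheses).
chinese_id = find_exact_match(
    results, "\u8a79\u59c6\u65af\u00b7\u90a6\u5fb7\uff08\u7cfb\u5217\uff09"
)

print(english_id, chinese_id)
```

The English query finds id 645, while the Chinese query returns None because TMDB answered with the English name.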

@zdimension didn't you confirm that collections are working for you in French? Is Plex returning the TMDB results in French for you in this case?

x1ao4 commented 8 months ago

we should probably use urllib3 instead of urllib

Does Plex's Python version support urllib3?

I tried adding a language parameter to the query_url, such as &language=zh-CN, but the TMDB data returned is still in English. Why is that? If we can set the search language, we should be able to retrieve the language setting from Plex's Library and then add it to the query_url. However, even after adding the language parameter, the data returned is still in English, which confuses me.

ReenigneArcher commented 8 months ago

Does Plex's Python version support urllib3?

We can add it as a dependency. It's already a sub dependency of requests, so it's technically already included in the plugin.

I tried adding a language parameter to the query_url

This would be nice. I don't know if the service Plex provides for the TMDB lookup supports a language query. As far as I know, it's completely undocumented.

I have a note in the code:

# /search/collection?query=James%20Bond%20Collection&include_adult=false&language=en-US&page=1"

This is from the official TMDB api. I don't know if using the plex service, passes through every parameter or if it's sanitized somehow. http://127.0.0.1:32400/services/tmdb?uri= is the base URL.

TMDB language reference: https://developer.themoviedb.org/docs/languages

x1ao4 commented 8 months ago

Plex staff mentioned that it's not possible to add a language option for query_url = 'search/collection?query={}', which means it's not possible to retrieve collection information in languages other than English through Plex built-in API. If that's the case, it seems that non-English users will have to use their own TMDB API to obtain collection information in their preferred language.

Otherwise, the only option is to take the ID of the first result returned as the collection ID. This sidesteps the language problem because no exact name match is required, but it may produce wrong matches in some cases.
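
The fallback idea could look roughly like this. It is a hypothetical sketch in Python 3, not code from the plugin; `pick_collection_id` and the returned labels are invented names.

```python
def pick_collection_id(results, search_query):
    """Prefer an exact, case-insensitive name match; otherwise fall back
    to the first result. Returns (collection_id, how_it_was_chosen)."""
    for result in results:
        if result["name"].lower() == search_query.lower():
            return int(result["id"]), "exact"
    if results:
        # Language-independent, but may pick the wrong collection.
        return int(results[0]["id"]), "first-result"
    return None, "none"


results = [{"id": 2326, "name": "Underworld Collection"}]
chosen = pick_collection_id(results, "\u9ed1\u591c\u4f20\u8bf4\uff08\u7cfb\u5217\uff09")
print(chosen)
```

Returning how the ID was chosen makes it possible to log or ignore the less reliable fallback matches.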

ReenigneArcher commented 8 months ago

Unless I'm missing something, I think the service just passes through the entire query.

https://github.com/squaresmile/Plex-Plug-Ins/blob/fc4ab34d4cb995668abd84b304b57c5bf13cb69d/PlexMovie.bundle/Contents/Code/__init__.py#L68

It would seem odd for Plex to accept the language parameter in this code but strip it out for collections.

And with that, I got it to work.

You have to URL encode the & in the TMDB query, otherwise Plex receives that as part of its query.

http://127.0.0.1:32400/services/tmdb?uri=/search/collection?query=%E8%A9%B9%E5%A7%86%E6%96%AF%C2%B7%E9%82%A6%E5%BE%B7%EF%BC%88%E7%B3%BB%E5%88%97%EF%BC%89%26language=zh

or locally in a browser

http://127.0.0.1:32400/services/tmdb?X-Plex-Token=<your_token>&uri=/search/collection?query=%E8%A9%B9%E5%A7%86%E6%96%AF%C2%B7%E9%82%A6%E5%BE%B7%EF%BC%88%E7%B3%BB%E5%88%97%EF%BC%89%26language=zh

should both work. I tested in my browser and get the following:

{
    "page": 1,
    "results": [
        {
            "adult": false,
            "backdrop_path": "/dOSECZImeyZldoq0ObieBE0lwie.jpg",
            "id": 645,
            "name": "詹姆斯·邦德(系列)",
            "original_language": "en",
            "original_name": "James Bond Collection",
            "overview": "007是风靡全球的一系列谍战电影,007不仅是影片的名称,更是主人公特工詹姆斯·邦德的代号。詹姆斯·邦德(英语:James Bond)是一套小说和系列电影的主角名称。小说原作者是英国作家伊恩·佛莱明。在故事里,邦德是英国情报机构军情六处的间谍,代号007,被授予可以除去任何妨碍行动的人的权力,此外,詹姆斯·邦德总是有美女相伴,那些女士被称为\"邦女郎\"。 他冷酷但多情,机智且勇敢,总能在最危难时化险为夷,也总能邂逅一段浪漫的爱情。历任007都是大帅哥,再加上性感漂亮的邦女郎,以及扣人心弦的精彩剧情,让这部影片直至今天仍被广大影迷所热爱。 第一部007电影于1962年10月5日公映后,007电影系列风靡全球,到今天历经五十余年长盛不衰。",
            "poster_path": "/oKDxj9E15x3DjSjl4TnSWVUaVSw.jpg"
        }
    ],
    "total_pages": 1,
    "total_results": 1
}
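
The URL construction described above can be sketched as follows. This is Python 3 for illustration only; `build_collection_search_url` is a made-up helper, and the base URL is the one quoted earlier in the thread.

```python
from urllib.parse import quote

TMDB_BASE_URL = "http://127.0.0.1:32400/services/tmdb?uri="


def build_collection_search_url(search_query, language):
    """Build the Plex TMDB service URL for a collection search."""
    inner = "/search/collection?query={}".format(
        quote(search_query.encode("utf-8"))
    )
    # %26 is an encoded "&", so the language parameter stays inside the
    # uri= value instead of being read by Plex as its own parameter.
    return "{}{}%26language={}".format(TMDB_BASE_URL, inner, quote(language))


# "詹姆斯·邦德(系列)" written with escapes (fullwidth parentheses).
url = build_collection_search_url(
    "\u8a79\u59c6\u65af\u00b7\u90a6\u5fb7\uff08\u7cfb\u5217\uff09", "zh"
)
print(url)
```

The printed URL is the same as the first example URL above.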

Finally, I'd probably try to avoid talking about this plugin on the Plex forum. For sure we're doing stuff that they won't appreciate, by hacking their TMDB service among other things. They aren't too friendly to third party plugins these days.

x1ao4 commented 8 months ago

You have to URL encode the & in the TMDB query, otherwise Plex receives that as part of its query.

I changed the query_url in tmdb_helper.py to search/collection?query={}%26language=zh, and it indeed returned Chinese data. However, the Chinese characters appeared in Unicode encoding format. I suspect this format might still prevent the retrieval of IDs, and it may be necessary to convert the retrieved data to UTF-8 encoding to match non-English text.

2024-03-17 23:26:28,902 (700004eca000) :  DEBUG (tmdb_helper:123) - TMDB data: {'total_results': 1, 'total_pages': 1, 'page': 1, 'results': [{'poster_path': '/d83LVydlQonKdshwQyLYx48D3LH.jpg', 'name': u'\u7231\u5ba0\u5927\u673a\u5bc6\uff08\u7cfb\u5217\uff09', 'overview': u'\u8bb2\u8ff0\u4e86\u5728\u7ebd\u7ea6\u4e00\u5e62\u70ed\u95f9\u7684\u516c\u5bd3\u5927\u697c\u91cc\uff0c\u6709\u4e00\u7fa4\u5ba0\u7269\uff0c\u6bcf\u5929\u4e3b\u4eba\u51fa\u95e8\u540e\u3001\u56de\u5bb6\u524d\u8fd9\u91cc\u5c31\u53d8\u6210\u4e86\u5b83\u4eec\u7684\u4e50\u56ed\uff1a\u6709\u7684\u548c\u5176\u4ed6\u5ba0\u7269\u4e00\u8d77\u51fa\u53bb\u73a9\uff1b\u6709\u7684\u805a\u5728\u4e00\u8d77\u4ea4\u6d41\u4e3b\u4eba\u7684\u7cd7\u4e8b\uff1b\u8fd8\u6709\u7684\u5728\u4e0d\u505c\u636f\u996c\u81ea\u5df1\u7684\u5916\u8c8c\uff0c\u4f7f\u81ea\u5df1\u770b\u4e0a\u53bb\u66f4\u53ef\u7231\u4ee5\u4fbf\u4ece\u4e3b\u4eba\u90a3\u91cc\u8981\u6765\u66f4\u591a\u7684\u96f6\u98df\u2026\u2026\u603b\u4e4b\uff0c\u5ba0\u7269\u4eec\u6bcf\u5929\u7684\u201c\u671d\u4e5d\u665a\u4e94\u201d\u662f\u4ed6\u4eec\u4e00\u5929\u4e2d\u6700\u81ea\u7531\u3001\u6700\u60ec\u610f\u7684\u65f6\u5149\u3002  \u3000\u3000\u5728\u8fd9\u7fa4\u5ba0\u7269\u4e2d\uff0c\u6709\u4e00\u53ea\u5c0f\u730e\u72ac\u662f\u5f53\u4ec1\u4e0d\u8ba9\u7684\u9886\u8896\uff0c\u4ed6\u53eb\u9ea6\u514b\u65af\uff08Max\uff09\uff0c\u673a\u667a\u53ef\u7231\uff0c\u81ea\u8ba4\u4e3a\u662f\u5973\u4e3b\u4eba\u751f\u6d3b\u7684\u4e2d\u5fc3\u2014\u2014\u76f4\u5230\u5979\u4ece\u5916\u5e26\u56de\u5bb6\u4e00\u53ea\u61d2\u6563\u3001\u6ca1\u6709\u5bb6\u6559\u7684\u6742\u79cd\u72d7\u201c\u516c\u7235\u201d\uff08Duke\uff09\u3002  
\u3000\u3000\u9ea6\u514b\u65af\u548c\u516c\u7235\u4eba\u751f\u89c2\u4ef7\u503c\u89c2\u90fd\u4e0d\u4e00\u6837\uff0c\u81ea\u7136\u5f88\u96be\u548c\u5e73\u5171\u5904\u3002\u4f46\u5f53\u5b83\u4eec\u4e00\u8d77\u6d41\u843d\u7ebd\u7ea6\u8857\u5934\u540e\uff0c\u4e24\u4eba\u53c8\u5fc5\u987b\u629b\u5f03\u5206\u6b67\u3001\u5171\u540c\u963b\u6b62\u4e00\u53ea\u88ab\u4e3b\u4eba\u629b\u5f03\u7684\u5ba0\u7269\u5154\u201c\u96ea\u7403\u201d\uff08Snowball\uff09\u2014\u2014\u540e\u8005\u4e3a\u4e86\u62a5\u590d\u4eba\u7c7b\uff0c\u51c6\u5907\u7ec4\u7ec7\u4e00\u652f\u906d\u5f03\u5ba0\u7269\u5927\u519b\u5728\u665a\u996d\u524d\u5411\u4eba\u7c7b\u53d1\u8d77\u603b\u653b\u2026\u2026', 'original_name': 'The Secret Life of Pets Collection', 'backdrop_path': '/fAibj0DIT8gk5jQtsEor66QKCsR.jpg', 'adult': False, 'id': 427084, 'original_language': 'en'}]}
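
A note on the `u'\u7231...'` form in the log: Python 2 prints the repr of a unicode string that way when a dict is formatted, so it is a display form and not a different encoding of the data. The Python 3 sketch below illustrates this with the name from the log. Whether the plugin's comparison then succeeds also depends on the type of `search_query`, which this sketch does not cover.

```python
# Collection name copied from the log line above (Python 2 prints it as u'...').
name_from_log = "\u7231\u5ba0\u5927\u673a\u5bc6\uff08\u7cfb\u5217\uff09"

# ascii() reproduces the escaped view seen in the log.
escaped_view = ascii(name_from_log)

# The value itself is ordinary text: nine characters, and it survives
# a UTF-8 encode/decode round trip unchanged.
char_count = len(name_from_log)
round_trip_ok = name_from_log.encode("utf-8").decode("utf-8") == name_from_log

print(escaped_view)
print(char_count, round_trip_ok)
```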

I'd probably try to avoid talking about this plugin on the Plex forum

Understood, I got it.

ReenigneArcher commented 8 months ago

Could you try this build? https://github.com/LizardByte/Themerr-plex/actions/runs/8316419576?pr=395

So far, I didn't do anything special to handle the unicode... but I suspect the framework may handle that automatically.

x1ao4 commented 8 months ago

I tested it, and the returned data is still in Unicode encoding. It seems like I didn't retrieve the theme song for the collection. If it were successful, what message should appear in the log?

2024-03-18 00:19:10,080 (700010ffe000) :  DEBUG (tmdb_helper:117) - TMDB data: {'total_results': 1, 'total_pages': 1, 'page': 1, 'results': [{'poster_path': '/r6ujhctKtNVfxdj8DNs0gDdMkjN.jpg', 'name': u'\u8d85\u51e1\u8718\u86db\u4fa0\uff08\u7cfb\u5217\uff09', 'overview': u'\u300a\u8d85\u51e1\u8718\u86db\u4fa0\u300b\uff08\u7cfb\u5217\uff09\u6539\u7f16\u81ea\u6f2b\u5a01\u8d85\u7ea7\u82f1\u96c4\u6f2b\u753b\uff0c\u7531\u9a6c\u514b\xb7\u97e6\u5e03\u6267\u5bfc\uff0c\u5b89\u5fb7\u9c81\xb7\u52a0\u83f2\u5c14\u5fb7\uff0c\u827e\u739b\xb7\u65af\u901a\uff0c\u745e\u65af\xb7\u4f0a\u51e1\u65af\uff0c\u9a6c\u4e01\xb7\u8f9b\uff0c\u838e\u8389\xb7\u83f2\u5c14\u5fb7\u7b49\u4e3b\u6f14\u3002\u300a\u8d85\u51e1\u8718\u86db\u4fa0\u300b\uff08\u7cfb\u5217\uff09\u4e0d\u540c\u4e8e\u6b64\u524d\u5c71\u59c6\xb7\u96f7\u7c73\u6267\u5bfc\u7684\u300a\u8718\u86db\u4fa0\u300b\u4e09\u90e8\u66f2\uff0c\u6b64\u90e8\u5c06\u89c6\u89d2\u62c9\u56de\u5230\u5f7c\u5f97\xb7\u5e15\u514b\u7684\u9ad8\u4e2d\u65f6\u4ee3\uff0c\u5e74\u8f7b\u7684\u4ed6\u4e00\u65b9\u9762\u8981\u540c\u81ea\u5df1\u7684\u521d\u604b\u683c\u6e29\u5171\u540c\u7ecf\u5386\u7231\u60c5\u627f\u8bfa\u7684\u8003\u9a8c\uff0c\u53e6\u4e00\u65b9\u9762\u8fd8\u8981\u63ed\u5f00\u53cc\u4eb2\u795e\u79d8\u5931\u8e2a\u7684\u771f\u76f8\uff0c\u5728\u4eba\u751f\u6700\u5927\u7684\u6311\u6218\u4e2d\u5b8c\u6210\u4ece\u5e38\u4eba\u5230\u82f1\u96c4\u7684\u547d\u8fd0\u8f6c\u53d8\u3002', 'original_name': 'The Amazing Spider-Man Collection', 'backdrop_path': '/yFGBYtzbvSKKI5qSvyUBWeq1uiJ.jpg', 'adult': False, 'id': 125574, 'original_language': 'en'}]}

ReenigneArcher commented 8 months ago

Okay, I think we just need to modify this part of the code now.

                if result['name'].lower() == search_query.lower() or \
                        '{} {}'.format(search_query.lower(), end_string).lower() == result['name'].lower():
                    collection_id = int(result['id'])

I don't know how to get them to match though.

x1ao4 commented 7 months ago

I added some code to print the comparison content between the search query and the returned collection name on the console. The code is as follows:

            for result in tmdb_data['results']:
                comparison1 = result['name'].lower()
                comparison2 = '{} {}'.format(search_query.lower(), end_string).lower()
                Log.Debug('Comparing: {} and {}'.format(comparison1, comparison2))  # added log output
                if comparison1 == comparison2:
                    collection_id = int(result['id'])

I found that the second value in the comparison still carries the &language=zh-cn suffix followed by the appended "collection" end string, which causes the matching to fail. The log is as follows:

2024-03-18 01:53:55,667 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 安娜贝尔(系列) and 安娜贝尔(系列)&language=zh-cn collection
2024-03-18 01:53:55,681 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 宝贝老板(系列) and 宝贝老板(系列)&language=zh-cn collection
2024-03-18 01:53:55,693 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 比得兔(系列) and 比得兔(系列)&language=zh-cn collection
2024-03-18 01:53:55,706 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠:黑暗骑士(系列) and 蝙蝠侠:黑暗骑士(系列)&language=zh-cn collection
2024-03-18 01:53:55,706 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠:黑暗骑士归来(系列) and 蝙蝠侠:黑暗骑士(系列)&language=zh-cn collection
2024-03-18 01:53:55,718 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,718 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 新蝙蝠侠(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,718 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 未来蝙蝠侠(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,718 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠之子(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,718 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠无极限(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,718 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠:黑暗骑士(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,719 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠动画宇宙(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,719 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠:漫长的万圣节(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,719 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 超人与蝙蝠侠动画(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,719 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠:黑暗骑士归来(系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,719 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 蝙蝠侠(亚当·韦斯特动画系列) and 蝙蝠侠(系列)&language=zh-cn collection
2024-03-18 01:53:55,731 (700006dd8000) :  DEBUG (tmdb_helper:124) - Comparing: 边境杀手(系列) and 边境杀手(系列)&language=zh-cn collection

I'm not sure at which step the &language=zh-cn collection part was added. They need to be removed from the comparison value before being compared, then the matching should succeed.


I'm not sure why you defined end_string and added it to the comparison. It seems like we don't need end_string.

        end_string = 'Collection'  # collection names on themoviedb end with 'Collection'

After removing end_string, only &language=zh-cn remains, which is included in the search_query. It needs to be removed from the search_query before comparison.


You might be adding "Collection" to Plex collection titles that don't already contain it to match the collection titles on TMDB. However, besides English, I'm not sure if other languages also use the "Collection" suffix. In my case, my collection titles already include the Chinese version of "Collection" (系列), so there's no need to add "Collection" as an end_string. You may need to consider the situation in other languages.

    >>> get_tmdb_id_from_collection(search_query='James Bond')
    645

ReenigneArcher commented 7 months ago

I'm not sure why you defined end_string and added it to the comparison. It seems like we don't need end_string.

It's because in English the collection names on TMDB end with "Collection". The names on Plex may or may not, depending on the agent that is used. The new movie agent will just use "James Bond" for example, but the legacy agents will use "James Bond Collection".

ReenigneArcher commented 7 months ago

They need to be removed from the comparison value before being compared

Good catch. I made a small adjustment to strip the query back to the search term, and use that for the comparison.
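
For readers following along, stripping the query back to the search term might look like the sketch below. This is not the actual patch; the function names are hypothetical, and the code is Python 3 for illustration.

```python
def strip_language_suffix(search_query):
    """Remove a trailing '&language=...' parameter, if present."""
    # rsplit on the full marker so titles containing '&' stay intact.
    return search_query.rsplit("&language=", 1)[0]


def names_match(result_name, search_query, end_string="Collection"):
    """Compare a TMDB result name with the bare search term, with or
    without the English 'Collection' suffix."""
    term = strip_language_suffix(search_query).lower()
    name = result_name.lower()
    return name == term or name == "{} {}".format(term, end_string).lower()


title = "蝙蝠侠(系列)"
print(names_match(title, title + "&language=zh-cn"))
print(names_match("新" + title, title + "&language=zh-cn"))
print(names_match("James Bond Collection", "James Bond&language=en"))
```

Keeping the "Collection" check only helps English titles; titles in other languages match on the bare term.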