ZeroQI / Hama.bundle

Plex HTTP Anidb Metadata Agent (HAMA)
GNU General Public License v3.0

Consistently Source Director/Writer Metadata for Series #102

Closed: KingJ closed this issue 6 years ago

KingJ commented 7 years ago

Currently, depending on whether a series has been matched in standard mode or multi-episode mode (i.e. tvdb2/3/4 mode), a different source of metadata will be used for each episode's Director and Writer. In standard mode it is sourced from AniDB, and in multi-episode mode it is sourced from TVDB.

We should be consistent in which source is used. Each site has a different approach for storing the metadata - AniDB stores it per-show and TVDB stores it per-episode. If AniDB is used as the metadata source, this means that if a show has different Directors or Writers for certain episodes, all episodes will have all potential Directors and Writers listed - when really only the Director or Writer who did that episode should be added.

As TVDB separates the data down to a per-episode level, I'd prefer to update the standard mode code to source the Director/Writer metadata from TVDB instead of AniDB as is currently done.

Any thoughts on the 'right' approach here?

ZeroQI commented 7 years ago

It seems the already-loaded TVDB XML has all the info, and non-specials are indeed populated adequately, for example: http://thetvdb.com/api/A27AD9BE0DA63333/series/78920/all/en.xml

<Director>Hiroaki Gohda</Director>
<GuestStars>Koichi Sakaguchi|Mugihito</GuestStars>
<Writer>Kunihiko Kondo| Nahoko Hasegawa</Writer>
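
Those pipe-separated credit fields need splitting and trimming before use (note the stray space in "Kunihiko Kondo| Nahoko Hasegawa"); a minimal sketch:

def split_credits(value):
    # "A|B| C" -> ['A', 'B', 'C']; tolerates None and empty entries
    return [name.strip() for name in (value or "").split("|") if name.strip()]

split_credits("Kunihiko Kondo| Nahoko Hasegawa")  # ['Kunihiko Kondo', 'Nahoko Hasegawa']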

Around line 320, tvdb_table [numbering] already contains the directors and writers lists.

Line 413: it seems the handling is already coded when it's a series (not a movie) with 2+ seasons, OR a season higher than season 1, OR the source is TheTVDB. This will need modding: if not movie and (len(media.seasons)>2 or max(map(int, media.seasons.keys()))>1 or metadata_id_source_core == "tvdb"):

Lines 508-529 will need commenting out to disable the AniDB writer/producer section.

Very early code. https://github.com/ZeroQI/Hama.bundle/compare/master...ZeroQI-patch-1?quick_pull=1

Line 387: it starts doing the metadata replacement, beginning with TVDB. We need to decide which database sets the following (one or the other, or a selection in the agent settings):

KingJ commented 7 years ago

I took the code you started working on and continued it. Lines 508-529 are for movies, so we should leave them as-is - I restored them. The episode code starts at Line 605. As we need to map from the AniDB episode to the TVDB episode, I completely removed the existing code and started building new code inside the TVDB summary section, since the TVDB episode number is already mapped there via tvdb_ep. Line 628 onwards is the new code I wrote. I've tested this with episodes that had no directors/writers, ones that had one or the other, and ones that had multiple of each, and it works well 😄. Here's an example of the log that's produced:

2016-12-13 22:58:16,832 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - AniDB episode title: 'The Princess of Colchis'
2016-12-13 22:58:16,832 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - AniDB AirDate '2015-03-28'
2016-12-13 22:58:16,832 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - AniDB duration: '1500000'
2016-12-13 22:58:16,832 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - url: 'http://thetvdb.com/banners/episodes/278626/5176629.jpg', num: '1', filename: 'TVDB/episodes/5176629.jpg'
2016-12-13 22:58:16,832 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - TVDB mapping episode summary - anidb_ep: 's1e2', tvdb_ep: 's1e14', season: '1', epNumVal: '2', defaulttvdbseason: '1', title: 'The Princess of Colchis', summary: 'At his home, Shirō tells Rin that he plans on cont'
2016-12-13 22:58:16,832 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - Processing writers and directors for Episode.
2016-12-13 22:58:16,834 - com.plexapp.agents.hama (2b74) : INFO (logkit/Info:16) - Episode has 1 directors and 2 writers.
2016-12-13 22:58:16,834 - com.plexapp.agents.hama (2b74) : DEBUG (logkit/Debug:13) - Adding new Director Kenichi Takeshita
2016-12-13 22:58:16,834 - com.plexapp.agents.hama (2b74) : DEBUG (logkit/Debug:13) - Adding new Writer Kinoko Nasu
2016-12-13 22:58:16,834 - com.plexapp.agents.hama (2b74) : DEBUG (logkit/Debug:13) - Adding new Writer Kristi Reed

Regarding the last bit of your comment, I'm wondering if we should start a large-scale restructuring of the code, breaking each metadata & source combination into a separate function (e.g. fetch_anidb_title), along with a few helper functions to look up metadata IDs using a different metadata ID (e.g. get_imdbid_by_anidbid); see the sketch below. If we split up the fetching like that, it would then be much easier to code support for letting users choose in the agent settings which metadata sources they'd like to use, and it would make the code base a bit easier to follow too.
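
A hedged sketch of one such helper, assuming ScudLee's anime-list.xml is already parsed with lxml (the anidbid/tvdbid attributes match that mapping file's format, but this is an illustration, not existing HAMA code):

from lxml import etree

def get_tvdbid_by_anidbid(anime_list, anidbid):
    # anime_list is the parsed anime-list.xml; entries look like
    # <anime anidbid="..." tvdbid="...">. Returns None when unmapped.
    nodes = anime_list.xpath("/anime-list/anime[@anidbid='%s']" % anidbid)
    return nodes[0].get('tvdbid') if nodes else None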

ZeroQI commented 7 years ago

Looks good to me :D but does it work with single-season anime with an anidb GUID assigned?

###TVDB mode when a season 2 or more exist 
    if not movie and (len(media.seasons)>2 or max(map(int, media.seasons.keys()))>1 or metadata_id_source_core == "tvdb"):

Dingmatt commented 7 years ago

@KingJ @ZeroQI

Regarding the last bit of your comment, I'm wondering if we should start a large-scale restructuring of the code, breaking each metadata & source combination into a separate function (e.g. fetch_anidb_title), along with a few helper functions to look up metadata IDs using a different metadata ID (e.g. get_imdbid_by_anidbid).

Just to give you the heads-up that I'm actually doing exactly that over in v0.1.4 of my AMSA project (work in progress); as we still share the same base datasets, feel free to grab any of the code if it helps.

ZeroQI commented 7 years ago

Thanks for that. The multitasking part needs separate functions, I believe, so we need to identify which functions can be multitasked and externalised. I would like logs to remain grouped per series though, so maybe externalise only the poster and picture downloads (should be mostly there already).

For the major overhaul you are coding, I am unsure, as your base __init__.py code is longer than mine despite all the stuff removed. While that is more structured, it takes way longer to go through all the code like I can now, and some parts are too complex for me to follow (I was self-taught in Python, so I lack some knowledge at times)..

Dingmatt commented 7 years ago

@ZeroQI Makes sense, though I've got a feeling you've taken a look at master rather than the v0.1.4 branch; that branch is the start of a complete recode with very segmented functions that might be of use to you.

tl;dr: Ignore the master, it's all getting replaced with fresh code soon.

ZeroQI commented 7 years ago

@Dingmatt and you are correct. I am eager to see whether your re-coding and multitasking improve performance in the end, and by how much. You can spot whether it's English dubbed or subbed for collections? Nice... I can see your version 0.1.4 is a recode from scratch with proper structure; the backbone is there, but the meta is not being updated yet as it's a W.I.P...

Dingmatt commented 7 years ago

@ZeroQI Correct, I'll be adding in the AniDB meta over the next few days followed by TvDB; I'm just finalizing a few ideas of how to handle the data efficiently to reduce any slowdown.

KingJ commented 7 years ago

@ZeroQI That code block should already work; when I updated it previously to use the new Plex Director/Writer objects, it was already pulling the data from TVDB. I've not tested it recently though, so I'll double-check that tonight.


@Dingmatt @ZeroQI What's the long-term plan for both AMSA and HAMA? Do we want to use AMSA as a 'next-gen' HAMA with threading/recoding etc.? Do we want both going in parallel, implementing features from each other? I'm happy to help out with either/both projects!

ZeroQI commented 7 years ago

@Dingmatt you could remove the caching, but I quite like it, as AniDB used to ban a lot and I didn't want to re-download posters more than once. It also allows pasting caches on top of each other... The downside is the need to create the folders by hand... I will continue updating the agent in the meantime, but if you get a significant performance improvement I will strongly consider migrating to your model. I have added you as a collaborator, but please ask me or people active in the code before big changes (like your complete rewrite upload would be).

Also please let me know if you need anything changed in the scanner and I will do it, so you can focus on HAMA. I struggle with HAMA a bit now, as there are many possible database sources, some of which people don't use much (TMDB, and TSDB, which is the series part of TheMovieDB, or rather what I call it to differentiate...).

@KingJ We need all bugs ironed out in HAMA, and the code can be reused in parts for AMSA. I will maintain the scanner since I am not overwhelmed by that. While I wasn't originally a big fan of the fork, to be fair he is taking it in a direction I cannot at the moment, which could be beneficial, but that is yet (and soon) to be proven. I have added him as a collaborator so he can also change HAMA code, since he will most likely reuse parts of it for AMSA, as I believe in software communism (or is that open source?).

@Dingmatt @KingJ Is that fair enough, to spare time and effort and achieve the best thing for the users? I can call you from work (UK time 9:00 to 17:30, Monday to Friday) if you need assistance understanding some choices or bits of code. Message me on forums.plex.com and we can schedule a callback/conference call if needed.

KingJ commented 7 years ago

I agree, caching is good as it's very easy to get banned from AniDB. With Plex restricting file system access though it's really difficult to implement the storage for it - I've seen a lot of Hama users who haven't created the necessary folders and get confused when things aren't working. One potential alternative would be to set up an intermediate caching API endpoint, just like Plex does for TVDB. I've got plenty of spare bandwidth on my server(s) so I'm happy to set up something like that - but I'd probably need to ask AniDB to whitelist it. Alternatively, if it's possible to use native Python I/O functions from Plex's sandbox, it might be possible to use something like the requests module with a cache plugin. I need to have a look at how some of the other plugins for Plex handle file I/O; I'm pretty sure I've seen them writing things outside of the Plugin Support folder.
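
A minimal sketch of that idea, assuming the sandbox permits third-party modules and disk writes (requests plus the requests-cache plugin; the cache name is arbitrary):

import requests
import requests_cache

# Cache all HTTP responses on disk for 24h, matching AniDB's
# once-per-day guideline; repeat requests are served locally.
requests_cache.install_cache('hama_http_cache', backend='sqlite', expire_after=86400)

resp = requests.get('http://thetvdb.com/api/A27AD9BE0DA63333/series/78920/all/en.xml')
print(resp.from_cache)  # False on first fetch, True within the cache window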

I think I have a reasonable understanding of the agent, but I'm happy to join a call if people want to. I'm UK-based, so that time window works for me. I know very little about the scanner though, so if you're happy to keep maintaining it I think that would work well.

Dingmatt commented 7 years ago

@KingJ @ZeroQI I'm flattered by the show of confidence, but let's slow things down a little. AMSA was never intended to supplant HAMA; it was simply a customization for my personal server which took things in a slightly different direction. When it was stable enough I released the code under the GPL 3 licence to help out anyone who was attempting similar changes.

Now, that being said, I've got no issue with combining the code bases in the future, as everyone would benefit. Though as @ZeroQI is aware, AMSA lacks a few of HAMA's features, namely the ability to fetch metadata for movies and the forcing of specific metadata conventions (i.e. this series uses AniDB but the next uses TvDB); it was designed for the output to be consistent regardless of the input. I'll have to take time to add these if we're considering porting it back to HAMA.

Finally, as far as caching goes our opinions don't differ too much. Whilst AMSA currently only caches data for a maximum of three days, I could easily open that up as an in-Plex setting (i.e. "Cache data for x days") where zero would mean indefinite (though as newer shows' metadata can change weekly I wouldn't advise it... maybe I can make it dynamic based on the show's airdate?). Also, AMSA (or at least the 0.1.4 branch) doesn't require the directories to be present in the support areas, as it can create any it needs on the fly for caching purposes. It also creates its own offline metadata maps for each show, which theoretically could be shared with anyone and would mean individual servers wouldn't have to actually access AniDB or TvDB unless they wanted to force-refresh something (yet to decide if I want to go down this path).

Anyway, for the time being let me continue to work on AMSA and get it working as I'd expect. Once I've got its re-code stable and tested on my 1500+ library we can think about the future; in the meantime I'm happy to help incorporate any code into HAMA if you see something you desperately like.

KingJ commented 7 years ago

@Dingmatt Sorry, I probably got a little bit over-excited!

AniDB's guidelines on caching mention that requesting the same dataset more than once a day can lead to a ban. So a cache period of a day should provide a good trade-off between freshness of data and preventing bans.

Having a look through your code, it looks like you're just using the standard os library to read and write files/dirs? I would have thought Plex's sandbox would have prevented that, but if that is usable that certainly makes creating the required directories easier, as you have done.

I have no control over either project, but what I'd like to see (and be more than happy to help out with!) is splitting up the code into more modular functions so that it's easier to understand the code flow, easier to re-use bits of code (like metadata ID look-ups), easier to add new metadata agents and easier to implement user prefs (e.g. user wants titles from AniDB? Great, call the get_title_from_anidb function. Want them from TVDB? Great, call get_title_from_tvdb, etc.); a sketch follows. I appreciate that it will be a pretty big undertaking though, and even if it's just a case of moving existing code out into separate functions it will take some time. I am happy to drive a lot of that work, I just don't want to start spending a lot of time on it if that's not what the respective owners of each project want.
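
Illustration only (both fetchers are stubs, and the preferred-source argument stands in for a hypothetical agent setting):

def get_title_from_anidb(anidbid):  return "AniDB title for %s" % anidbid  # stub
def get_title_from_tvdb(tvdbid):    return "TVDB title for %s" % tvdbid    # stub

TITLE_FETCHERS = {'anidb': get_title_from_anidb, 'tvdb': get_title_from_tvdb}

def get_title(metadata_id, preferred_source=None):
    # metadata_id looks like 'anidb-69' or 'tvdb-78920'; preferred_source
    # would come from a (hypothetical) agent pref, falling back to the guid.
    source, number = metadata_id.split('-', 1)
    return TITLE_FETCHERS[preferred_source or source](number)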

Dingmatt commented 7 years ago

@KingJ Didn't mean to put a damper on your enthusiasm; as I said, I'm all for pooling resources as it also leads to better solutions. All I meant to imply is that it would be dangerous to start introducing large swaths of new code into an established project like HAMA unless we're certain they would work out.

I'm also up for creating modular functions; however, these need to be balanced against the overhead of parsing the datasets, since we want to reduce the number of times we need to access these in order to reduce the matching times. With that in mind I've been focusing on creating a "map" file for each series which is stored in the cache. It's essentially an XML document populated with all the formatted data from each source offered by the agent; once this file is created the agent never needs to access the raw datasets / websites again (unless you force a refresh) for that particular series.

So in a nutshell, my idea would be to create these "map" files, then have the various modular functions access them for their information. In their current form, think of them as the AniDB API result, TvDB API result, ScudLee mapping data and titles all mixed into one, minus the info we don't want.

Oh, I've also played around with code to automatically download the OP / ED of any series missing those specials; however, I've yet to get past a fundamental issue in the version of Python available in the Plex sandbox, so we'll have to see how that goes.

ZeroQI commented 7 years ago

@Dingmatt that is perfect communication, as it lets us know where we are going. Implement your added functionality; I look forward to it, and to seeing if performance improves, as if it does I will implement it in HAMA. I am for modular functions IF they shorten the code or are needed for multi-threading. The source is probably peculiar, as I like being able to see a lot of code on the same screen, but it works for me so far. I am not the greatest at coding, but I think it is miles away from where I took it from...

I do think episodes come out once a week, and I cache the XMLs so I can reuse them if the new one is not accessible (site down, banned, etc...), on top of Plex's own cache, so if users re-create their libraries the load is lessened; it prevented AniDB bans pretty effectively, to be honest. I am unsure about your map-file approach though. I will possibly move ScudLee's custom mapping file to the library root so the scanner can access it, as I do not see the scanner finding the agent data folder easily across many OSes...

I will fix HAMA as issues come up. Things I have seen your take has:

KingJ commented 7 years ago

@Dingmatt No, no, not at all - enthusiasm is still very much here, but pausing to consider the best approach is the right thing to do. The map file is an interesting idea; it could definitely simplify bits of code that need to grab metadata but, equally, generating that map file and keeping it fresh could be a challenge.

Dingmatt commented 7 years ago

@KingJ It won't be as hard as you think; we're already parsing the information out of the raw API returns, the only difference is we're not currently storing it. The idea's kinda like caching, but at a higher level.

Let's say someone adds an episode of an anime per week; at the moment both our processes pretty much process the whole show each time a new episode appears (i.e. calling the API, parsing the data, mapping the episodes, etc.). The idea of this map file (let's call it a bundle) is to provide as much information as possible up front, leading to a process where only the new information is processed, thus saving time.

Basically, when the agent searches for data relating to the episode, it'll try the series bundle (map file) first, followed by the cache, followed by calling the API; when the information is retrieved it'll update the "parent" (i.e. the cache and bundle) so next time there's less for the agent to do.

It should speed up the update process whilst also keeping all the relevant info to hand if the user ends up changing any of their agent settings.
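
A minimal sketch of that lookup order, with in-memory dicts standing in for the on-disk bundle and cache, and api_fetch as a placeholder for the real TVDB/AniDB call:

BUNDLE, CACHE = {}, {}  # stand-ins for the on-disk bundle / cache files

def api_fetch(series_id, episode_key):
    return {'title': 'stub'}  # placeholder for the real TVDB/AniDB request

def get_episode_meta(series_id, episode_key):
    key = (series_id, episode_key)
    if key in BUNDLE:  return BUNDLE[key]          # 1. preformatted bundle (map file)
    if key in CACHE:   meta = CACHE[key]           # 2. raw cached data
    else:
        meta = api_fetch(series_id, episode_key)   # 3. live API call
        CACHE[key] = meta                          # backfill the cache...
    BUNDLE[key] = meta                             # ...and the bundle ("parent" update)
    return meta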

ZeroQI commented 7 years ago

@Dingmatt @KingJ The current HAMA priority is:

Currently I am re-downloading the whole XML, which is small enough. We should only fetch metadata for episodes lacking it during a Plex quick refresh, and I will look into doing that. A full refresh should update everything; I will be happy enough with this behaviour.

ZeroQI commented 7 years ago

@KingJ where do we stand? Are writers/directors pulled on a per-episode basis from TheTVDB for both multi-season and AniDB-GUID single-season series? If so, can we close? I am losing track...

I am contemplating moving each meta source to its own dedicated file, which would allow functions to use multitasking, but everything needs to be working beforehand, and only if multitasking does improve performance.

Meta explanation from Dingmatt here: https://forums.plex.tv/discussion/comment/1306365/#Comment_1306365

@Dingmatt about the mapping file: it's very good, given the original got no update in the last 2 months... Should I host it and add you as a contributor, or use yours directly? deusxanime would like to have access, btw... I coded support for a custom mapping file in the series folder which takes precedence over ScudLee's, and I would put yours in between in the order of priority: https://raw.githubusercontent.com/Dingmatt/AMSA/master/Plug-in%20Support/Data/com.plexapp.agents.amsa/DataItems/anime-list-corrections.xml

I have added loading of the corrections file together with ScudLee's when HAMA starts: https://github.com/ZeroQI/Hama.bundle/blob/master/Contents/Code/__init__.py

Dingmatt commented 7 years ago

@ZeroQI I'd say it's best to host it as part of HAMA (or as a separate project); that way we won't need to swap any URLs around, and it will give more people access to it. We can use the AMSA one as a base (though I haven't updated that since I started working on v0.1.4).

As you've mentioned, the priority order should go as follows (see the merge sketch after the list):

  1. Local Custom
  2. Our Corrections
  3. ScudLee's Latest
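
A possible way to apply that order when the agent loads the mappings (first match wins; the file paths are illustrative):

from lxml import etree

# Priority merge: entries from earlier (higher-priority) files shadow later
# ones, keyed on each <anime> node's anidbid attribute.
def merge_mappings(paths):  # e.g. [local_custom, our_corrections, scudlee_latest]
    merged = {}
    for path in paths:
        for anime in etree.parse(path).xpath('/anime-list/anime'):
            merged.setdefault(anime.get('anidbid'), anime)
    return merged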

@KingJ In case you're both wondering about the state of AMSA: I've been a little busy with actual work, but am almost ready to start coding the TvDB & AniDB metadata extraction. Before I do that I have to refactor my current code so it's split up into nice neat functions.

Here's an example of the bundle file I've previously mentioned (6662.bundle.xml.txt, rename extension to xml) generated by the new code (for the first season of Fairy Tail, and without any metadata of course); the format's not complete, but it will make processing the series a lot easier for any subsequent episodes.

ZeroQI commented 7 years ago

Added the mapping file to ASS to group it with the tvdb4 ones: https://github.com/ZeroQI/Absolute-Series-Scanner/blob/master/anime-list-corrections.xml. Added you both as collaborators so you can edit the file.

I thought a standalone project for the mapping would shed negative light on ScudLee's project, which I respect highly; while it wasn't updated for two months, it had gone unupdated for 3 months at times in the past...

Dingmatt commented 7 years ago

@ZeroQI

I thought a standalone project for the mapping would shed negative light on ScudLee's project, which I respect highly; while it wasn't updated for two months, it had gone unupdated for 3 months at times in the past...

Agreed, without Scudlee we'd be dead in the water. We're only looking to introduce a mechanism where we can add our own interim updates.

sven-7 commented 7 years ago

@Dingmatt @ZeroQI -- what about naming it exactly that, then? "ScudLee Interim and Custom Mapping Project" or something along those lines. I think that way it's clear you are not trying to step on Scud's toes.

KingJ commented 7 years ago

@ZeroQI I had fixed the existing code to work with Plex's new method of storing writer/director metadata, but I hadn't made the code consistently use AniDB or TVDB as a source. There's a lot of discussion here about splitting bits into individual functions and/or using a unified bundle file, so I didn't want to create a PR if things were about to be changed shortly. That said, it is a fairly small change, so I'll update the existing code to be consistent (i.e. use TVDB for Writer/Director metadata at all times) and put in a PR over the next few days.


@Dingmatt Thanks for linking the example bundle file, with that and your graphical examples that @ZeroQI linked to I have a better understanding of what you're trying to achieve now.

ScudLee's mapping is definitely an invaluable resource - it's clear just how much time and effort has gone into both the initial creation and ongoing curation of it. I think having an intermediate mapping hosted by ASS/Hama is fine, but it just needs to be clear that it's temporary/interim. Other programs also use ScudLee's mapping (e.g. ScudLee's own Kodi AniDB plugin and the Trakt for Plex plugin), so it's best to keep the original mapping as up to date as possible, but that can be difficult when there's a delay. I wonder if it's worth approaching ScudLee to see if he would be willing to add someone as a collaborator to help keep it up to date?

KingJ commented 7 years ago

FYI: ScudLee has updated the lists today with lots of PRs and added some contributors to the repo; I can now also update the lists going forward.

ZeroQI commented 7 years ago

Excellent

ZeroQI commented 7 years ago

Hi all, to let you know: I am re-coding everything into modules: common, AnimeLists, AniDB, tvdb, tmdb, FanartTv, OMDb. The main code is down to 591 lines from about 1100. That was the more or less easy part. Now I need to split the update() part into sections, and that is tricky.

Dingmatt commented 7 years ago

Yep, been doing the same on AMSA and it certainly takes time.

ZeroQI commented 7 years ago

The source code is down to 89 lines for __init__.py. I am sorting out the bugs from the move now and re-writing some functions, trying to have neat separate functions. Here is the main code now:

# -*- coding: utf-8 -*-
### HTTP Anidb Metadata Agent (HAMA) By ZeroQI (Forked from Atomicstrawberry's v0.4 - AniDB, TVDB, AniDB mod agent for XBMC XMLs) ###
import os, re, time, datetime, string, thread, threading, urllib, copy, io # Functions used per module: os (read), re (sub, match), time (sleep), datetime (datetime).
from lxml import etree                                                     # fromstring
import common, AnimeLists, AniDB, tvdb, tmdb, FanartTv, OMDb               # HAMA source split into modules

### Pre-Defined Start function #########################################################################################################################################
def Start():
  Log.Info('### HTTP Anidb Metadata Agent (HAMA) Started ##############################################################################################################')
  HTTP.CacheTime = CACHE_1HOUR * 24 * 2  
  msgContainer   = common.ValidatePrefs()
  if msgContainer.header == 'Error': return

class HamaCommonAgent:
  Log.Info('### HTTP Anidb Metadata Agent (HAMA) Class Started ########################################################################################################')

  ### Serie search ######################################################################################################################################################
  def Search(self, results, media, lang, manual, movie):
    AniDB.Search_AniDB(self, results, media, lang, manual, movie)
    #if maxi<50:  Search_TVDB(self, results,  media, lang, manual, movie)
    #if len(results)>=1:  return
    #Search_TMDB(self, results, media, lang, manual, movie)

  ### Parse the AniDB anime title XML ##################################################################################################################################
  def Update(self, metadata, media, lang, force, movie):
    Log.Info('--- Update Begin -------------------------------------------------------------------------------------------')

    ### AniDB to TVDB mapping file (get tvdbid, mappingList, tmdbid, imdbid, +etc...) ###
    tvdbid, tmdbid, imdbid, defaulttvdbseason, mappingList, mapping_studio, anidbid_table, poster_id = "", "", "", "", {}, "", [], ""      # anidbTvdbMapping
    tvdbtitle, tvdbOverview, tvdbFirstAired, tvdbContentRating, tvdbNetwork, tvdbGenre, tvdbRating   = "", "", "", "", "", "", None            # get_tvdb_meta
    current_date, anidbid, tvdbanime                                                                 = int(time.strftime("%Y%m%d")), "", None  # Other variables to set

    error_log = { 'anime-list anidbid missing': [], 'anime-list tvdbid missing'  : [], 'anime-list studio logos'   : [],
              'AniDB summaries missing'   : [], 'AniDB posters missing'      : [], 
              'TVDB posters missing'      : [], 'TVDB season posters missing': [], 'Plex themes missing'       : [],
              'Missing Episodes'          : [], 'Missing Episode Summaries'  : [], 'Missing Specials'          : [], 'Missing Special Summaries'  : []  
            }

    ### Metadata source, id separation ### #if not "-" in metadata.id:  metadata.id = "anidb-" + metadata.id  # Old metadata from when the id was only the anidbid
    metadata_id_source, metadata_id_number = metadata.id.split('-', 1)
    metadata_id_source_core                = metadata_id_source.rstrip("0123456789")  #replace with startswith("tvdb/anidb" ?
    Log.Info("metadata source: '%s', id: '%s', Title: '%s', lang: '%s', force: '%s', movie: '%s'" % (metadata_id_source, metadata_id_number, metadata.title, lang, force, movie))
    if metadata_id_source == "anidb":
      anidbid = metadata_id_number
      if anidbid.isdigit():  tvdbid, tmdbid, imdbid, defaulttvdbseason, mappingList, mapping_studio, anidbid_table, poster_id = AnimeLists.anidbTvdbMapping(metadata, media, movie, anidbid, error_log)  
      if imdbid and not tmdbid:              tmdbid = tmdb.get_tmdbid_per_imdbid(imdbid, tmdbid) # Used by fanart.tv
    elif metadata_id_source_core == "tvdb":  tvdbid = metadata_id_number
    #elif metadata_id_source == "tmdb":       tmdbid = metadata_id_number                         #in ["tmdb", "tsdb"]:
    #elif metadata_id_source == "tmdb":       tsdbid = metadata_id_number                         #tmdb.Update_TMDB(self, metadata, media, lang, force, movie)

    ### TVDB ID exists ####
    if tvdbid.isdigit():  tvdbanime, tvdbtitle, tvdbOverview, tvdbFirstAired, tvdbContentRating, tvdbNetwork, tvdbGenre, tvdbRating = tvdb.get_tvdb_meta(lang, tvdbid, imdbid)
    if tvdbanime:         tvdb_table = tvdb.tvdb_table_build(metadata, media, movie, defaulttvdbseason, tvdbanime, metadata_id_source, metadata_id_source_core, error_log)

    ### Posters ###  #if Prefs['GetAnidbPoster'], Prefs['GetFanartTVBanner'], Prefs['GetTvdbBanners']
    if (Prefs['GetTvdbPosters'   ] or Prefs['GetTvdbFanart' ]       ) and (tvdbid.isdigit()                    ):  tvdb.getImagesFromTVDB   (metadata, media, error_log, tvdbid, tvdbtitle, movie, poster_id, force, defaulttvdbseason, 1)
    if (Prefs['GetOmdbPoster'    ]                                  ) and (imdbid.isdigit()                    ):  OMDb.omdb_poster         (metadata, imdbid)          ### OMDB - Posters - Using imdbid ###  returns 200 but not downloaded correctly - IMDB has a single poster, downloaded through the OMDB xml, preferred by the mapping file
    if (Prefs['GetTmdbPoster'    ] or Prefs['GetTmdbFanart'        ]) and (tmdbid.isdigit()                    ):  tmdb.tmdb_posters        (metadata, imdbid, tmdbid)  ### TMDB - background, Poster - using imdbid or tmdbid ### The Movie Database is least preferred by the mapping file, used only when imdbid missing
    if (Prefs['GetFanartTVPoster'] or Prefs['GetFanartTVBackground']) and (tvdbid.isdigit() or tmdbid.isdigit()):  FanartTv.fanarttv_posters(metadata, movie, tmdbid, tvdbid, defaulttvdbseason )  ### fanart.tv - Background, Poster and Banner - Using imdbid ###
    if (Prefs['GetASSPosters'    ]                                  ) and (metadata_id_source == "tvdb4"       ):  common.getImagesFromASS    (metadata, media, tvdbid, movie, 0)  ### ASS tvdb4 ark posters ###
    if (Prefs['GetPlexThemes'    ]                                  ) and (tvdbid.isdigit()                    ):  tvdb.plex_theme_song     (metadata, tvdbid, tvdbtitle)  ### Plex - Plex Theme song - https://plexapp.zendesk.com/hc/en-us/articles/201178657-Current-TV-Themes ###

    if movie and tmdbid.isdigit():                                                           tmdb.tmdb_tagline        (metadata, movie, tmdbid)  ### Populate Movie Metadata Extras (e.g. Taglines) from TMDB for Movies ###
    if not movie and (max(map(int, media.seasons.keys()))>1 or metadata_id_source_core == "tvdb"):  tvdb.tvdb_update_meta(metadata, media, tvdbanime, tvdbtitle, tvdbOverview, tvdbFirstAired, tvdbContentRating, tvdbNetwork, tvdbGenre, tvdbRating, tvdb_table) ### TVDB mode when a season 2 or more exists ###
    elif metadata_id_source == "anidb":                                                      AniDB.anidb_update_meta(metadata, media, metadata_id_number)

    common.write_logs(media, movie, error_log, metadata_id_source_core, metadata_id_number, anidbid, tvdbid)
    Log.Info('--- Update end -------------------------------------------------------------------------------------------------')

### Agent declaration ###############################################################################################################################################
class HamaTVAgent(Agent.TV_Shows, HamaCommonAgent):
  name             = 'HamaTV'
  primary_provider = True
  fallback_agent   = False
  contributes_to   = None
  accepts_from     = ['com.plexapp.agents.localmedia'] # 'com.plexapp.agents.opensubtitles'
  languages        = [Locale.Language.English, 'fr', 'zh', 'sv', 'no', 'da', 'fi', 'nl', 'de', 'it', 'es', 'pl', 'hu', 'el', 'tr', 'ru', 'he', 'ja', 'pt', 'cs', 'ko', 'sl', 'hr']
  def search(self, results,  media, lang, manual): self.Search(results,  media, lang, manual, False )
  def update(self, metadata, media, lang, force ): self.Update(metadata, media, lang, force,  False )

class HamaMovieAgent(Agent.Movies, HamaCommonAgent):
  name             = 'HamaMovies'
  primary_provider = True
  fallback_agent   = False
  contributes_to   = None
  languages        = [Locale.Language.English, 'fr', 'zh', 'sv', 'no', 'da', 'fi', 'nl', 'de', 'it', 'es', 'pl', 'hu', 'el', 'tr', 'ru', 'he', 'ja', 'pt', 'cs', 'ko', 'sl', 'hr']
  accepts_from     = ['com.plexapp.agents.localmedia'] # 'com.plexapp.agents.opensubtitles'
  def search(self, results,  media, lang, manual): self.Search(results,  media, lang, manual, True )
  def update(self, metadata, media, lang, force ): self.Update(metadata, media, lang, force,  True )

ZeroQI commented 7 years ago

It is moving forward slowly... reshaping every other function... Here is how the main code looks now: https://github.com/ZeroQI/Hama.bundle/pull/114

Please review and give your opinion. Still working on the functions so they look neat, and trying to modify metadata only if it changed, for speed reasons.

ZeroQI commented 7 years ago

Please let me know what you think; I think it is cleaner and more maintainable that way. I will move variables into the functions that need them when only one function uses them.

KingJ commented 7 years ago

Sorry @ZeroQI, I've been away for a short while so am only just catching up on things now. I took a quick look and it looks good, but I'll go over it in more detail tomorrow and test a few things out!

ZeroQI commented 7 years ago

@Dingmatt I just understood some of your code, namely:

My OCD made me re-write them, but your mods are truly inspired. The separated path components gave issues under Linux for CachePath.
To be honest I did think about using file time with the cache, but didn't know it was possible to implement... Now the local cache is way better and the XMLs are cached:

CachePath            = os.path.abspath(os.path.join(os.path.dirname(inspect.getfile(inspect.currentframe())), "..", "..", "..", "..", "Plug-in Support", "Data", "com.plexapp.agents.hama", "DataItems"))
### Save file in Plex Media Server\Plug-in Support\Data\com.plexapp.agents.hama\DataItems creating folder(s) ###
def SaveFile(file, filename="", relativeDirectory=""):  #By Dingmatt 
  fullpathDirectory = os.path.abspath(os.path.join(CachePath, relativeDirectory))
  relativeFilename  = os.path.join(relativeDirectory, filename) 
  if os.path.exists(fullpathDirectory):  Log.Debug("SaveFile() - CachePath: '{path}', file: '{file}', directory(ies) were present".format(path=CachePath, file=relativeFilename))
  else:                                  Log.Debug("SaveFile() - CachePath: '{path}', file: '{file}', directory(ies) were absent.".format(path=CachePath, file=relativeFilename)); os.makedirs(fullpathDirectory)
  Data.Save(relativeFilename, file)

### Load file in Plex Media Server\Plug-in Support\Data\com.plexapp.agents.hama\DataItems if cache time not passed ###
def LoadFile(filename="", relativeDirectory="", url="", cache= CACHE_1HOUR * 24 *2):  #By Dingmatt 
  relativeFilename  = os.path.join(relativeDirectory, filename) 
  fullpathFilename  = os.path.abspath(os.path.join(CachePath, relativeDirectory, filename))
  file              = None
  if relativeFilename and Data.Exists(relativeFilename) and os.path.isfile(fullpathFilename):       
    file_time = os.stat(fullpathFilename).st_mtime
    if file_time>(time.time()-cache):  Log.Debug("LoadFile() - Filename: '{file}', CacheTime: '{time}', Limit: '{limit}' loaded from cache".format(file=relativeFilename, time=time.ctime(file_time), limit=time.ctime(time.time() - cache))); file = Data.Load(relativeFilename)
    else:                              Log.Debug("LoadFile() - Filename: '{file}', CacheTime: '{time}', Limit: '{limit}' needs reloading..".format(file=relativeFilename, time=time.ctime(file_time), limit=time.ctime(time.time() - cache)))
  else:                                Log.Debug("LoadFile() - Filename: '{file}' does not exist in cache".format(file=fullpathFilename))
  if not file:
    try:                    file = str(HTTP.Request(url, headers={'Accept-Encoding':'gzip', 'content-type':'charset=utf8'}, timeout=20, cacheTime=cache))                                     # Loaded with Plex cache, str prevent AttributeError: 'HTTPRequest' object has no attribute 'find'
    except Exception as e:  file = None; Log.Warn("XML issue loading url: '%s', filename: '%s', Exception: '%s'" % (url, filename, e))                                                           # issue loading, but not AniDB banned as it returns "<error>Banned</error>"
    if file:
      Log.Debug("LoadFile() - url: '{url} loaded".format(url=url))
      SaveFile(file, filename, relativeDirectory)
  return XML.ElementFromString(file) if filename.endswith(".xml") else file
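
For reference, a call site then mirrors the MyAnimeList usage further down; this one is illustrative only (the AniDB URL constant is assumed, not actual HAMA code):

xml = common.LoadFile(filename=anidbid + ".xml", relativeDirectory="AniDB",
                      url=ANIDB_HTTP_API_URL + anidbid,  # assumed constant, for illustration
                      cache=CACHE_1HOUR * 24 * 2)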

Dingmatt commented 7 years ago

@ZeroQI I'm glad it's useful to you.

As for detecting subs / dubs, you can do this by interrogating the media object passed through to the update function; the streams can be found at (xpath) "media.seasons[x].episode[x].items.parts.streams", and in there you'll find a number of streams with a 'type' and 'lang' attribute.

If the type is 'audio' and the lang is 'eng' it'll be a dub; if it's 'audio' and 'jpn' along with another stream with 'subtitle' and 'eng', it's a sub.
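
A rough sketch of that check, as a concrete reading of the description above (the exact attribute names and tree shape in Plex's media object may differ):

def classify_audio(media):
    # Walk seasons -> episodes -> items -> parts -> streams, per the xpath above.
    has_eng_audio = has_jpn_audio = has_eng_subs = False
    for season in media.seasons.values():
        for episode in season.episodes.values():
            for item in episode.items:
                for part in item.parts:
                    for stream in part.streams:
                        kind, lang = getattr(stream, 'type', ''), getattr(stream, 'lang', '')
                        if   kind == 'audio'    and lang == 'eng':  has_eng_audio = True
                        elif kind == 'audio'    and lang == 'jpn':  has_jpn_audio = True
                        elif kind == 'subtitle' and lang == 'eng':  has_eng_subs  = True
    if has_eng_audio:                   return 'dub'
    if has_jpn_audio and has_eng_subs:  return 'sub'
    return 'unknown'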

In order to access the attributes you'll likely need to add the following lines to the agent's Info.plist:

<key>PlexPluginCodePolicy</key> 
<string>Elevated</string>

Hope that helps.

ZeroQI commented 7 years ago

@KingJ I uploaded this week's modifications to the pull request if you want to test the changes:

Need to look at which meta source to use primarily.

to do:

Your opinion on the new code? More readable? Any help welcome, I am saturating...

Dingmatt commented 7 years ago

@ZeroQI I'll be able to give you a hand at some point in the future, but I've also got the AMSA rewrite on my plate (and RL work), which as you can attest is quite the handful. I feel like AMSA is becoming more and more of a test bed for tech, but I don't like to leave things unfinished.

As you've moved onto the AMSA-style caching, it might also be worth considering implementing some modified version of the AMSA 'CleanCache' function, which is designed to remove cache items after a certain period of time (used to update the cache for currently airing shows, or to reclaim disk space from shows which have been removed from Plex).

As far as AMSA goes, most of the essential code is now in place. I've still got to code in a few 'shortcuts' to speed things up, but those aside I'll be adding the code to actually populate Plex with the new metadata any day now; if you're interested, I've attached an example of its latest produced bundle format (Steins;Gate).

sven-7 commented 7 years ago

@Dingmatt -- with the 'CleanCache' function, does that mean for airing shows there could be something that cleans the cache on a more regular basis? For instance, sometimes I might add an episode before AniDB or TVDB has a Title, Description, etc. If I hit refresh, I'm usually stuck for a while before that info comes through (aside from the forced Full Refresh). Would that capability be possible here, or is it already?

ZeroQI commented 7 years ago

@Dingmatt I really like the caching, but not removing entries when the cache time has elapsed, so if you are banned or a page is unavailable (like TVDB is sometimes, days at a time), everything keeps working away...

Please check the "LoadFile" function; it returns XML and JSON, and also reloads the old cache if all else fails, allowing me to dispense with the load-XML and load-JSON functions...

I will pull in an "ended" tag that will stop a series from expiring its data, to avoid useless updates and improve speed. Do you think that is a good idea?

Still going through your code; I can see the elegance of it, but sometimes I don't get the code, not being too used to classes, to be fair...

I like the format; it would allow offline files with all the data, like an NFO but better organised. But I want to encourage people to update the databases instead of fixing things for themselves only, which is why I did the HTML reporting in the agent data folders.

Dingmatt commented 7 years ago

@ZeroQI An "ended" tag is a good idea; I'm going to be doing something similar, but the opposite way round.

As AMSA is going to be using the bundle files that it generates from the various TvDB, AniDB, Plex and now MAL sources, it has less dependency on the raw cached files downloaded from those sites; the bundle files will essentially be persistent (an indefinite cache), though I'll be flagging airing shows to ensure the agent refreshes them periodically for the first couple of months in order to fill in any missing or updated data.

I believe that's probably now the biggest difference between the HAMA and AMSA models (excluding the ability to choose which source populates which info): the first thing AMSA does when it's passed a newly detected series is download all the information it can for that series (and related series, aka other seasons / specials; not just the episode it detected); it then builds the bundle file which includes all that info.

The idea is that from that point forward it doesn't need to use TvDB, AniDB or any of the other sources, as it's got all the info stored locally; think of it as predictive caching (caching data you might use in the future). Though it takes an additional second or so to initially match a series, it means that you save all the transfer / processing time in the future; it'll be extremely useful for airing series which are being added episode by episode weekly.

Dingmatt commented 7 years ago

@sven-7 That's actually one of the key ideas behind 'CleanCache': cached data isn't always up to date, especially (as you've stated) for currently airing shows. The idea would be to be able to 'flush' old data and replace it with new data in order to fix those kinds of issues. For AMSA I'm likely going to create a dynamic cache time based on how long ago the episode / series aired; this will mean that newer series will update their data once a week or so, whilst older shows will reduce to every 6 months (I'll likely make this configurable from the agent settings).
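
One way that dynamic cache time could be realised (the thresholds here are invented purely for illustration):

import time

def cache_seconds(airdate_epoch):
    # Newer shows refresh often; old shows settle down to a 6-month cycle.
    days_since_aired = (time.time() - airdate_epoch) / 86400
    if days_since_aired < 60:   return 7 * 86400    # airing/new: refresh weekly
    if days_since_aired < 365:  return 30 * 86400   # recent: refresh monthly
    return 180 * 86400                              # old: refresh every 6 months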

Dingmatt commented 7 years ago

@ZeroQI @KingJ @sven-7 Here's a slightly basic idea of how the metadata caching works within AMSA and the process used to decide where to gather new data from. Please remember that AMSA uses predictive caching for the bundle (i.e. if you add only one episode of One Piece it'll add the metadata for all episodes to the bundle in preparation for future episodes; it will also process all that data so it's easily accessible in the future with no extra processing).

[attached diagram "drawing1": AMSA metadata caching and data-source decision flow]

sven-7 commented 7 years ago

@Dingmatt -- that's an awesome feature. I often find it frustrating to have "Episode X" listed as the title for a few days. Something I'm definitely interested in. I will often go update TVDB if the info isn't there. In my experience, it's not a long wait before the TVDB APIs update -- it's the Plex-side limitations that keep the update from populating for a few days. Same with posters.

Thanks for the explanation of the metadata. That visual is really helpful. How does this work: Episode 774 of One Piece gets added - I assume the bundle takes everything, but nothing for 775 is listed yet. When 775 gets added, does it go out and recall all the info?

Dingmatt commented 7 years ago

@sven-7 I'm actually in the process of finalising the code which handles those types of scenarios as we speak; here's the general idea.

The short answer is yes, but not in the way you're thinking. AMSA works with a number of different sources, the main ones being TvDB and AniDB. For TvDB all the episode information is part of one big file; essentially, if you get info for one episode you're by default getting it for all of them (including specials and other seasons). For AniDB it works mostly the same way, but there are no seasons (each being its own series), so instead you get back all the episode data for that series.

AMSA's bundle file has a 'mapping' section which it generates and uses to index the various episodes it's gathered data on (think of it as a custom version of the ScudLee files, parsed for speed); here's an example:

<Series>
  <Mapping>
    <Series anidbid="10887" absolute="False" tvdbid="244061" episodeoffset="0">
      <Episode tvdb="S00E03" anidb="1"/>
      <Episode tvdb="S00E04" anidb="2"/>
      <Episode tvdb="S00E05" anidb="3"/>
      <Episode tvdb="S00E06" anidb="4"/>
    </Series>
    <Series anidbid="8655" absolute="False" tvdbid="244061" episodeoffset="0">
      <Episode tvdb="S00E02" anidb="1"/>
etc...

When a new episode is added it checks this list to see if the bundle contains metadata for it (and that the metadata is 'in-date'); if it isn't found then the cache is checked, and finally the API is called (if needed). Once it's got the new episode data it will save as much processing time as possible and simply add the new data to the existing bundle (rather than recreating the bundle from scratch). The exception is if you run a manual (forced) refresh, where it will rebuild the whole bundle (as it assumes you want to fix a caching issue).

Edit: The fact that all the episode information is returned for each call to the API / cache is actually one of the reasons I've moved to predictive caching; it's much faster to process all the information at once and simply load it later than to process it each time a new episode is added.
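
For illustration, resolving an AniDB episode against that mapping block could look like this (using the sample XML above; the helper name is made up):

from lxml import etree

def tvdb_episode_for(bundle_xml, anidbid, anidb_ep):
    # bundle_xml is the parsed bundle; returns e.g. 'S00E03' for anidbid 10887, ep 1.
    hits = bundle_xml.xpath("/Series/Mapping/Series[@anidbid='%s']/Episode[@anidb='%s']"
                            % (anidbid, anidb_ep))
    return hits[0].get('tvdb') if hits else None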

ZeroQI commented 7 years ago

@Dingmatt Excellent, well-documented replies :D Currently HAMA downloads the series XML and caches the whole of it. To have episodes show up, Plex needs to rescan newly added files; even with the Plex cache, the file would have data for files added since, as long as the source had it when cached, so we shouldn't see much difference if done right. I will include a series-in-progress tag to trigger more frequent scans, and I could check the files' last-upload date to force a refresh if less than a week old, disabling the Plex cache since we manage our own. I will also take this approach of updating only what is necessary unless fully refreshing, minus the global bundle for now... I redid and simplified the main page; going through the modules now.

KingJ commented 7 years ago

With this approach to caching, especially with potentially longer cache times, would there be a way for a user to override the cache and forcefully grab the most recent data? If older series are refreshed less often, I'm imagining a case whereby someone updates a poorly-maintained old series on TVDB, AniDB etc. but then can't actually get that updated information into their Plex instance because the cache is still holding the old copy.

Dingmatt commented 7 years ago

@ZeroQI Very true, being able to properly cache now makes all the difference. The only real difference is that AMSA is preformatting the cached data (into bundles) whilst HAMA processes it from the raw cache, though that's mainly because AMSA needs to immediately react to setting changes (in regard to preferred data sources) whilst HAMA doesn't.

@KingJ Yes, that's a forced / periodic refresh. Plex by default can be set up to refresh local metadata every 3 days, where it will call the update part of the agent. This call is still bound to the caching setting used by the agent (let's say HAMA), which might only update once every 6 months for older series; once the cache is outdated it'll update and retrieve the new info automatically. The user can also manually trigger this process by clicking the refresh button (either for a single item or for the whole library), which the agent can be coded to treat as the same process.

ZeroQI commented 7 years ago

@Dingmatt HAMA will come close, without the unified model for now, but I hope the code I am doing now will serve some purpose for you. Uploaded the recently changed files to the pull request. __init__.py and common.py seem to have the content I wanted and are less likely to change. I need to overhaul the AniDB and TVDB meta update functions to have proper generic functions; AniDB is a bit messy. Please tell me about the common and __init__ pages; it took a lot of time to reduce them that much. The LoadFile is beautiful, and I modded SaveFile to be compatible with Data.Save for easy migration :D

Dingmatt commented 7 years ago

@ZeroQI They look good. I can see that you've effectively combined the download / load-local functions (AMSA was similar until I needed to call the load separately). I've also taken a brief look at the AniDB & TvDB modules, and they've reminded me of a few bits of data I need to add to the latest AMSA.

sven-7 commented 7 years ago

@Dingmatt Ah, okay - that makes sense. A different way of going about it. Great feature.

I definitely agree with @KingJ about making sure old shows get updated. I, for one, have definitely gone onto TVDB when I see a show that I know is missing information and updated it.

Looking forward to all the updates.

ZeroQI commented 7 years ago

@Dingmatt there are too many options in DefaultPrefs.json, so I am using one setting each for getting posters/seasons/fanarts/banners (the terms are the same length), one for getting a single file of each (or not), and then a setting per meta source. Using a small module as a template, I created small one-liner functions to reduce code:

def getElementText(el, xp):  return el.xpath(xp)[0].text if el and el.xpath(xp) and el.xpath(xp)[0].text else ""   ### Get text from an xml tag - from common import getElementText to use without 'common.' pre-pended ###
def GetPosters():            return Prefs['Posters'] and not (Prefs['GetSingleOne'] and metadata_count['posters'])
def GetSeasons():            return Prefs['Seasons'] and not (Prefs['GetSingleOne'] and metadata_count['seasons'])
def GetFanarts():            return Prefs['Fanarts'] and not (Prefs['GetSingleOne'] and metadata_count['fanarts'])
def GetBanners():            return Prefs['Banners'] and not (Prefs['GetSingleOne'] and metadata_count['banners'])

Usage:

from common import GetPosters, GetSeasons, GetFanarts, GetBanners  # so they can be used without the 'common.' prefix
if GetPosters(): xxx

MyAnimeList cheats and includes TheTVDB links, so I added a check. The new-style image download function looks like:

def GetImages(metadata, metadata_count, malid):
  MAL_HTTP_API_URL = "http://fribbtastic-api.net/fribbtastic-api/services/anime?id="
  MAL_PREFIX       = "https://myanimelist.cdn-dena.com"  # Some links in the XML will come from TheTVDB, dirty....
  Log.Info("MyAnimeList_posters() - malid: '%s'" % malid)
  if Prefs['MyAnimeList'] and (GetPosters() or GetFanarts() or GetBanners()):  # No need to re-check "Prefs['MyAnimeList']" afterwards
    xml = common.LoadFile(filename=malid+".xml", relativeDirectory="MAL", url=MAL_HTTP_API_URL + malid, cache=CACHE_1HOUR * 24 * 7)
    if xml:
      # metadata_download args: metatype, url, num, filename, url_thumbnail, count; keep only MAL-hosted links, skipping the TheTVDB ones
      for item in (xml.xpath('//anime/covers/cover'          ) if GetPosters() else []):
        if item.text.startswith(MAL_PREFIX):  metadata_download(metadata.posters, item.text, 50, "MyAnimeList/" + "/".join(item.text.split('/')[3:]), None, metadata_count['posters'])
      for item in (xml.xpath('//anime/backgrounds/background') if GetFanarts() else []):
        if item.text.startswith(MAL_PREFIX):  metadata_download(metadata.art,     item.text, 50, "MyAnimeList/" + "/".join(item.text.split('/')[3:]), None, metadata_count['fanarts'])
      for item in (xml.xpath('//anime/banners/banner'        ) if GetBanners() else []):
        if item.text.startswith(MAL_PREFIX):  metadata_download(metadata.banners, item.text, 50, "MyAnimeList/" + "/".join(item.text.split('/')[3:]), None, metadata_count['banners'])