ZeroQI / Hama.bundle

Plex HTTP Anidb Metadata Agent (HAMA)
GNU General Public License v3.0

anidb-titles.xml > 5MB #419

Closed SgtZapper closed 3 years ago

SgtZapper commented 3 years ago

If anidb-titles.xml is larger than 5MB, as it is now, it trips the exception in AniDB.py for GetAniDBTitlesDB when trying to load the file. It also deletes anidb-titles.xml.

I tried to add some trace logs to common.py to see where it bombs, but I can find nothing in the logs.

From what I can see, the probable cause is that XML.ElementFromURL and XML.ElementFromString have a limit of 5MB; there was some mention of that on the Plex forums from my googling.

Other than that, the documentation on how this works in Plex eludes me, and I'm not good with Python.

I tried shaving down the anidb-titles.xml just under 5MB and it started working again.
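
A quick way to check whether a given file would hit the suspected limit is to compare its byte size against 5 MiB. This is a minimal sketch, assuming the limit is exactly 5 × 1024 × 1024 = 5242880 bytes (the value that appears in the exception message later in this thread). The constant and function names are made up for illustration and are not part of Plex or HAMA.

```python
# Hypothetical pre-check; these names are illustrative, not a Plex or HAMA API.
PLEX_XML_MAX_BYTES = 5 * 1024 * 1024  # 5242880, assumed from the error message


def exceeds_plex_xml_limit(data, limit=PLEX_XML_MAX_BYTES):
    """Return True if the payload is larger than the assumed parser limit."""
    if not isinstance(data, bytes):
        # Measure the encoded size, since the limit is counted in bytes
        data = data.encode("utf-8")
    return len(data) > limit
```

A file of 5244991 bytes, the size reported in that error, is 2111 bytes over this limit, which fits with the file working again once it was trimmed to just under 5MB.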

ZeroQI commented 3 years ago

That is a bad bug, thanks for reporting. This is a function Plex provides for metadata agents to use, so either Plex fixes it after being notified, or we have to code a replacement... It is not normal for the file to be deleted if the new one is no good. We might have to review that logic, to leave more time to fix the rest...

SgtZapper commented 3 years ago

Am I misunderstanding something? When I added some Log.Info traces in common.py for LoadFile, shouldn't they show up in Logs/PMS Plugin Logs/com.plexapp.agents.hama.log? The traces I added to AniDB.py showed up fine.

The only other thing that could be related was a single row in the Plex Media Server.log saying: Aug 29, 2020 13:10:45.939 [0x7f77deffd700] ERROR - Error parsing content.

ZeroQI commented 3 years ago

There is log redirection, so it could get tricky. If it is not there, it should be in the series-specific *.agent.log in the agent data folders > logs.

EndOfLine369 commented 3 years ago

"It is not normal for the file to be deleted if the new one is no good. We might have to review that logic, to leave more time to fix the rest..."

It does not get overwritten because of this.

EX:

2020-08-29 09:01:00,742 (7f49e547d700) :  DEBUG (networking:143) - Requesting 'https://anidb.net/api/anime-titles.xml.gz'
2020-08-29 09:01:02,312 (7f49e547d700) :  DEBUG (sandbox:19) - Decompression times: 1
2020-08-29 09:01:02,312 (7f49e547d700) :  DEBUG (sandbox:19) - Downloaded URL 'https://anidb.net/api/anime-titles.xml.gz'
2020-08-29 09:01:02,353 (7f49e547d700) :  INFO (sandbox:19) - XML still corrupted after normalization
2020-08-29 09:01:02,359 (7f49e547d700) :  ERROR (sandbox:19) - ERROR: common.LoadFile() - File received but failed validity, file: "<?xml version="1.0" encoding="UTF-8"?>
<animetitles>
...
</animetitles>

<!-- Created: Sat Aug 29 03:00:04 2020 (12703 anime, 67364 titles) -->
"

That log comes from the code below, where SaveFile() is not called:

    if file_downloaded:
      file_downloaded_object = ObjectFromFile(file_downloaded)
      if not file_downloaded_object:                         Log.Error('common.LoadFile() - File received but failed validity, file: "{}"'.format(file_downloaded))
      elif url.endswith('.xml') and len(file_downloaded)<24: Log.Error('common.LoadFile() - File received too small (<24 bytes), file: "{}"'.format(file_downloaded))
      elif file_downloaded.startswith("<error"):             Log.Error('common.LoadFile() - Error response received, file: "{}"'.format(file_downloaded));  return file_downloaded_object
      else:                                                  SaveFile(filename, file_downloaded, relativeDirectory);  return file_downloaded_object

  return file_object

LoadFile() then returns file_object from LoadFileCache(). You're just out of luck if you've manually deleted your cache or are on a fresh install with no previous cache.
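
The control flow described above can be reduced to a small sketch. This is not HAMA's code: every name here is hypothetical, the parser and saver are passed in as plain callables, and the "too small" and "<error" branches are left out. It only illustrates that a download which fails validation is never saved, and that the caller gets whatever the cache held, which is None on a fresh install.

```python
def load_with_cache_fallback(downloaded, cached_object, parse, save):
    """Hypothetical, simplified stand-in for the branch quoted above."""
    if downloaded:
        parsed = parse(downloaded)
        if parsed is None:
            # Validation failed: report it and fall through without saving,
            # so an existing cache file is left untouched.
            print("File received but failed validity")
        else:
            save(downloaded)
            return parsed
    # Either nothing was downloaded or it was invalid: use the cached object.
    return cached_object


# Usage: a parser that rejects its input, as happened with the size cap.
saved = []
result = load_with_cache_fallback("<xml/>", "cached copy", lambda s: None, saved.append)
# result is the cached copy and nothing was passed to save()
```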

Fails from ObjectFromFile() -> XML.ElementFromString()

def ObjectFromFile(file=""):
  file = decompress(file)

  #TEXT file
  if isinstance(file, basestring):

    #XML
    if file.startswith('<?xml '):  #if type(file).__name__ == '_Element' or isinstance(file, basestring) and file.startswith('<?xml '):
      try:     return XML.ElementFromString(file)
      except:  
        try:   return XML.ElementFromString(file.decode('utf-8','ignore').replace('\b', '').encode("utf-8"))
        except:
          Log.Info("XML still corrupted after normalization"); return
...

Exception thrown:

2020-08-29 10:14:49,062 (7f6d434c4700) :  INFO (sandbox:19) - XML corrupted. Exception: (2103, 'Data of size 5244991 is greater than the maximum size 5242880')
2020-08-29 10:14:49,101 (7f6d434c4700) :  INFO (sandbox:19) - XML still corrupted after normalization. Exception: (2103, 'Data of size 5244991 is greater than the maximum size 5242880')

EndOfLine369 commented 3 years ago

Fixed
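
The thread does not show what the fix changed. As an illustration only, one way to avoid a size cap in a framework helper is to parse the string with a general-purpose XML parser instead. The sketch below uses the standard library's xml.etree.ElementTree, which applies no 5 MB cap of the kind reported above. The function name is hypothetical, and whether the Plex plugin sandbox permits this import is an assumption that is not verified here.

```python
import xml.etree.ElementTree as ET


def parse_xml_without_size_cap(text):
    """Parse XML with the standard library; return None if it is malformed.

    Hypothetical helper, not the change that was committed to HAMA.
    """
    if not isinstance(text, bytes):
        # Hand the parser bytes so the encoding declaration is honoured
        text = text.encode("utf-8")
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


# Usage: the returned element exposes the root tag and its children.
root = parse_xml_without_size_cap(
    '<?xml version="1.0" encoding="UTF-8"?><animetitles><anime aid="1"/></animetitles>'
)
```

Returning None on a parse error mirrors how ObjectFromFile() above gives up after its second attempt, so a caller could treat both cases the same way.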

ZeroQI commented 3 years ago

@EndOfLine369 Excellent fix

SgtZapper commented 3 years ago

Out of personal curiosity: when I shaved down anime-titles.xml and it started working again, all the Log.Info traces I added to common.py showed up in the hama log. Why didn't they show up when it was broken? I even added a Log.Info at the top of LoadFile that did not show up.

Is there a reason for that?

This was all I got:

2020-08-29 13:10:45,918 (7fb589647f40) : CRITICAL (sandbox:303) - Exception when calling function 'Start' (most recent call last):
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-a78fef9a9/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/code/sandbox.py", line 294, in call_named
    result = f(*args, **kwargs)
  File "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/__init__.py", line 82, in Start
    AniDB.GetAniDBTitlesDB()
  File "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/AniDB.py", line 379, in GetAniDBTitlesDB
    if not AniDBTitlesDB: raise Exception("Failed to load core file '{url}'".format(url=os.path.splitext(os.path.basename(ANIDB_TITLES))[0]))
Exception: Failed to load core file 'anime-titles.xml'

EndOfLine369 commented 3 years ago

No idea. Don't know what code you tried to touch and how you tried to modify it. Did you make sure to check the .1->.5 rotated files?
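
Checking the rotated files by hand is easy to get wrong, so a small script can scan all of them at once. This is a minimal sketch in plain Python 3, meant to be run outside Plex. It assumes the rotated files sit next to the main log and are named by appending .1 through .5 to its name; the function name and the search string are placeholders.

```python
import glob
import os


def find_in_rotated_logs(log_path, needle):
    """Return (path, line) pairs for lines containing needle.

    Looks at log_path itself and at log_path.1 through log_path.5.
    """
    rotated = sorted(glob.glob(glob.escape(log_path) + ".[1-5]"))
    hits = []
    for path in [log_path] + rotated:
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps the scan going if a log holds odd bytes
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Calling it with the path to com.plexapp.agents.hama.log and a string from one of the added traces lists every file and line where that trace was written; an empty list means it was never logged in any of those files.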

SgtZapper commented 3 years ago

"Did you make sure to check the .1->.5 rotated files?"

Yes I did.

I just edited common.py directly in my PMS install.

I tested it by copying a valid, albeit too large, anime-titles.xml to its correct location, restarted PMS, and tried to do a match. None of the logging that was supposed to come from common.py ever showed up.

But it did show up when I put a smaller anime-titles.xml in its place.

Later I will pull the fixed version and see what happens when I put the large version back.

Thanks for the quick fix.