ZeroQI / Hama.bundle

Plex HTTP Anidb Metadata Agent (HAMA)
GNU General Public License v3.0

anime-titles.xml parsing error #302

Closed · LaaZa closed this issue 5 years ago

LaaZa commented 5 years ago

HAMA did not work and caused an "Error parsing content" error in Plex.

I had the following error in the HAMA log:

File "/usr/lib/plexmediaserver/Resources/Plug-ins-4610c6e8d/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 1007, in _search
    agent.search(*f_args, **f_kwargs)
  File "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/__init__.py", line 151, in search
    def search (self, results,  media, lang, manual):  Search (results,  media, lang, manual, False)
  File "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/__init__.py", line 100, in Search
    if movie or max(map(int, media.seasons.keys()))<=1:  maxi, n =         AniDB.Search(results, media, lang, manual, movie)
  File "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/AniDB.py", line 45, in Search
    for element in AniDBTitlesDB.xpath(u"/animetitles/anime/title[text()[contains(lower-case(.), '%s')]]" % orig_title.lower().replace("'", " ")):
AttributeError: 'str' object has no attribute 'xpath'

It turns out AniDBTitlesDB was never parsed into an XML object. LoadFile() does not detect anime-titles.xml as an XML file because it is downloaded as anime-titles.xml.gz and the .gz extension is stripped without the content actually being decompressed, so LoadFile() cannot see the <?xml header in the still-compressed file.

Manually gunzipping the file made it work again, but obviously this needs to be fixed in the code. I'm not sure what HTTP in the code actually is, but I assume it has at some point worked like the requests library, which automatically decompresses gzipped content. For some reason it no longer does.
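
Roughly, the kind of check LoadFile() needs before looking for the <?xml header is something like this; it is only a sketch, and ensure_decompressed is a made-up helper name, not whatever ends up in the actual fix:

import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream

def ensure_decompressed(payload):
    # Strip gzip layers until the data no longer starts with the magic number;
    # a loop covers the case where more than one layer was applied.
    while payload[:2] == GZIP_MAGIC:
        # 32 + MAX_WBITS tells zlib to auto-detect the gzip header.
        payload = zlib.decompress(payload, 32 + zlib.MAX_WBITS)
    return payload

# Only after this does the "does it start with <?xml" check make sense:
# content = ensure_decompressed(raw_response)
# is_xml = content.lstrip().startswith(b'<?xml')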

EndOfLine369 commented 5 years ago

@ZeroQI, thoughts?

On a Plex server that has been running the same HAMA and Plex versions, this only now seems to be happening. It seems AniDB might have done something to their encoding?

start() 
--> AniDB.GetAniDBTitlesDB() 
--> common.LoadFile(filename='anime-titles.xml', relativeDirectory="AniDB", ...) 
--> HTTP.Request()

Plex's HTTP.Request() decompresses the stream, and HAMA then stores it at AniDB/anime-titles.xml.

Example from "PMS Plugin Logs/com.plexapp.agents.hama.log", which logs the startup output:

2019-04-30 21:22:36,487 (7f3a1c9b0700) :  DEBUG (networking:166) - Requesting 'http://anidb.net/api/anime-titles.xml.gz'
2019-04-30 21:22:37,731 (7f3a1c9b0700) :  DEBUG (sandbox:19) - Completed 'http://anidb.net/api/anime-titles.xml.gz'
2019-04-30 21:22:37,739 (7f3a1c9b0700) :  INFO (sandbox:19) - common.SaveFile() - CachePath: '.../PlexMediaServer/Library/Plex Media Server/Plug-in Support/Data/com.plexapp.agents.hama/DataItems', file: 'AniDB/anime-titles.xml'
2019-04-30 21:22:37,740 (7f3a1c9b0700) :  WARNING (data:179) - Error decoding with simplejson, using demjson instead (this will cause a performance hit) - Expecting value: line 1 column 1 (char 0)
2019-04-30 21:22:37,745 (7f3a1c9b0700) :  INFO (core:611) - Started plug-in

There is also the "Error decoding with simplejson" warning above, so Plex is failing to decode the stream? I get the same error when I try to search and that variable is hit.

Traceback (most recent call last):
  File ".../PlexMediaServer/Resources/Plug-ins-9311f93fd/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 1007, in _search
    agent.search(*f_args, **f_kwargs)
  File ".../PlexMediaServer/Library/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/__init__.py", line 151, in search
    def search (self, results,  media, lang, manual):  Search (results,  media, lang, manual, False)
  File ".../PlexMediaServer/Library/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/__init__.py", line 100, in Search
    if movie or max(map(int, media.seasons.keys()))<=1:  maxi, n =         AniDB.Search(results, media, lang, manual, movie)
  File ".../PlexMediaServer/Library/Plex Media Server/Plug-ins/Hama.bundle/Contents/Code/AniDB.py", line 45, in Search
    for element in AniDBTitlesDB.xpath(u"/animetitles/anime/title[text()[contains(lower-case(.), '%s')]]" % orig_title.lower().replace("'", " ")):
AttributeError: 'str' object has no attribute 'xpath'

This is also seen in the *.agent-search.log file from that crashed search.

--- full title ----------------------------------------------------------------------------------------------------------------------------------------------
len AniDBTitlesDB: 1067365

That is actually the size of the file on the filesystem.

LaaZa commented 5 years ago

It's just that the gzipped content is no longer decompressed by the request; parsing shouldn't be an issue, and certainly not with JSON, since it should be XML. LoadFile() does not return an XML object but a content string, which is essentially binary because it is still compressed, and that's why xpath does not work.
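
To make it concrete, here is a standalone illustration; reading the cached file directly with lxml is only an approximation of what the plugin framework does:

import zlib
from lxml import etree

# The still-gzipped cache content that was saved as "anime-titles.xml".
with open('anime-titles.xml', 'rb') as f:
    raw = f.read()

print(hasattr(raw, 'xpath'))  # False -> AttributeError: 'str' object has no attribute 'xpath'

# Decompress first, then parse; a simplified version of the AniDB.py query then works.
root = etree.fromstring(zlib.decompress(raw, 32 + zlib.MAX_WBITS))
print(len(root.xpath('/animetitles/anime/title')))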

vyressi commented 5 years ago

I'm experiencing this too. anime-titles.xml was actually a .gz file; once I unzipped it and restarted PMS it started behaving normally. Prior to that it was throwing 404 errors in the log.

vyressi commented 5 years ago

To add to this, I experienced the behavior after I upgraded to PMS 1.15.4.993. Matching worked fine if I appended the tag to the folder (e.g. [anidb-xxxx]), but if there was no tag and it was trying to match by name, it would instantly return no results; it wasn't even trying to gunzip the file.

LaaZa commented 5 years ago

It really seems to me like HTTP in Plex has changed its behavior with regard to automatic decompression.

vyressi commented 5 years ago

One more thing I noticed: I couldn't change the poster of something in my HAMA library by adding a URL until I unzipped the XML file; it would immediately say there was a problem downloading it. Using a local file was fine, but a valid URL wouldn't work.

EndOfLine369 commented 5 years ago

This looks to be a change in AniDB's compression for just this 'anime-titles.xml.gz' file.

Everything from the AniDB site is compressed, and there are no issues with the other still-compressed data pulls for series XML info that use this same function, common.LoadFile(filename=AniDBid+".xml" ...).

Note: I have not updated my Plex install version from when it was working until now, so it's not a change in HTTP.Request() functionality.

EndOfLine369 commented 5 years ago

They also only changed their file after 4/26, as I have a Plex instance with a cached version of the file from that date that is fine.

EndOfLine369 commented 5 years ago

I've put a fix into master that will handle this.

If you already have a compressed version in Plug-in Support/Data/com.plexapp.agents.hama/DataItems/AniDB/anime-titles.xml, you will have to remove the file manually for HAMA to re-pull, decompress, and save it to that cached file again.

EndOfLine369 commented 5 years ago

Actually, I think this all comes down to the file size, as I see no notes on the AniDB site about compression changes.

Maybe HTTP won't decompress if the payload is over 1 MB. If I compress the good cached instance's file, it is just under 1 MB, while the live file is just over 1 MB:
1009K --> did decompress
1042K --> stayed compressed, and we have this issue
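
A rough way to check that hypothesis; the paths are just examples, and the recompressed size will vary a little with the compression level used:

import os
import zlib

ONE_MB = 1024 * 1024

def gzipped_size(path, level=9):
    # Size the file would have inside a gzip container at the given level.
    with open(path, 'rb') as f:
        data = f.read()
    co = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)  # 16+ -> gzip framing
    return len(co.compress(data)) + len(co.flush())

good = gzipped_size('anime-titles.good.xml')  # hypothetical copy of the known-good 4/26 cache
print('good copy recompressed: %d bytes (under 1 MB: %s)' % (good, good < ONE_MB))
print('live download on disk: %d bytes' % os.path.getsize('anime-titles.xml.gz'))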

LaaZa commented 5 years ago

Thanks. I'm thinking they have this particular file double-gzipped and Request() unzips it only once. If you open it in a browser it will not show XML, even though the browser should decompress it. It would be a bit weird if it were size-limited, as it's pretty normal for sites to use compression.

EndOfLine369 commented 5 years ago

HTTP is a Plex-provided class with their internal caching, so I could easily see them adding logic on their end to skip decompressing and storing the decompressed instance in their cache based solely on the initial compressed size. Otherwise they would have to decompress, check that the size is under their limit, and then potentially throw it out. It's easier for them to just pass on the compressed response and let the agent deal with it, and they don't have to worry about their cache size getting too large.

LaaZa commented 5 years ago

Yeah, true. I forgot they had caching on that. But why doesn't a browser decompress it properly either?

EndOfLine369 commented 5 years ago

No idea about the browser behavior. It could be that, because the filename itself ends in '.gz', the browser just doesn't try.

And note, it does not seem to be double-compressed: a plain wget http://anidb.net/api/anime-titles.xml.gz followed by gunzip yields a plain XML file. So that would indeed mean HTTP is not doing any decompression on just this file, because, as mentioned, the per-series AniDB XML pulls are also compressed and are being pulled and decompressed just fine with HTTP.Request() alone. That means it's something specific to this file, which is why I believe it's the file size (1043 KB) and HTTP.Request() skipping decompression so it doesn't have to store a decompressed file of unknown size.

LaaZa commented 5 years ago

With wget using -S --header="accept-encoding: gzip" I had to gunzip twice.
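
A Python 2 equivalent of that test counts how many gzip layers come back when Accept-Encoding: gzip is sent; urllib2 does no automatic decompression, so whatever the server layered on is still intact:

import urllib2
import zlib

URL = 'http://anidb.net/api/anime-titles.xml.gz'

req = urllib2.Request(URL, headers={'Accept-Encoding': 'gzip'})
body = urllib2.urlopen(req).read()

layers = 0
while body[:2] == '\x1f\x8b':  # gzip magic number
    body = zlib.decompress(body, 32 + zlib.MAX_WBITS)
    layers += 1

print('gzip layers stripped: %d' % layers)  # 2 would match the double-gunzip observation
print(body[:38])  # should now start with <?xml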

EndOfLine369 commented 5 years ago

Interesting

letthiswork1 commented 5 years ago

Could this issue be affecting the scanner altogether?

I recently did a fresh install of a server I had been running for the best part of six years, and my HAMA scanner seems to scan but does not get much information and does not load any of the artwork.

Same happens with the AniDB scanner.

If I use TVDB it works fine.

imf4ce commented 5 years ago

@letthiswork1 I had the same issue. I manually downloaded and replaced the file in Plug-in Support/Data/com.plexapp.agents.hama/DataItems/AniDB/anime-titles.xml to fix it.

letthiswork1 commented 5 years ago

> @letthiswork1 I had the same issue. I manually downloaded and replaced the file in Plug-in Support/Data/com.plexapp.agents.hama/DataItems/AniDB/anime-titles.xml to fix it.

Did you download an older version?

ZeroQI commented 5 years ago

@EndOfLine369 thanks a lot for the fix. If a bad file is there, it should get replaced after some time. Is the issue resolved with the fix on master? After deleting the cached file, does refreshing metadata afterwards work without error?

imf4ce commented 5 years ago

@letthiswork1 No, I got the latest from http://anidb.net/api/anime-titles.xml.gz, gunzipped it, and replaced the file. The original file did not seem to be an XML file. This was the error in the logs: XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

@ZeroQI The fix seems to have worked. I deleted the file, updated to the current master, and it grabbed a new one. Refreshing metadata/matching to AniDB works as expected.
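
For anyone who wants to script that manual workaround (latest anime-titles.xml.gz, gunzipped, dropped over the cached file) until they update, something like the following should do; the DataItems path is the Linux one from this thread and will differ on other installs:

import urllib2
import zlib

URL = 'http://anidb.net/api/anime-titles.xml.gz'
# Example path; adjust for your own Plex data directory.
DEST = ('/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/'
        'Plug-in Support/Data/com.plexapp.agents.hama/DataItems/AniDB/anime-titles.xml')

data = urllib2.urlopen(URL).read()
while data[:2] == '\x1f\x8b':  # strip however many gzip layers remain
    data = zlib.decompress(data, 32 + zlib.MAX_WBITS)

assert data.lstrip().startswith('<?xml')  # sanity check before overwriting the cache
with open(DEST, 'wb') as f:
    f.write(data)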

letthiswork1 commented 5 years ago

Thanks @imf4ce, that worked.