beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.74k stars 1.82k forks source link

web: Crash when serializing objects with bytes fields #2532

Closed ocelotsloth closed 7 years ago

ocelotsloth commented 7 years ago

Problem

For some reason the web interface is not working correctly. I can't actually get it to search for anything. When you type something into the search box nothing happens except for the following text being printed to the terminal.

Running this command in verbose (-vv) mode:

$ beet -vv web

Then searching for something in the web interface gives this problem:

~
➔ beet -vv web
 * Running on http://127.0.0.1:8337/ (Press CTRL+C to quit)
127.0.0.1 - - [30/Apr/2017 11:05:15] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [30/Apr/2017 11:05:22] "GET /item/query/isabel HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 209, in run_wsgi
    execute(self.server.app)
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 199, in execute
    for data in application_iter:
  File "/usr/lib/python3.6/site-packages/werkzeug/wsgi.py", line 704, in __next__
    return self._next()
  File "/usr/lib/python3.6/site-packages/werkzeug/wrappers.py", line 81, in _iter_encoded
    for item in iterable:
  File "/usr/lib/python3.6/site-packages/beetsplug/web/__init__.py", line 74, in json_generator
    yield json.dumps(_rep(item, expand=expand))
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable

Setup

My configuration (output of beet config) is:

directory: ~/Music
library: ~/Music/musicLibrary.db

import:
    move: yes

plugins: chroma web
acoustid:
    apikey: REDACTED
web:
    host: 127.0.0.1
    port: 8337
    cors: ''
chroma:
    auto: yes
sampsyo commented 7 years ago

Hmm; thanks for reporting! I think we may have fixed some similar issues in the web plugin recently. Can I ask you the favor of trying out the latest source? (There are instructions for running the latest source in our FAQ.)

Thanks again!

ocelotsloth commented 7 years ago

Sure! Let me pull that down

ocelotsloth commented 7 years ago

So I tried it with the beets-git AUR package and got the same:

~/projects
➔ beet version
beets version 1.4.4
Python version 3.6.0
plugins: chroma, web

~/projects
➔ beet web    
 * Running on http://127.0.0.1:8337/ (Press CTRL+C to quit)
127.0.0.1 - - [01/May/2017 18:29:47] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/beets.css HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/jquery.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/underscore.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/beets.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/backbone.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:48] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [01/May/2017 18:29:52] "GET /item/query/isabel HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 209, in run_wsgi
    execute(self.server.app)
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 199, in execute
    for data in application_iter:
  File "/usr/lib/python3.6/site-packages/werkzeug/wsgi.py", line 704, in __next__
    return self._next()
  File "/usr/lib/python3.6/site-packages/werkzeug/wrappers.py", line 81, in _iter_encoded
    for item in iterable:
  File "/usr/lib/python3.6/site-packages/beetsplug/web/__init__.py", line 77, in json_generator
    yield json.dumps(_rep(item, expand=expand))
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable

I'm going to check and make sure that just pulling from master doesn't give different results.

ocelotsloth commented 7 years ago

Yep, more of the same:

~/projects/beets on master
➔ ./beet web
 * Running on http://127.0.0.1:8337/ (Press CTRL+C to quit)
127.0.0.1 - - [01/May/2017 18:33:32] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:33:35] "GET /item/query/isabel HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):

It's the same so I'll not repeat the rest.

ocelotsloth commented 7 years ago

Alright, so I've taken an attempt to try and learn a bit more, but I'm running up against the limits of my understanding of the internals of the library.

Adding a print(item.__repr__()) statement to the item that is trying to be Serialized yields something like this:

Item(id=677, path=b"/home/mstengle/Music/Compilations/Monstercat 016_ Expedition/06 What's Going On.mp3", album_id=49, title="What's Going On", artist='Droptek feat. Isabel Higuero', artist_sort='Droptek feat. Higuero, Isabel', artist_credit='Droptek feat. Isabel Higuero', album='Monstercat 016: Expedition', albumartist='Various Artists', albumartist_sort='Various Artists', albumartist_credit='Various Artists', genre='', composer='', grouping='', year=2014, month=2, day=26, track=6, tracktotal=32, disc=1, disctotal=1, lyrics='', comments='Visit http://music.monstercat.com', bpm=0, comp=1, mb_trackid='ce2d7b40-7997-4a5c-9b98-bf3129c64fb4', mb_albumid='9d4384da-6624-4feb-949a-27d55fa95dd7', mb_artistid='cfd48b9d-acc9-46cb-b275-6f6ac7cb9028', mb_albumartistid='89ad4ac3-39f7-470e-963a-56509c546377', albumtype='compilation', label='Monstercat', acoustid_fingerprint='LONG STUFF', acoustid_id='00b14796-049f-4b42-abe6-1beab37cadd1', mb_releasegroupid='1433d6c9-8263-4c91-876e-7b1ee7a85d43', asin='B00ILEX1YE', catalognum='MC016', script='Latn', language='eng', country='XW', albumstatus='Official', media='Digital Media', albumdisambig='', disctitle='', encoder='', rg_track_gain=None, rg_track_peak=None, rg_album_gain=None, rg_album_peak=None, original_year=2014, original_month=2, original_day=26, initial_key=None, length=205.9557142857143, bitrate=128044, format='MP3', samplerate=44100, bitdepth=0, channels=2, mtime=1493583618.0, added=1493583615.9340143, data_source='MusicBrainz')

I think this is the important item:

path=b"/home/mstengle/Music/Compilations/Monstercat 016_ Expedition/06 What's Going On.mp3"

I went through to the library.py file for beets to see what that data type is, but it just takes me to the PathType and PathQuery classes.

I would try to find a way to add a .decode("utf_8") (not sure if it is correct to assume this encoding), but I can't figure out where that would go.

Is this the right direction to go in to fix it? I don't mind digging a bit further to fix it myself but I'm not sure where the sql call get's turned into an actual object that I can manipulate before it gets out to the JSON serializer.

sampsyo commented 7 years ago

Thanks for continuing to investigate! Yes, I agree that paths are probably related to the problem, but we actually remove the path field by default (and if the paths are enabled, we decode them to stings): https://github.com/beetbox/beets/blob/84febb13c1c8e6c7dcc305b2498d9e5f55b370b7/beetsplug/web/__init__.py#L41

It's strange that I can't reproduce this error… maybe it would be useful to try doing a print(out) just before the return from _rep? That should tell us which bytes values are still remaining.

ocelotsloth commented 7 years ago

Ah, I see now. Thanks for pointing that out to me. It does look like it is taking the path out:

#### TEST: Post `_rep` `__repr__()`
{'id': 384, 'album_id': 34, 'title': 'Bridge Burning', 'artist': 'Foo Fighters', 'artist_sort': 'Foo Fighters', 'artist_credit': 'Foo Fighters', 'album': 'Wasting Light', 'albumartist': 'Foo Fighters', 'albumartist_sort': 'Foo Fighters', 'albumartist_credit': 'Foo Fighters', 'genre': 'Rock', 'lyricist': '', 'composer': 'Foo Fighters', 'arranger': '', 'grouping': '', 'year': 2011, 'month': 4, 'day': 8, 'track': 1, 'tracktotal': 11, 'disc': 1, 'disctotal': 1, 'lyrics': '', 'comments': '', 'bpm': 0, 'comp': 0, 'mb_trackid': 'd319e25c-d2ae-49dd-9103-c0819cd8877d', 'mb_albumid': 'ba3c72d8-b62e-4dff-ab19-b1bbaabf6924', 'mb_artistid': '67f66c07-6e61-4026-ade5-7e782fad3a5d', 'mb_albumartistid': '67f66c07-6e61-4026-ade5-7e782fad3a5d', 'albumtype': 'album', 'label': 'RCA', 'acoustid_fingerprint': b'SUPER LONG', 'acoustid_id': '03f11f8d-5881-4bf3-9ad0-dfcc89538b98', 'mb_releasegroupid': '91825bcf-e51d-4357-986c-ac2a49b4d205', 'asin': 'B004S4AU9A', 'catalognum': '88697872772', 'script': 'Latn', 'language': 'eng', 'country': 'GB', 'albumstatus': 'Official', 'media': 'CD', 'albumdisambig': 'UK & Europe jewel case release', 'disctitle': '', 'encoder': '', 'rg_track_gain': -10.45, 'rg_track_peak': 0.999969, 'rg_album_gain': 0.999969, 'rg_album_peak': None, 'original_year': 2011, 'original_month': 4, 'original_day': 8, 'initial_key': None, 'length': 286.5342403628118, 'bitrate': 256000, 'format': 'AAC', 'samplerate': 44100, 'bitdepth': 16, 'channels': 2, 'mtime': 1493583207.0, 'added': 1493583207.532419, 'data_source': 'MusicBrainz', 'size': 10384504}
############

And hey! I figured it out! It's the acoustid_fingerprint variable. If you don't have this setup and enabled and have generated fingerprints that would explain why you can't reproduce it.

If you want to deal with this in place with the _rep function, I can write a patch for it.

For now I tested just adding a del command for that and confirmed that it works on my end.

It might be worthwhile for that patch to include some kind of framework additional plugins can hook into that declares dangerous values like that. That would make it so that you don't have to hardcode it in place here for all the addins which may or may not be present on a person's particular configuration. Just a thought, and I'm open to implementing and documenting it as a project.

ocelotsloth commented 7 years ago

With a bit more tinkering I found that even with the chroma extension disabled it still picks those values up, so it would need to be a more global configuration schema than just having an extension checking in with the global conf.

I still think it would be a bit heavy handed an approach to manually add any bytes or other non realizable data tags individually by hand. Might be better to scan through and delete any bytes tags by default but allow an extension to override the behaviour if the end user has that extension loaded.

sampsyo commented 7 years ago

Got it; thanks for nailing it down!

I think the right solution is indeed to fix _rep. An easy way to prevent this from coming up again would be a simple type test: check whether isinstance(value, bytes) to filter out values like this. We have basically two choices:

ocelotsloth commented 7 years ago

That PR should fix the problem by decoding the byte values to strings. I'm not sure if it's correctly using base64 as you suggested, but I'm also not sure how to do that if it's not already in that form.