beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.72k stars 1.81k forks source link

Item objects are missing flexible attributes on reimports via file path #4537

Open JOJ0 opened 1 year ago

JOJ0 commented 1 year ago

While working on PR https://github.com/beetbox/beets/pull/4474#-, I realised that an item on a reimport does not look the same when

Very rudimentary investigation revealed that the initialization of an Item object on a reimport via an already imported media file using its file path takes place here: https://github.com/beetbox/beets/blob/master/beets/library.py#L601-L608 No loading of flexible attributes from the database is done at this point and it's obvious that the Item will be missing some information that it actually already has saved in the beets library.

Should we consider this as a bug?

With above mentioned PR's new feature it is a problem, it is refered to in this post as "problem number 1": https://github.com/beetbox/beets/pull/4474#discussion_r1006443079 and somewhere earlier when it first came up as the second of the two bulletpoints at the bottom of this post: https://github.com/beetbox/beets/pull/4474#discussion_r1002661226

On a side note and as described in the subsequent post in the mentioned PR's conversation, an item that has such an attribute appplied to it's file tag as well as to a flexible or fixed attribute, it will be initialized containing this field during a reimport, which kind of works around the problem. https://github.com/beetbox/beets/pull/4474#discussion_r1002915811

sampsyo commented 1 year ago

Hmm; that's worrying! We actually added specific support for preserving flexattrs across reimports here: https://github.com/beetbox/beets/blob/bcfc86df04f247e5a812e4940adfd7aeedb6b3b4/beets/importer.py#L810

This came from PR https://github.com/beetbox/beets/pull/984. I'm unclear on why this isn't working correctly for you… any chance you could share a verbose log?

JOJ0 commented 1 year ago

Hi, first of all, thanks for the quick reply! Yep I came across reimport_metadata() already and gitlens was pointing me to that PR.

I added a debug print in autotag.apply_metadata to really make it clear what I meant by "differnce between library- and path-reimports" - it might be that I'm using something wrong - but I was supposing I could access the same information on each variant of an reimport.

On the initial import of the file I added a flexattr named testfield1 with the value of foo.

Accessing testfield1 in autotag.apply_metadata works fine on a "library-reimport":

(beets)    ~/git/beets   master ●  beet -v import -L robert hood alarm
user configuration: /home/jojo/.config/beets/config.yaml
data directory: /home/jojo/.config/beets
plugin paths: 
Sending event: pluginload
library database: /home/jojo/.config/beets/library.db
library directory: /home/jojo/Sync/beets_test_music
Sending event: library_opened
Sending event: import_begin
yielding album 37: Robert Hood - Eleven / Alarm
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag
Tagging Robert Hood - Eleven / Alarm
Searching for discovered album ID: 79ef4094-55db-4fbb-889a-7807ab692494
Requesting MusicBrainz release 79ef4094-55db-4fbb-889a-7807ab692494
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: Robert Hood - Eleven / Alarm (79ef4094-55db-4fbb-889a-7807ab692494)
Computing track assignment...
...done.
Success. Distance: 0.00
Album ID match recommendation is Recommendation.strong
ID match.

/home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag (2 items)
Sending event: import_task_before_choice
Sending event: before_choose_candidate
Tagging:
    Robert Hood - Eleven / Alarm
URL:
    https://musicbrainz.org/release/79ef4094-55db-4fbb-889a-7807ab692494
(Similarity: 100.0%) (Digital Media, 2013, US, M-Plant, M.PM18)
 * Eleven
 * Alarm
Sending event: import_task_choice

We are in autotag.apply_metadata,
and this is item.testfield1:
foo

We are in autotag.apply_metadata,
and this is item.testfield1:
foo

Sending event: import_task_apply
Replacing item 89: /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/01 - Robert Hood - Eleven - Gm,Abm.m4a
Sending event: database_change
Sending event: item_removed
Replacing item 90: /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/02 - Robert Hood - Alarm - Am.m4a
Sending event: database_change
Sending event: database_change
Sending event: album_removed
Sending event: item_removed
2 of 2 items replaced
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Reimported album: added 1666850974.6932638, flexible attributes ['data_source', 'testfield1', 'data_source', 'testfield1'] from album 37 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag
Reimported item added 1666850974.6951199 from item 89 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/01 - Robert Hood - Eleven - Gm,Abm.m4a
Reimported item flexible attributes ['track_alt', 'testfield1', 'data_source', 'spotify_album_id', 'track_alt', 'testfield1', 'data_source', 'spotify_album_id'] from item 89 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/01 - Robert Hood - Eleven - Gm,Abm.m4a
Sending event: database_change
Reimported item added 1666850974.697611 from item 90 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/02 - Robert Hood - Alarm - Am.m4a
Reimported item flexible attributes ['track_alt', 'testfield1', 'data_source', 'spotify_album_id', 'track_alt', 'testfield1', 'data_source', 'spotify_album_id'] from item 90 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/02 - Robert Hood - Alarm - Am.m4a
Sending event: database_change
Sending event: write
Sending event: after_write
Sending event: write
Sending event: after_write
Sending event: database_change
Sending event: database_change
Sending event: import_task_files
Sending event: album_imported
Sending event: import
Sending event: cli_exit
(beets)    ~/git/beets   master ●  

When reimporting via the file path in autotag.apply_metadata the item does not have testfield1 available:

(beets)    ~/git/beets   master ●  beet -v import ~/Sync/beets_test_music/Robert\ Hood\ -\ Eleven\ \[M.PM18\]\ \(2013\)\ ALAC\ mp3tag 
user configuration: /home/jojo/.config/beets/config.yaml
data directory: /home/jojo/.config/beets
plugin paths: 
Sending event: pluginload
library database: /home/jojo/.config/beets/library.db
library directory: /home/jojo/Sync/beets_test_music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag
Tagging Robert Hood - Eleven / Alarm
Searching for discovered album ID: 79ef4094-55db-4fbb-889a-7807ab692494
Requesting MusicBrainz release 79ef4094-55db-4fbb-889a-7807ab692494
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: Robert Hood - Eleven / Alarm (79ef4094-55db-4fbb-889a-7807ab692494)
Computing track assignment...
...done.
Success. Distance: 0.00
Album ID match recommendation is Recommendation.strong
ID match.

/home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag (2 items)
Sending event: import_task_before_choice
Sending event: before_choose_candidate
Tagging:
    Robert Hood - Eleven / Alarm
URL:
    https://musicbrainz.org/release/79ef4094-55db-4fbb-889a-7807ab692494
(Similarity: 100.0%) (Digital Media, 2013, US, M-Plant, M.PM18)
 * Eleven
 * Alarm
Sending event: import_task_choice

We are in autotag.apply_metadata,
and this is item.testfield1:
Traceback (most recent call last):
  File "/home/jojo/git/beets/beets/dbcore/db.py", line 483, in __getattr__
    return self[key]
  File "/home/jojo/git/beets/beets/library.py", line 633, in __getitem__
    return super().__getitem__(key)
  File "/home/jojo/git/beets/beets/dbcore/db.py", line 390, in __getitem__
    return self._get(key, raise_=True)
  File "/home/jojo/git/beets/beets/dbcore/db.py", line 380, in _get
    raise KeyError(key)
KeyError: 'testfield1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jojo/.pyenv/versions/beets/bin/beet", line 33, in <module>
    sys.exit(load_entry_point('beets', 'console_scripts', 'beet')())
  File "/home/jojo/git/beets/beets/ui/__init__.py", line 1304, in main
    _raw_main(args)
  File "/home/jojo/git/beets/beets/ui/__init__.py", line 1291, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/home/jojo/git/beets/beets/ui/commands.py", line 1034, in import_func
    import_files(lib, paths, query)
  File "/home/jojo/git/beets/beets/ui/commands.py", line 974, in import_files
    session.run()
  File "/home/jojo/git/beets/beets/importer.py", line 340, in run
    pl.run_parallel(QUEUE_SIZE)
  File "/home/jojo/git/beets/beets/util/pipeline.py", line 446, in run_parallel
    raise exc_info[1].with_traceback(exc_info[2])
  File "/home/jojo/git/beets/beets/util/pipeline.py", line 311, in run
    out = self.coro.send(msg)
  File "/home/jojo/git/beets/beets/util/pipeline.py", line 170, in coro
    task = func(*(args + (task,)))
  File "/home/jojo/git/beets/beets/importer.py", line 1463, in user_query
    apply_choice(session, task)
  File "/home/jojo/git/beets/beets/importer.py", line 1531, in apply_choice
    task.apply_metadata()
  File "/home/jojo/git/beets/beets/importer.py", line 557, in apply_metadata
    autotag.apply_metadata(self.match.info, self.match.mapping)
  File "/home/jojo/git/beets/beets/autotag/__init__.py", line 188, in apply_metadata
    print(item.testfield1)
  File "/home/jojo/git/beets/beets/dbcore/db.py", line 485, in __getattr__
    raise AttributeError(f'no such field {key!r}')
AttributeError: no such field 'testfield1'
(beets)      ~/git/beets   master ●  

To remind where I'm coming from, this is what I'm trying to do (album_info was enriched with additional information fetched from MusicBrainz in mb.py, which I would like to finally apply here): https://github.com/beetbox/beets/pull/4474/files#diff-d7fd21d10763849afe82e1fd737c0c2caa65e34614457d873227bcfe4f77238a The if not .... conditions work perfectly fine on "library-reimports" but they don't on "path-reimports" - the fields are just not existing here: https://github.com/beetbox/beets/pull/4474/files#diff-d7fd21d10763849afe82e1fd737c0c2caa65e34614457d873227bcfe4f77238aR199

sampsyo commented 1 year ago

Hmm; I'm not seeing the relevant log message that indicates a reimport: https://github.com/beetbox/beets/blob/bcfc86df04f247e5a812e4940adfd7aeedb6b3b4/beets/importer.py#L822-L823

It's possible we need -vv here to see that log message (rather than just -v).

It sure is interesting that this works when using a library query (beet import -L) but not when importing from paths! It would be good to look into where that replacement is being detected, which is here: https://github.com/beetbox/beets/blob/bcfc86df04f247e5a812e4940adfd7aeedb6b3b4/beets/importer.py#L789

That's supposed to work regardless of how we selected the files to import…

JOJ0 commented 1 year ago

You might have overlooked, log lines are there:

First output:

Reimported album: added 1666850974.6932638, flexible attributes ['data_source', 'testfield1', 'data_source', 'testfield1'] from album 37 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag
Reimported item added 1666850974.6951199 from item 89 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/01 - Robert Hood - Eleven - Gm,Abm.m4a
Reimported item flexible attributes ['track_alt', 'testfield1', 'data_source', 'spotify_album_id', 'track_alt', 'testfield1', 'data_source', 'spotify_album_id'] from item 89 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/01 - Robert Hood - Eleven - Gm,Abm.m4a
Sending event: database_change
Reimported item added 1666850974.697611 from item 90 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/02 - Robert Hood - Alarm - Am.m4a
Reimported item flexible attributes ['track_alt', 'testfield1', 'data_source', 'spotify_album_id', 'track_alt', 'testfield1', 'data_source', 'spotify_album_id'] from item 90 for /home/jojo/Sync/beets_test_music/Robert Hood - Eleven [M.PM18] (2013) ALAC mp3tag/02 - Robert Hood - Alarm - Am.m4a

In the sescond output beets crashes already because of the missing testfield1 before it comes to that point (debug print is in autotag.apply_metadata)

sampsyo commented 1 year ago

Ah, thanks for pointing that out! Not sure why ⌘F failed me there.

So the root cause here, it seems, is that restoring the the flexible attributes (reimport_metadata) happens when we add the object to the database—not before. On the other hand, you're hooking into apply_metadata, which happens right before we do that. The relevant code is here: https://github.com/beetbox/beets/blob/e201dd4fe57b0aa2e80890dc3939b0a803e3448d/beets/importer.py#L1529-L1534

So the problem here is that your changes to apply_metadata expect the target objects to already have the relevant information on them, and it's not there yet—whereas everything else in that function just goes right ahead and applies all the relevant metadata.

Maybe this is a silly question, but is there a strong need for the "don't override existing IDs" default? We don't do that for any other fields, so it seems a little odd to make an exception just for these specific fields.

JOJ0 commented 1 year ago

Thanks again for helping me read code and understand how things work! Very much appreciated! I have to admit that sometimes it's a lot to analyze how things work and at the same time trying to read old PR's discussions to find out why things where implemented that way. So, again: Thanks for taking the time to help out!

To answer your last question - not a silly one at all! I was asking myself the same thing when I found out that it is rather difficult to achieve a comparision and other attributes don't have that. The idea of not overwriting existing metadata source ID's actually came up over there in that PR's discussions and was suggested by @arsaboo. I do agree with him that it is not the best idea to let's say overwrite the Discogs release ID with what MusicBrainz provides, when reimporting an album that was previously tagged with the Discogs plugin. On the other hand I would rather prefer finishing a PR than trying to reinvent the (beets)wheel (;-). For me personally it would be perfectly fine if a reimport in said situation would overwrite the Discogs ID, because from working a lot with MusicBrainz during the past year, I trust that their information is accurate! I don't know about the other external ID's they sometimes have, like Spotify, Deezer and so on, but from Discogs I can say: No worries, it will be fine! If it's not, I will send them a correction anyway.

Anyway, moving on, some future thoughts: Looking into the "wrapper" of apply_metdata keeps me thinking that if one would like to implement a feature that is double checking before overwriting, it could be tried to be put at the end of that method when things are in the database already, similar to what set_fields does. Not sure if there would be other problems, didn't think it through really.... https://github.com/beetbox/beets/blob/e201dd4fe57b0aa2e80890dc3939b0a803e3448d/beets/importer.py#L1522-L1542

Also I was thinking that with the current state of how things work, it would be rather challenging to implement this feature ("Making it entirely configurable which fields to import, overwrite, etc."): https://github.com/beetbox/beets/issues/4087#issuecomment-932901710

But as you pointed out, it might be around the set_fields code and it could replace that part in the end.

JOJ0 commented 1 year ago

Also, and please correct me if I'm wrong: Isn't that very much a related topic? https://github.com/beetbox/beets/discussions/4288#discussioncomment-2299757

In any case, @wisp3rwind pointed out very nicely and human readable what exactly is applied to which object's attributes while impoting. Nice! Thanks!

sampsyo commented 1 year ago

Awesome. What do you think about this tactic: let's simplify the first version of this PR to not do the overwrite-avoiding thing, and then address that shortcoming in a subsequent PR? Perhaps with a more general approach that allows this sort of thing in a more general way, as you suggested.

stale[bot] commented 1 year ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

JOJ0 commented 1 year ago

Let's keep it open.