beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

I can't get "import --from-scratch" to import "from scratch" #3706

Status: Closed (closed by jaimet 3 years ago)

jaimet commented 4 years ago

Problem

An ID3 frame that is already present in an mp3 file remains in the file after the file has been imported with --from-scratch.

Here's the procedure:

Start with any mp3 file:

$ wget -O test.mp3 "https://ccrma.stanford.edu/~jos/mp3/pno-cs.mp3"

Delete any (all) ID3 tags from the file:

$ mid3v2 -D test.mp3

Check that the file contains no ID3 tags:

$ mid3v2 test.mp3
IDv2 tag info for test.mp3
No ID3 header found; skipping.

Add a "COMM" (comment) ID3 frame with garbage content (this also adds the ID3v2.4.0 tag needed to contain the frame):

$ mid3v2 -c wibble test.mp3

Import the mp3 as a track:

$ beet -vv -l libraryTest.db import --from-scratch -C -t test.mp3
user configuration: /home/user/.config/beets/config.yaml
data directory: /home/user/.config/beets
plugin paths:
Sending event: pluginload
library database: /home/user/test/libraryTest.db
library directory: /home/user/Music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/user/test/test.mp3
Tagging  -
No album ID found.
Search terms:  -
Album might be VA: True
Evaluating 0 candidates.

/home/user/test/test.mp3 (1 items)
Sending event: before_choose_candidate
No matching release found for 1 tracks.
For help, see: http://beets.readthedocs.org/en/latest/faq.html#nomatch
[S]kip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort? T
Sending event: import_task_choice
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/user/test/test.mp3
Item search terms:  -
Found 0 candidates.

/home/user/test/test.mp3
Sending event: before_choose_candidate
No matching recordings found.
[S]kip, Use as-is, Enter search, enter Id, aBort? I
Enter recording ID: 07227795-b0b8-4c73-b010-c96f73990dc4
Searching for track ID: 07227795-b0b8-4c73-b010-c96f73990dc4
Sending event: trackinfo_received
Sending event: before_choose_candidate
Correcting track tags from:
     -
To:
    Gene Autry - Rudolph, the Red-Nosed Reindeer
URL:
    https://musicbrainz.org/recording/07227795-b0b8-4c73-b010-c96f73990dc4
(Similarity: 0.0%) (title, length)
Apply, More candidates, Skip, Use as-is, Enter search, enter Id, aBort? A
Sending event: import_task_choice
Sending event: import_task_apply
0 of 1 items replaced
Sending event: database_change
Sending event: database_change
Sending event: write
Sending event: after_write
Sending event: database_change
Sending event: import_task_files
Sending event: item_imported
Sending event: import
Sending event: cli_exit

But the garbage tag is still there:

$ mid3v2 test.mp3 | grep wibble
COMM==eng=wibble
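
The manual check above can be wrapped in a small helper for repeated runs. This is only a sketch: `frame_present` is a hypothetical name, and it assumes that `mid3v2` prints one `FRAME=value` line per frame, as in the output shown above.

```shell
# Exit status 0 if FILE ($1) still carries an ID3 frame named FRAME ($2).
# Assumption: mid3v2 lists each frame on its own line as "FRAME=value".
frame_present() {
    mid3v2 "$1" | grep -q "^$2="
}

# Usage (not run here):
#   if frame_present test.mp3 COMM; then
#       echo "COMM survived the import"
#   fi
```

Checking the exit status rather than reading the grep output makes it easy to repeat the same test after each import variant.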

Setup

$ beet version
beets version 1.4.7
Python version 3.7.3
no plugins loaded
$ beet config
{}

sampsyo commented 4 years ago

Hello! Maybe a good way to investigate this would be by looking at the song's metadata with beet info to see what happens with and without the option.

But because it seems like you're interested in how the actual on-disk tags get affected, I recommend you give the scrub plugin a try. It deletes all the tags from a file before writing any new ones.

jaimet commented 4 years ago

> Hello! Maybe a good way to investigate this would be by looking at the song's metadata with beet info to see what happens with and without the option.

$ beet config
plugins: info

Firstly, without --from-scratch:

$ mid3v2 -D test.mp3
$ mid3v2 -c wibble test.mp3
$ rm ./libraryTest.db
$ beet -vv -l libraryTest.db import -C -t test.mp3
user configuration: /home/user/.config/beets/config.yaml
data directory: /home/user/.config/beets
plugin paths:
Sending event: pluginload
library database: /home/user/test/libraryTest.db
library directory: /home/user/Music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/user/test/test.mp3
Tagging  -
No album ID found.
Search terms:  -
Album might be VA: True
Evaluating 0 candidates.

/home/user/test/test.mp3 (1 items)
Sending event: before_choose_candidate
No matching release found for 1 tracks.
For help, see: http://beets.readthedocs.org/en/latest/faq.html#nomatch
[S]kip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort? T
Sending event: import_task_choice
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/user/test/test.mp3
Item search terms:  -
Found 0 candidates.

/home/user/test/test.mp3
Sending event: before_choose_candidate
No matching recordings found.
[S]kip, Use as-is, Enter search, enter Id, aBort? I
Enter recording ID: 07227795-b0b8-4c73-b010-c96f73990dc4
Searching for track ID: 07227795-b0b8-4c73-b010-c96f73990dc4
Sending event: trackinfo_received
Sending event: before_choose_candidate
Correcting track tags from:
     -
To:
    Gene Autry - Rudolph, the Red-Nosed Reindeer
URL:
    https://musicbrainz.org/recording/07227795-b0b8-4c73-b010-c96f73990dc4
(Similarity: 0.0%) (title, length)
Apply, More candidates, Skip, Use as-is, Enter search, enter Id, aBort? A
Sending event: import_task_choice
Sending event: import_task_apply
0 of 1 items replaced
Sending event: database_change
Sending event: database_change
Sending event: write
Sending event: after_write
Sending event: database_change
Sending event: import_task_files
Sending event: item_imported
Sending event: import
Sending event: cli_exit

$ beet info ./test.mp3
/home/user/test/test.mp3
       arranger:
            art: False
         artist: Gene Autry
  artist_credit: Gene Autry
    artist_sort: Autry, Gene
       bitdepth: 0
        bitrate: 128000
            bpm: 0
       channels: 2
       comments: wibble
           comp: False
           disc: 0
      disctotal: 0
         format: MP3
         genres:
         length: 20.062625
         lyrics:
    mb_artistid: 675b7627-6b5d-4a46-a728-785cb24a299e
     mb_trackid: 07227795-b0b8-4c73-b010-c96f73990dc4
  original_year: 0
r128_album_gain: 0
r128_track_gain: 0
     samplerate: 48000
          title: Rudolph, the Red-Nosed Reindeer
          track: 0
     tracktotal: 0
           year: 0

$ beet info ./test.mp3 | md5sum
04d332add5bba85c0f932850573a3260  -

Second time round, with --from-scratch:

$ mid3v2 -D test.mp3
$ mid3v2 -c wibble test.mp3
$ rm ./libraryTest.db
$ beet -vv -l libraryTest.db import --from-scratch -C -t test.mp3
user configuration: /home/user/.config/beets/config.yaml
data directory: /home/user/.config/beets
plugin paths:
Sending event: pluginload
library database: /home/user/test/libraryTest.db
library directory: /home/user/Music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/user/test/test.mp3
Tagging  -
No album ID found.
Search terms:  -
Album might be VA: True
Evaluating 0 candidates.

/home/user/test/test.mp3 (1 items)
Sending event: before_choose_candidate
No matching release found for 1 tracks.
For help, see: http://beets.readthedocs.org/en/latest/faq.html#nomatch
[S]kip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort? T
Sending event: import_task_choice
Sending event: import_task_created
Sending event: import_task_start
Looking up: /home/user/test/test.mp3
Item search terms:  -
Found 0 candidates.

/home/user/test/test.mp3
Sending event: before_choose_candidate
No matching recordings found.
[S]kip, Use as-is, Enter search, enter Id, aBort? I
Enter recording ID: 07227795-b0b8-4c73-b010-c96f73990dc4
Searching for track ID: 07227795-b0b8-4c73-b010-c96f73990dc4
Sending event: trackinfo_received
Sending event: before_choose_candidate
Correcting track tags from:
     -
To:
    Gene Autry - Rudolph, the Red-Nosed Reindeer
URL:
    https://musicbrainz.org/recording/07227795-b0b8-4c73-b010-c96f73990dc4
(Similarity: 0.0%) (title, length)
Apply, More candidates, Skip, Use as-is, Enter search, enter Id, aBort? A
Sending event: import_task_choice
Sending event: import_task_apply
0 of 1 items replaced
Sending event: database_change
Sending event: database_change
Sending event: write
Sending event: after_write
Sending event: database_change
Sending event: import_task_files
Sending event: item_imported
Sending event: import
Sending event: cli_exit

$ beet info ./test.mp3
/home/user/test/test.mp3
       arranger:
            art: False
         artist: Gene Autry
  artist_credit: Gene Autry
    artist_sort: Autry, Gene
       bitdepth: 0
        bitrate: 128000
            bpm: 0
       channels: 2
       comments: wibble
           comp: False
           disc: 0
      disctotal: 0
         format: MP3
         genres:
         length: 20.062625
         lyrics:
    mb_artistid: 675b7627-6b5d-4a46-a728-785cb24a299e
     mb_trackid: 07227795-b0b8-4c73-b010-c96f73990dc4
  original_year: 0
r128_album_gain: 0
r128_track_gain: 0
     samplerate: 48000
          title: Rudolph, the Red-Nosed Reindeer
          track: 0
     tracktotal: 0
           year: 0

$ beet info ./test.mp3 | md5sum
04d332add5bba85c0f932850573a3260  -

(That's the same md5sum as without --from-scratch)
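
A checksum only says whether two runs differ, not where. A minimal sketch for locating a difference, assuming each run's `beet info` output was redirected to a file (`compare_info` and the file names are hypothetical):

```shell
# Compare two saved `beet info` dumps and show which fields differ.
# $1 and $2 are paths to the saved outputs; the names are hypothetical.
compare_info() {
    if cmp -s "$1" "$2"; then
        echo "no field differs"
    else
        # diff marks lines only in $1 with "<" and lines only in $2 with ">"
        diff "$1" "$2"
    fi
}

# Usage (not run here):
#   beet info ./test.mp3 > without.txt   # after the plain import
#   beet info ./test.mp3 > with.txt      # after the --from-scratch import
#   compare_info without.txt with.txt
```

For the two runs shown here, the identical md5sums mean this would report that no field differs.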

> But because it seems like you're interested in how the actual on-disk tags get affected, I recommend you give the scrub plugin a try. It deletes all the tags from a file before writing any new ones.

According to the documentation, that's what the --from-scratch option (or the from_scratch configuration option) is for:

  1. (From https://beets.readthedocs.io/en/v1.4.7/reference/cli.html#import)

When beets applies metadata to your music, it will retain the value of any existing tags that weren’t overwritten, and import them into the database. You may prefer to only use existing metadata for finding matches, and to erase it completely when new metadata is applied. You can enforce this behavior with the --from-scratch option, or the from_scratch configuration option.

  2. (From https://beets.readthedocs.io/en/v1.4.7/reference/config.html#from-scratch)

Either yes or no (default), controlling whether existing metadata is discarded when a match is applied. This corresponds to the --from_scratch flag to beet import.

  3. (From https://beets.readthedocs.io/en/v1.4.6/changelog.html#december-21-2017)

A new from_scratch configuration option makes the importer remove old metadata before applying new metadata. This new feature complements the zero and scrub plugins but is slightly different: beets clears out all the old tags it knows about and only keeps the new data it gets from the remote metadata source.

  4. (From beets issue 934)

--from-scratch: every field in the Item should be zeroed before applying the matched metadata (this zeroing should only happen if the user actually chooses apply.)

  5. (From beets issue 1173)

"from-scratch" "is about tagging the file from a completely blank slate, ie., removing all data from the file before writing new data".

Re the scrub plugin, I am preparing a bug report for that too, but that's a different bug report - I want to keep this bug report focused on the --from-scratch option (and the from_scratch configuration option) only.

Is this observed behaviour of the --from-scratch option "by design"?

jaimet commented 4 years ago

This issue currently displays a "needinfo" label. Does this issue need more details or a follow-up from me, or is this (the fact that this issue is showing a "needinfo" label) a bug?

sampsyo commented 4 years ago

Sorry for the silence. I admit I'm a little overwhelmed by all the data here—is it possible to distill what you're observing with the --from-scratch mode that contradicts your expectations? I think the important thing is to start by making sure you're observing differences in the beets library database first (as observed by beet ls or beet info -L or similar), as opposed to the on-disk tags (as observed by mid3v2). Then, as a separate question, we can investigate the association between the database data and the tags.

stale[bot] commented 3 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jaimet commented 3 years ago

(Thank you, stale bot, for the prod! :thumbsup:)

Adrian, apologies for the delay in getting back to you.

I only use beets to tag my audio files - I don't actually use the beets library database at all.

I would like beets to remove all pre-existing tags from my audio files when it tags them, so that the only ID3 tags left in the files are the ones that beets adds during processing. As I read the documentation, the --from-scratch option should do exactly this.

However, according to my testing, the --from-scratch option does not remove all pre-existing tags from my audio files. This is what I show above: there's an unwanted tag in my audio file before processing, and using the --from-scratch option does not remove it.

I think that there is a discrepancy between the documentation and the observed behaviour. I do not know whether the documentation is wrong or the observed behaviour is wrong - I just think that they do not match up.

I realise that you said (above) that I should look at the scrub plugin. This makes me think that the --from-scratch option is not supposed to remove all pre-existing tags from my audio files during processing. Is this correct?

I hope that this comment makes sense - if it doesn't make sense, then please let me know and I'll try to explain a different way.

sampsyo commented 3 years ago

I think the main thing to clarify here is that --from-scratch only applies to fields that beets actually supports as columns in its database. That's why it's a little hard to talk about this (and measure the effect) in a database-free setting. The scrub plugin is responsible for removing metadata that beets does not support, so using them together might be close to what you're after.
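
Using the two together might look like the sketch below. It assumes that the `from_scratch` option under `import:` and the scrub plugin's `auto` option behave as their documentation describes; `write_test_config` and the config file name are hypothetical, and nothing in this thread confirms that the combination removes every stray frame.

```shell
# Write a throwaway beets config that enables both mechanisms discussed here:
# from_scratch (clears the fields beets knows about) and the scrub plugin
# (strips the remaining tags from the file before new ones are written).
# $1 is the path of the config file to create (hypothetical name).
write_test_config() {
    cat > "$1" <<'EOF'
plugins: scrub
import:
    from_scratch: yes
scrub:
    auto: yes
EOF
}

# Usage (not run here):
#   write_test_config ./scratch-config.yaml
#   beet -c ./scratch-config.yaml -vv -l libraryTest.db import -C -t test.mp3
#   mid3v2 test.mp3 | grep wibble    # check whether the COMM frame is gone
```

Writing to a separate file and passing it with `-c` keeps the experiment away from the regular configuration.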

stale[bot] commented 3 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jaimet commented 3 years ago

Hi stale bot. Unfortunately, I haven't yet had the time to continue working on this issue, but I want to keep it open because I don't think it has been satisfactorily resolved yet (although I'm starting to wonder whether this is more a documentation issue than a code issue). I'm adding this comment so you don't close this issue tomorrow.

the-confessor commented 3 years ago

> I think the main thing to clarify here is that --from-scratch only applies to fields that beets actually supports as columns in its database.

This is a pretty key detail, and I think it would be worthwhile to update the documentation to note it. I wasted time testing this feature and preparing to report an issue before I stumbled upon the issue already reported here and finally figured it out.

sampsyo commented 3 years ago

Sounds like a good idea. We could link from there to the scrub plugin, which goes the "rest of the way"?