Several Tag-Fields are int()

orgua commented 4 years ago

First, let me thank you for your lib - it saved me a lot of time!

Second i wanted to ask you why some fields like "tracknumber" and "year" are int-only? I only checked one use-case for Opus / Vorbis Comment and all fields are stored as UTF-8 (https://en.wikipedia.org/wiki/Vorbis_comment).

Yes, converting to int() has some free cleaning-advantages but also makes ie. leading zeros in tracknumbers impossible. I think it would be more helpful to give the control back to the user - he is already brave enough to overwrite the tags :)

What do you think?

orgua commented 4 years ago

Small addition: i forked the project and tested the changes - seems to work fine with the underlying libs.

KristoforMaynard commented 4 years ago

So, the goal of this library is to present a common interface for all formats, and some formats don't support strings as tracknumbers. Also, I find leading zeros aesthetically awkward. If users know what they're doing, they can always change the underlying mutagen object directly.

orgua commented 4 years ago

thanks for the feedback!

So what is the general use-case of this lib? I'm trying to fix all my "broken" tags:

consolidate names of artist, album, year, comments
clean all tags by removing multiple-spaces, broken unicode and other unprintable characters

I love the unified access to all audio-formats but dislike the heavy sanitation, some by the int()-conversion, some by sanitizers and some by the get/set-easy-(). The changes made are not transparent, so when i read a tag-field my script has to assume that it is the raw-value. when there is nothing wrong with it, it won't save the tag back to file. that would be bad practice and possibly dangerous for the audio-file. By writing this i realise that a file["tag"].raw.value-fn would be the best solution.

KristoforMaynard commented 4 years ago

Ah, ok. So, this lib is really just part of a music player that I was writing, and I needed a way to suck in metadata with one interface regardless of source. As a result, I let it become very opinionated about the kind of data it would return (i.e., track number is a number, year is a year, not a date, etc.). The ability to change the tags and save back to disk came when I noticed that it wouldn't be much additional effort.

That said, I do see the utility of this lib as a tag cleaner, and in that context I agree; this lib is being a little too draconian. And like you say, it's certainly not transparent.

I really like your file["tag"].raw. syntax. Unfortunately it can't be used like file["tag"].raw = val. Maybe file.raw["tag"] to make it similar to Pandas' DataFrame.iloc. Then it can be used for getting and setting any type the underlying file format can handle.

In a related note, I discovered that something like f.mfile['trkn'][0] = ('01', '15') on an m4a file will succeed, but then f.save() will fail as mutagen assumes things are integers. This isn't great.

orgua commented 4 years ago

cool to hear and i like the pandas approach! And you are right, i never liked m4a as a format, even without these limitations. but lets keep it real - someone that is touching raw values (or saving back normal audiotags in general) is either brave or knows what he does. Even with your savekeeping i have to handle 5 different Exceptions on a wild bunch of audiofiles from within mutagen, music-tag and python itself. without the checks there is only one more so far. so i think this change will be a great addition. Have a nice weekend :)

KristoforMaynard commented 4 years ago

I'm curious if you could share some dummy files with your more pathological tags. Most of my music is ALAC, so things just generally work for me.

orgua commented 4 years ago

no leading zero by container-design :D i'll have a look this weekend - didn't github have a private messaging option? i'm to shy to post my email or signal handle here in public - but i will figure something out

KristoforMaynard commented 4 years ago

I added a layer to start playing with this raw functionality, but I'm still not sure how it should behave. It should be somewhere between the strict default behavior, and the bare-bones file.mfile... access, but where to draw the line will require some more thought.

KristoforMaynard commented 4 years ago

Raw functionality should now work (v 0.3.1). We can open a new issue if there are still some bugs, or if we want to change the behavior.

orgua commented 4 years ago

Thanks for the update! I'm still running on my own branch, but will try to switch in the next days. a little bit off-topic: manipulating tags works fine! you mentioned somewhere, that it wasn't tested well before. I tried changing / deleting every field for several thousand files, mainly mp3 and flac. Just two issues that should be mutagen-related:

if some old mp3 only have id3v1, it throws an exception, also if tag / field is not present (could be also other issues)
under some circumstances deleting "artwork" can erase unwanted parts of the audio-file. i think it is mp3-related -> script searches through library, deletes all present artwork, and even finds something on a second run. after that the audio-duration is shorter or the stream is broken.

KristoforMaynard commented 4 years ago

Well that 2nd point is concerning. I wonder if it's worth putting some checks in place so that the audio data is never accidentally changed.

KristoforMaynard / music-tag

Several Tag-Fields are int() #1