fmang / opustags

Ogg Opus tags editor
BSD 3-Clause "New" or "Revised" License
75 stars 10 forks source link

Support for newlines in tags #54

Closed hackerb9 closed 1 year ago

hackerb9 commented 2 years ago

Request

Please add support for newlines in tags. While I do not usually add newlines to the tags, it is extremely common to find them already existing in .opus files that have been downloaded, especially from YouTube via youtube-dl -x.

The problem

Currently, when using opustags -e on a file with newlines in the comments, opustags will not give any warning that it cannot preserve the newlines. Instead, when the file is saved, opustags dies badly, incorrectly stating that a tag is malformed.

```log warning: Leaving foobird.opus.BejwAL.opustags on the disk. Stravinsky - The Firebird _ Gergiev · Vienna Philarmonic · Salzburg Festival 2000.opus: error: Malformed tag: Great presentation of the Vienna Philharmonic conducted by the russian Maestro Valery Gergiev, in one of the most powerful and greatest presentation of The Firebird (L'Oiseau de feu) of Igor Stravinsky at Salzburg Festival 2000. ```

Possible Solution

I suggest automatically converting a tag with multiple paragraphs into a multiple instances of the same tag. For example:

title=Stravinsky: The Firebird / Gergiev · Vienna Philarmonic · Salzburg Festival 2000 date=20111106 DESCRIPTION=Gran presentación de la Orquesta Filarmónica de Viena, conducida por el director ruso Valery Gergiev en [a mi juicio personal] una de las más grandes y magníficas interpretaciones del Pájaro de Fuego (L'Oiseau de feu) de Igor Stravinsky, que se tenga conocimiento, durante el Festival de Salzburgo 2000.

Great presentation of the Vienna Philharmonic conducted by the russian Maestro Valery Gergiev, in one of the most powerful and greatest presentation of The Firebird (L'Oiseau de feu) of Igor Stravinsky at Salzburg Festival 2000.

would become

title=Stravinsky: The Firebird / Gergiev · Vienna Philarmonic · Salzburg Festival 2000 date=20111106 DESCRIPTION=Gran presentación de la Orquesta Filarmónica de Viena, conducida por el director ruso Valery Gergiev en [a mi juicio personal] una de las más grandes y magníficas interpretaciones del Pájaro de Fuego (L'Oiseau de feu) de Igor Stravinsky, que se tenga conocimiento, durante el Festival de Salzburgo 2000. DESCRIPTION= DESCRIPTION=Great presentation of the Vienna Philharmonic conducted by the russian Maestro Valery Gergiev, in one of the most powerful and greatest presentation of The Firebird (L'Oiseau de feu) of Igor Stravinsky at Salzburg Festival 2000.

Reuse of a tag like this is explicitly allowed in the Vorbis comment specification, so I believe this is a good solution.

Example Kludge

I have a kludge which kind of works to add repeated tags, but only as long as the contents of a tag does not have an equals sign (=).

opustags foo.opus | awk -F= '
    $2 != "" { tag=$1; }; 
    $2 == "" { printf("%s=", tag) }; 
    {print}
 ' | opustags --set-all foo.opus -o bar.opus

If I'm lucky and there were no equals signs in the data, I can then use opustags -e bar.opus without error. I present this fragile kludge only as an example. Any generally applicable solution will require modifying opustags.

fmang commented 2 years ago

The multilinetag to multiple one-line tags conversion is not correct. Tag reuse is allowed for cases like songs with multiple artists, and is not semantically equivalent to multiline tags. While not as convenient as --edit, you should be able to robustly edit files with multiline tags using --add and --set.

The current format used for input and output is not ready for multiline tags, but maybe it could be extended to support escape sequences, like adding a backslash at the end of a line to indicate continuation, and supporting escaped backslashes too in case the tag does terminate with a literal backslash. Standard escaping stuff.

Alternatively, more formats could be supported, like YAML or JSON, so that we get the benefits of a robust format without reinventing the wheel. Plus, opustags would get even easier to integrate into external programs. However, I believe the current format is more user-friendly with --edit.

Would you be interested in contributing a patch for either option?

hackerb9 commented 2 years ago

Okay, scratch that first idea. I didn't know that tag reuse was not semantically equivalent. Too bad as that would have made editing with a text editor easier than backslashes. (I've edited enough .Xresources files to know that that is not fun. Especially when you've got an invisible space after the backslash!)

I do think the current output format is nice because a text editor makes sense for editing multiple tags at once. Having to ensure I add a backslash at the end of each line can be tedious.

How about this as a proposal for the format: since tag names should not begin with whitespace, declare that any line that begins with a HORIZONTAL TAB character is a continuation of the previous tag. On input, the initial TAB is removed from continuation lines. On output, a TAB is added after an embedded LINEFEED.

Side note Unimportant thoughts that occur to me as I think about implementing this: * For DOS CR-LF support, leaving CARRIAGE RETURN unchanged on input or output should suffice. * As a convenience to people editing an opustags file, on input a completely empty line can be treated the same as a line containing only a single TAB.

Yes, there are problems with the idea: for example, a person may try to add spaces instead of a single TAB and end up with an inscrutable error, or they may use multiple TABs for indentation and be surprised when only the first one is removed. However, it would be a better situation than the current state.

I might be able to make a patch to do such a thing, but I'd want to know if it's something you would even want or if there's some fatal flaw in my logic that I'm overlooking.

fmang commented 2 years ago

Your proposal to use exactly one tab as a prefix for continuation is intuitive, convenient, robust and simple. I like it.

Concerning the edge case, I suggest that:

  1. If opustags detects a line starting with spaces, it fails with an explicit error.

  2. Empty lines are implictly treated like a continuation (your idea is good), unless it’s the last continuation line. Basically, empty lines separating tags should be ignored:

    X=123
           Continuation.
    
    Y=456
  3. If the equal sign is immediately followed by a line feed, then the value should begin at the first continuation line, like this:

    X=
           First line.
           Second line.

    I think this is more readable, and should be preferred when outputting the tags.

These three points are nice to haves, so I suggest you ignore them for your first PR.

Overall, I think this is a good enhancement and would be glad to merge it if you make a patch.

hackerb9 commented 2 years ago

Okay, I have a basic implementation working which roundtrips correctly for my files. I appear to have an error in my memory allocation logic, though. (Core dumps when I test read_comments with varying lengths of data). It probably has a simple fix, but it's past midnight and I don't see it right now. Do you?

I used realloc() to fit with the existing code that uses C's getline() and free(), but maybe getline should be replaced with whatever the C++ equivalent is so we don't have to worry about these sort of memory errors.

fmang commented 2 years ago

Let’s focus on the printing first. I notice that you use insert() to add the tab characters, which is costly. You’d better reuse the first scan and compute newline_count instead of has_newlines, then allocate an std::string of size comment.size() + newline_count, and fill it character by character using operator [].

hackerb9 commented 2 years ago

That sounds like a reasonable optimization to make.

hackerb9 commented 2 years ago

I made the optimization.

The worst case scenario for the std::string insert method is a comment with 64K of newlines. Runtime for that improved by 15x, but the absolute gain was small (went from 47 ms to 3 ms). Was it worth it given that copying by hand is a little bit uglier and worst case scenarios are rare? Maybe not yet. I believe it will be more obviously a good thing once comments larger than 64K are allowed.

fmang commented 2 years ago

Note that the performance multiplier you measured is not constant. The complexity went from O(n^2) to O(n). I don’t think opustags will suffer from performance issues anytime soon, but writing linear-time implementations is easy enough to be worth the effort I think.

Could you please squash your commits and make a pull request for the printing part only? It’s gonna be easier for me to review.

fmang commented 1 year ago

@hackerb9 I plan to make a new release soon and consider solving this issue too. Are you interested in making the pull request, or would you rather I finalize this myself?

fmang commented 1 year ago

I’ve started looking further into it in order to finalize the feature. I’ll probably keep your implementation proposal for history but began reworking it.

fmang commented 1 year ago

Ready for the next release.