Closed hackerb9 closed 1 year ago
The multilinetag to multiple one-line tags conversion is not correct. Tag reuse is allowed for cases like songs with multiple artists, and is not semantically equivalent to multiline tags. While not as convenient as --edit, you should be able to robustly edit files with multiline tags using --add and --set.
The current format used for input and output is not ready for multiline tags, but maybe it could be extended to support escape sequences, like adding a backslash at the end of a line to indicate continuation, and supporting escaped backslashes too in case the tag does terminate with a literal backslash. Standard escaping stuff.
Alternatively, more formats could be supported, like YAML or JSON, so that we get the benefits of a robust format without reinventing the wheel. Plus, opustags would get even easier to integrate into external programs. However, I believe the current format is more user-friendly with --edit.
Would you be interested in contributing a patch for either option?
Okay, scratch that first idea. I didn't know that tag reuse was not semantically equivalent. Too bad as that would have made editing with a text editor easier than backslashes. (I've edited enough .Xresources files to know that that is not fun. Especially when you've got an invisible space after the backslash!)
I do think the current output format is nice because a text editor makes sense for editing multiple tags at once. Having to ensure I add a backslash at the end of each line can be tedious.
How about this as a proposal for the format: since tag names should not begin with whitespace, declare that any line that begins with a HORIZONTAL TAB character is a continuation of the previous tag. On input, the initial TAB is removed from continuation lines. On output, a TAB is added after an embedded LINEFEED.
Yes, there are problems with the idea: for example, a person may try to add spaces instead of a single TAB and end up with an inscrutable error, or they may use multiple TABs for indentation and be surprised when only the first one is removed. However, it would be a better situation than the current state.
I might be able to make a patch to do such a thing, but I'd want to know if it's something you would even want or if there's some fatal flaw in my logic that I'm overlooking.
Your proposal to use exactly one tab as a prefix for continuation is intuitive, convenient, robust and simple. I like it.
Concerning the edge case, I suggest that:
If opustags detects a line starting with spaces, it fails with an explicit error.
Empty lines are implictly treated like a continuation (your idea is good), unless it’s the last continuation line. Basically, empty lines separating tags should be ignored:
X=123
Continuation.
Y=456
If the equal sign is immediately followed by a line feed, then the value should begin at the first continuation line, like this:
X=
First line.
Second line.
I think this is more readable, and should be preferred when outputting the tags.
These three points are nice to haves, so I suggest you ignore them for your first PR.
Overall, I think this is a good enhancement and would be glad to merge it if you make a patch.
Okay, I have a basic implementation working which roundtrips correctly for my files. I appear to have an error in my memory allocation logic, though. (Core dumps when I test read_comments with varying lengths of data). It probably has a simple fix, but it's past midnight and I don't see it right now. Do you?
I used realloc() to fit with the existing code that uses C's getline()
and free()
, but maybe getline
should be replaced with whatever the C++ equivalent is so we don't have to worry about these sort of memory errors.
Let’s focus on the printing first. I notice that you use insert() to add the tab characters, which is costly. You’d better reuse the first scan and compute newline_count instead of has_newlines, then allocate an std::string of size comment.size() + newline_count, and fill it character by character using operator [].
That sounds like a reasonable optimization to make.
I made the optimization.
The worst case scenario for the std::string insert method is a comment with 64K of newlines. Runtime for that improved by 15x, but the absolute gain was small (went from 47 ms to 3 ms). Was it worth it given that copying by hand is a little bit uglier and worst case scenarios are rare? Maybe not yet. I believe it will be more obviously a good thing once comments larger than 64K are allowed.
Note that the performance multiplier you measured is not constant. The complexity went from O(n^2) to O(n). I don’t think opustags will suffer from performance issues anytime soon, but writing linear-time implementations is easy enough to be worth the effort I think.
Could you please squash your commits and make a pull request for the printing part only? It’s gonna be easier for me to review.
@hackerb9 I plan to make a new release soon and consider solving this issue too. Are you interested in making the pull request, or would you rather I finalize this myself?
I’ve started looking further into it in order to finalize the feature. I’ll probably keep your implementation proposal for history but began reworking it.
Ready for the next release.
Request
Please add support for newlines in tags. While I do not usually add newlines to the tags, it is extremely common to find them already existing in .opus files that have been downloaded, especially from YouTube via
youtube-dl -x
.The problem
Currently, when using
opustags -e
on a file with newlines in the comments, opustags will not give any warning that it cannot preserve the newlines. Instead, when the file is saved, opustags dies badly, incorrectly stating that a tag is malformed.Possible Solution
I suggest automatically converting a tag with multiple paragraphs into a multiple instances of the same tag. For example:
would become
Reuse of a tag like this is explicitly allowed in the Vorbis comment specification, so I believe this is a good solution.
Example Kludge
I have a kludge which kind of works to add repeated tags, but only as long as the contents of a tag does not have an equals sign (
=
).If I'm lucky and there were no equals signs in the data, I can then use
opustags -e bar.opus
without error. I present this fragile kludge only as an example. Any generally applicable solution will require modifying opustags.