darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0

Tag export: "omit hierarchy" behavior can't actually be disabled #3206

Closed junkyardsparkle closed 5 years ago

junkyardsparkle commented 5 years ago

EDIT: the actual issue has been summarized below, since I was initially confused about the new intended behavior.

Describe the bug

When exporting an image which has hierarchical tags applied, and the "omit hierarchy" checkbox is NOT selected in the export settings, the individual elements of each tag's path are still not exported as separate tags.

To Reproduce

Start darktable with new profile, import an image, add a tag such as 'foo|blah|baz|bar', export image using default JPEG settings. Observe XMP Subject field in exported file contains only 'bar'. No combination of options selected in export settings would change this for me.

Expected behavior

Include all elements of hierarchical tags when "omit hierarchy" is not selected.

Platform (please complete the following information):

Linux, darktable 2.7.0+2075, exiv2 0.26 (XMP Core 4.4.0-Exiv2)

Additional context

In fact, I was having more complicated, harder-to-reproduce issues under real-world conditions when trying to export images with multiple hierarchical tags. In some cases, only the first and last element of the "path" were included. In a few cases, all were. None of these tags has been set as "private" or "category" or any other special class, as far as I can tell (and selecting those boxes didn't help). This is my attempt to provide a simple test that can be reproduced, in the hopes that it will be enough to start an investigation. ;-)

If anyone can NOT reproduce this, could you tell me which exiv2 version you built against? Thanks!

junkyardsparkle commented 5 years ago

I finally updated my system to exiv2 0.27.2, and it didn't fix the issue.

Here's a more methodical procedure I just followed on a clean, untouched profile: Import image (with existing XMP sidecar having tags 'something' and 'foo|blah|baz|bar'); tags are displayed correctly in module view. Without changing default settings in export module, export image. The JPEG contains neither 'Subject' nor 'Hierarchical Subject' XMP tags, even though 'tags' checkbox is selected by default.

Select the 'Hierarchical tags' checkbox and export again. JPEG now has both tags, but 'Subject' tag contains only 'bar', 'exported' and 'something' keywords, even though 'omit hierarchy' is unselected by default. The 'Hierarchical Subject' tag contains all full keywords as expected.

Select the 'omit hierarchy' checkbox, export, same results as above.

Select the 'synonyms' checkbox, export, same results as above.

Select the 'private tags' checkbox, export, same results as above.

junkyardsparkle commented 5 years ago

@phweyland Any thoughts on this? It seems like a potentially nasty bug that can bite people without them realizing it until later (such as after uploading a bunch of files somewhere, which happened to me).

phweyland commented 5 years ago

Just seen your issue. I'll look at this. With dt 2.7.0+2055~g36f87d176 that still works for me (and Exiv2 0.27.2-2).

Which "target storage" are you using ? The "file on disk" one ?

Could you share an image I could import for testing ?

phweyland commented 5 years ago

but 'Subject' tag contains only 'bar', 'exported' and 'something' keywords

The 'exported' tag should never be exported, neither in Subject nor in Hierarchical Subject.

phweyland commented 5 years ago

May be a silly question : Is your database set with UTF-8 ?

junkyardsparkle commented 5 years ago

With dt 2.7.0+2055~g36f87d176 that still works for me (and Exiv2 0.27.2-2).

Interesting. So, to be clear, with the test procedure described above, you end up with "foo", "blah", "baz", and "bar" all in the (non-hierarchical) Subject tag? If so, then something is weird on my end. Maybe don't worry about this until I test more or somebody else can reproduce.

Which "target storage" are you using ? The "file on disk" one ?

Yes, exactly the defaults with a new, unmodified profile as created on first run.

Could you share an image I could import for testing ?

Sure, but I can't do it right now (remind me if I forget later).

junkyardsparkle commented 5 years ago

May be a silly question : Is your database set with UTF-8 ?

That doesn't sound like a silly question at all... sounds like a good guess. How do I check that?

phweyland commented 5 years ago

with the test procedure described above, you end up with "foo", "blah", "baz", and "bar" all in the (non-hierarchical) Subject tag?

As a matter of fact... no. With your procedure I get this Subject: bar, exported. However, I don't have the issue with my current database...

I have something to work on now. :)

junkyardsparkle commented 5 years ago

Well, glad to know I'm not in the Twilight Zone here... thanks for doing the extra checking. :-)

phweyland commented 5 years ago

As for "exported", that's a bug. If you launch dt again it will disappear. I'll fix that anyway.

The rest is normal for the implemented logic (but I understand this is a change compared to the past). The logic I've followed is that if an intermediate tag, let's say 'foo|blah', doesn't exist as a tag, it is considered a 'category'. That's why 'blah' is missing from Subject. If you create 'foo|blah' as a tag and don't set it as a category, it appears in the Subject list, even if it is not attached.

So to be transparent with previous behavior I could invert that logic, i.e. if 'foo|blah' doesn't exist as a tag it is NOT a 'category'. It is not intuitive (for me) but that would avoid breaking the former logic (where categories did not exist).

What do you think ?

PS: a category is never exported into Subject.
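The rule described above can be sketched roughly as follows. This is only an illustration of the logic as stated in this comment, not darktable's actual code: the function name `subject_keywords` and the `library` mapping (full tag path to a "is category" flag) are hypothetical.

```python
def subject_keywords(attached, library):
    """Return the keywords that would land in Subject.

    attached: list of full tag paths attached to the image.
    library:  dict mapping each full tag path that exists as a tag
              to True if it is flagged as a category, else False.
              (Hypothetical structure, for illustration only.)
    """
    keywords = []
    for path in attached:
        parts = path.split("|")
        for depth in range(1, len(parts) + 1):
            prefix = "|".join(parts[:depth])
            # A path element that does not exist as a tag of its own
            # is treated as an implicit category.
            is_category = library.get(prefix, True)
            name = parts[depth - 1]
            # A category is never exported into Subject.
            if not is_category and name not in keywords:
                keywords.append(name)
    return keywords


library = {"foo|blah|baz|bar": False}
print(subject_keywords(["foo|blah|baz|bar"], library))  # ['bar']

# Creating 'foo|blah' as a regular tag makes 'blah' appear as well.
library["foo|blah"] = False
print(subject_keywords(["foo|blah|baz|bar"], library))  # ['blah', 'bar']
```

Inverting the logic would amount to changing the default in `library.get(prefix, True)` to `False`, at least in this simplified model.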

phweyland commented 5 years ago

I could invert that logic

It seems more complicated to implement than the current one. :(

phweyland commented 5 years ago

The current representation of tags in the hierarchy (in italics) is aligned with the current logic (path elements are categories by default). It should be removed too if we invert the logic.

junkyardsparkle commented 5 years ago

So to be transparent with previous behavior I could invert that logic, i.e. if 'foo|blah' doesn't exist as a tag it is NOT a 'category'. It is not intuitive (for me) but that would avoid breaking the former logic (where categories did not exist).

Ah, so the issue was that I had implicit "categories" that I didn't know were categories... I'll give the module some more use with this in mind, and see if it still feels like a "surprising" behavior or not.

It might be enough to make this very, very clear in the release notes... I'll also try to do some editing work on the tagging doc soon, since I seem to have the right "unfamiliar user" perspective... ;-)

junkyardsparkle commented 5 years ago

Out of curiosity, is the current "implicit category" logic consistent with any other software that you're familiar with? That would be an argument for retaining it... I'm just not really in touch with commercial software of the last decade or so. I'll close this, since the issue isn't what I thought it was. Might be good to get more user input about this in a forum context, I suppose? It's really hard to guess if anyone else is likely to be affected the same way as I was... anyway, thanks again for the work!

phweyland commented 5 years ago

Out of curiosity, is the current "implicit category" logic consistent with any other software

Actually I have no such reference. I've just followed a certain logic. For example, in your 'foo|blah|baz|bar' case you cannot assign 'blah' alone. This suggests that it is not a real tag. And as it cannot get specific attributes, it was necessary to make some choice. That's why I've done it that way.

But I have to admit I don't see the reverse logic as weaker than the current one. It's just a different assumption. Even if it is a bit more complex to implement, I don't think there is anything impossible there.

@TurboGit, what are your thoughts ?

junkyardsparkle commented 5 years ago

My own assumptions were probably based mostly on previous behavior of darktable with the same tag library (inertia), but also somewhat on geeqie, which is the only other software I've used which maintains a library of keywords presented in a tree view. In that case, when adding another keyword to the tree the user explicitly selects between "Active keyword" (default) and "Helper" (ie category). So... I just wasn't expecting anything to have implicit category status... for whatever that's worth - I'm not asserting that this will be the "normal" assumption. :-)

That first part about previous behavior is what concerns me... but if things need to get broken a little bit, this is probably the release to do it in!

phweyland commented 5 years ago

when adding another keyword to the tree the user explicitly selects between "Active keyword" (default) and "Helper" (ie category)

Here dt behaves the same way. A category must be set explicitly. On the other hand, 'blah' in 'foo|blah|baz|bar' is not a tag, just a piece of the path of 'bar', unless 'foo|blah' is also created, in which case it is by default a tag (not a category).

junkyardsparkle commented 5 years ago

Yes, so to summarize this issue for anyone else concerned:

I had become accustomed to thinking of all elements in hierarchical "paths" as keywords themselves, based on the previous darktable behavior of always treating them that way. The new tagging module logic does not treat them this way - they must each also exist as end nodes within the library to be counted as keywords and exported. Therefore, for a library containing hierarchies such as

people|family|smith|bob
people|friends|jones|susan

but for whatever reason not also individually containing:

people|family|smith
people|family
people

then those elements will not be considered keywords along with "bob" when "bob" is attached in the hierarchical form above.
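To make the difference concrete, here is a small sketch comparing the two behaviors on the example library above, plus a helper that lists the path elements a library would need as end nodes for the old results to come back. All names here (`old_keywords`, `new_keywords`, `missing_nodes`) are made up for illustration, and category flags are ignored for simplicity; this is my understanding of the logic, not darktable's code.

```python
def path_prefixes(tag):
    """'a|b|c' -> ['a', 'a|b', 'a|b|c']"""
    parts = tag.split("|")
    return ["|".join(parts[:i]) for i in range(1, len(parts) + 1)]


def old_keywords(tag):
    """Previous behavior: every path element counts as a keyword."""
    return tag.split("|")


def new_keywords(tag, library):
    """New behavior: an element counts only if its own path exists
    as an entry (end node) in the library."""
    return [p.split("|")[-1] for p in path_prefixes(tag) if p in library]


def missing_nodes(library):
    """Path elements that are used in the library but do not exist
    as entries of their own."""
    used = {p for tag in library for p in path_prefixes(tag)}
    return sorted(used - set(library))


library = {"people|family|smith|bob", "people|friends|jones|susan"}

print(old_keywords("people|family|smith|bob"))
# ['people', 'family', 'smith', 'bob']
print(new_keywords("people|family|smith|bob", library))
# ['bob']
print(missing_nodes(library))
# ['people', 'people|family', 'people|family|smith',
#  'people|friends', 'people|friends|jones']
```

Something like `missing_nodes` could be a quick way to check whether an existing library is affected before exporting a batch of images.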

Is this a problem? Well, maybe, in the sense that a workflow that has worked for users previously can suddenly stop working as expected in a subtle way that is easy to not notice until later (for instance, after many images have been exported and uploaded somewhere). The nature of the problem may also not be immediately apparent (if you're as dumb as me).

So, the question is whether to alter the logic, or to make its workings very clear in the manual and release notes, and hope people read them. Right? :-)

junkyardsparkle commented 5 years ago

It also now seems to me that the concept of the explicitly set "category" is only really needed with the old logic; as it is now, just excluding a term from the library as an end node would accomplish the same thing... although I may be missing some use case?

junkyardsparkle commented 5 years ago

Looking at this some more, I think the presence of the "omit hierarchies" checkbox in export settings is part of the source of confusion, since it implies that selecting it would enable the current logic vs. the old logic... when in fact the situation isn't quite that simple. With that and the addition of categories, I would say it's very counter-intuitive for a previous user to assume the new logic - these seem like features you would add to complement the old logic.

phweyland commented 5 years ago

I think the presence of the "omit hierarchies" checkbox in export settings is part of the source of confusion

Agreed. It has an effect (when both 'foo|blah|baz|bar' and 'foo|blah' exist, the second one is ignored if "omit hierarchy" is set), but it is not very useful and may be counter-intuitive.

It also now seems to me that the concept of the explicitly set "category" is only really needed with the old logic

(you mean with the current logic, right ?) Partly agreed. First, for the "helper" side I think it doesn't hurt to have both possibilities: implicit and explicit category.

Then, most importantly, when a leaf tag is itself a category, it can be used to create whatever kind of meta information you can think of, which can then be used to set up XMP tags of your choice at export time. Example: creator|John Smith, both tags set as category. Neither creator nor John Smith is exported into Subject (because they are categories), but John Smith can be used in export formulas like Xmp.dc.rights = Copyrights $(YEAR) $(CATEGORY0(creator)).

[image]

It also appears separately in image information:

[image]

to produce this:

---- XMP-dc ----
Rights                          : Copyrights 2019 John Smith
Subject                         : bar
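As a rough illustration of how a formula like the one above could resolve, here is a sketch under the assumption that `$(CATEGORY0(creator))` yields the path element directly below `creator` in an attached tag. The helpers `category_level` and `expand_rights` are hypothetical and are not darktable functions; the real variable expansion may differ in its details.

```python
def category_level(attached_tags, category, level=0):
    """Return the element `level` steps below `category` in the first
    attached tag path that contains it, or '' if there is none.
    (Assumed semantics, for illustration only.)"""
    for tag in attached_tags:
        parts = tag.split("|")
        if category in parts:
            index = parts.index(category) + 1 + level
            if index < len(parts):
                return parts[index]
    return ""


def expand_rights(attached_tags, year):
    """Mimic: Xmp.dc.rights = Copyrights $(YEAR) $(CATEGORY0(creator))"""
    creator = category_level(attached_tags, "creator", 0)
    return f"Copyrights {year} {creator}"


tags = ["creator|John Smith", "foo|blah|baz|bar"]
print(expand_rights(tags, 2019))  # Copyrights 2019 John Smith
```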

That said, the debate between implicit keyword and implicit category for undeclared path elements, for example 'blah' in 'foo|blah|baz|bar', is still valid.

junkyardsparkle commented 5 years ago

(you mean with the current logic, right ?) Partly agreed. First, for the "helper" side I think it doesn't hurt to have both possibilities: implicit and explicit category.

I only meant that the old logic really needed the feature, because it had zero methods for excluding anything from export, while the other logic has at least one method "built in". Your example case is one (of many?) that I hadn't considered, though... I assumed there were some. Anyway, I woke up to find things already restored to "normal", so thanks for that (I was really trying not to insist too much that this was the "correct" solution, but if you're not unhappy with it then it seems good to me). I'll try to torture-test the new code later. :-)

phweyland commented 5 years ago

I'll try to torture-test the new code later. :-)

Yes, please !