Closed sauliusg closed 3 years ago
We disallowed to have two groups with the same name (even in different hierarchical levels / subtrees), because they would contain exactly the same entries (as you have remarked). This lead to many confusions for users, so we prevented it. See https://github.com/JabRef/jabref/issues/1495 for details.
I'm not sure if we really need a warning message. The only way to get duplicate names is by manually editing the bib file (or upgrading from an older version). I would say that's pretty rare.
We disallowed to have two groups with the same name (even in different hierarchical levels / subtrees), because they would contain exactly the same entries (as you have remarked). This lead to many confusions for users, so we prevented it. See #1495 for details.
I think the bibliography tool should allow to do what users want, and not to "disallow" something. JabRef supports a lot of very different bibliography management styles, which is its big advantage, but the block in the GUI that inhibits creation of groups with the same name, even in different parts of the hierarchy, is a serious usability issue, and in this particular respect JabRef clearly looses to Zotero :( (even thougt the general policy of JabRef is so much superior of what other bibliography managers offer!)
I'm not sure if we really need a warning message. The only way to get duplicate names is by manually editing the bib file (or upgrading from an older version). I would say that's pretty rare.
If a group with the same name in different places of the hierarchy (or even in the same group) is being created, it would be nice if the GUI would issue a warning here, not a blocking error. The warning would explain that the group with the same name already exists (and mention its places in the hierarchy), but let the user do what he/she wants. Explicit and complete information wold remove most of confusion, IMHO. The GUI could even suggest a unique group name based on its current place in the group hierarchy.
The only way to get duplicate names is by manually editing the bib file (or upgrading from an older version). I would say that's pretty rare.
In my case, it is not rare but ubiquitous, and judging from other responses in #1495 and other related posts many users miss the group hierarchy with arbitrary group names. I tried to use a workaround in #1495, like "Diabetes > Treatment", and with the current implementation it is very, very inconvenient.
So lets sum up the current situation:
A possible (easy?) way to reconcile these two requirements could be:
Example:
Groups:
Bioinformatics
- Drug design
-- Diabetes
--- Treatment
Medicine
- Diabetes mellitus
-- Treatment
- Diabetes
-- Treatment
- Asthma
When creating a subgroup called Treatment
in Asthma
, the GUI could respond:
Warning:
A group named 'Treatment' already exists in 'Medicine/Diabetes/Treatment', 'Medicine/Diabetes mellitus/Treatment'.
If you create the group 'Treatment' here, it will contain entries from all mentioned groups.
[x] Use a unique name 'Treatment (Asthma)'
[OK] <- the button is *always* active!
I deliberately suggest using 'Treatment (Asthma)' instead of 'Asthma/Treatment' to avoid impression that the group name is strictly bound to group's position in the hierarchy. It is not; so moving group 'Treatment (Asthma)', say, one level up to 'Medicine' would not make its name "incorrect", in the sense that the group still contains papers about asthma treatment no matter where in the hierarchy it resides. This also means that JabRef does not need to bother about renaming groups when their position in the hierarchy changes (as is implemented now).
The suggested change would:
PS. If the developers agree on such (or similar) design I would take a the issue and offer a PR :)
The only way to get duplicate names is by manually editing the bib file (or upgrading from an older version). I would say that's pretty rare.
Another use case for groups with the same name – have a look at the screenshot example with computer/human languages: The groups 'Language' in different branches of hierarchies are union groups, and contain (under the current implementation!) different sets of papers, since they contain different subgroups (Programming/Languages – 17 papers; Data description/Languages – 13 papers). As expected. These groups have no papers assigned to them directly; this is not needed since the purpose of these groups is to summarise all papers assigned to their subgroups, which they do nicely.
In this setting there is no need to name the 'Language' groups differently, that would be very awkward. Also, the group's semantics will not change if I move them to different location in the tree (with all their subgroups), for example into 'Computers/Software' as opposed to 'Computers/Hardware' at some stage.
A logic (classification) case for non-unique group names:
I'm not sure if we really need a warning message.
If you mean a warning when opening file with duplicated group names, then I agree, there is not need for the warning there. Which just means that it is a legit situation :)
I agree with you that it sometimes could be advantages to have the same group in two different places. However,
subgroup can also belong to several different super-groups (together with all papers assigned to it)
is a highly unusual concept from a user-interface perspective. For example, files with the same name but in different folders don't automatically sync their content. I think we should really stick to the tree nature of the group interface, meaning that groups are independent.
As a way forward, I would propose the following:
What do you think? Help on this of course very much appreciated.
I agree with you that it sometimes could be advantages to have the same group in two different places.
ACK.
However,
subgroup can also belong to several different super-groups (together with all papers assigned to it)
is a highly unusual concept from a user-interface perspective.
I was thinking about this. It could be that we see the concept as unusual only because we are so much used to the file system tree structure. But file system tree is a bad tool for bibliography management (tried it, switched to JabRef :) ), precisely because we can not place the same file in several different directories (or rather we can, but this is not very convenient).
Groups in bibliography managers have the advantage that we can place the same file (or the same subgroup) in different hierarchies. In this case groups serve as a sort of "canned queries" on the database.
When one gets used to the current implementation of JabRef groups, one starts thinking that the group tree and the group assignments are actually disjoint features. A tree is described in @Comment{jabref-meta: grouping:
element, no matter how the entries are assigned to grops; the group assignments is written in entries, no matter how the groups are moved around in the tree. Orthogonality of these two features is nice. Once you realise this, the uniqueness of group name no longer seems important or necessary, and synonymous groups can have many uses.
For example, files with the same name but in different folders don't automatically sync their content.
Well, this is not always true. On Unix (and Unix-like systems like Linux and probably on Mac), a hard-linked file will be "syncronised" across two directories. Viz.:
saulius@tasmanijos-velnias hl-demo/ $ mkdir folder1
saulius@tasmanijos-velnias hl-demo/ $ mkdir folder2
saulius@tasmanijos-velnias hl-demo/ $ echo "Initial content." > folder1/file.txt
saulius@tasmanijos-velnias hl-demo/ $ ln -v folder1/file.txt folder2
'folder2/file.txt' => 'folder1/file.txt'
saulius@tasmanijos-velnias hl-demo/ $ head folder?/*
==> folder1/file.txt <==
Initial content.
==> folder2/file.txt <==
Initial content.
saulius@tasmanijos-velnias hl-demo/ $ echo "Replacing content in folder1..." > folder1/file.txt
saulius@tasmanijos-velnias hl-demo/ $ head folder?/*
==> folder1/file.txt <==
Replacing content in folder1...
==> folder2/file.txt <==
Replacing content in folder1...
So the concept of writing things into one place and finding the change in another place is not that foreign, even if sometimes different to trace.
The concept of "same group in different tree branches" is quite similar to hard links in Unix (not soft links).
I think we should really stick to the tree nature of the group interface, meaning that groups are independent.
Here more input from users would be very welcome. I and you can see the situation one way, but it is hard to see how other users who did not see the code internals (or even the BibTeX database internals) will perceive the situation. I have started a discussion on https://discourse.jabref.org/t/hierarchical-groups-with-duplicated-names-are-actually-working-in-jabref-5-x/2619, as was advised in JabRef instructions, but so far it did not attract much comments or voting (well, it's weekend...). Maybe some feedback can be collected there?
I am, personally, currently very enthusiastic about just allowing duplicate group names, with a warning, and start using them. I'm right now playing with the JabRef tip, and all functions with duplicate group names seem to work as expected.
I must admit that I experienced several mysterious "index out of bound" exceptions, but they are very difficult to reproduce, and they seem to happen also with unique group names, so I do not know if this is connected with the groups. But in all in all having same-name groups does not seem to cause any bug trouble and works as expected.
As a way forward, I would propose the following:
Allow to have groups with the same name in different positions. I.e. properly fix #1495. Warning: this is might be very complicated.
I would be very happy with such solution. However, as you say, the fix might be very complicated, so it makes sense to see what can be done with simple means, even if functionality is not ideal.
An astonishingly simple fix is in my PR https://github.com/JabRef/jabref/pull/7558 .
Of course, we (I) can try to implement the true group hierarchy, naming groups 'Computers/Programming/Languages' and 'Human/Languages', as suggested as a workaround in #1495, and showing just the leaf names in the group tree. I do not think that renaming a group or moving it in the hierarchy will cause big performance issues; groups tend to be small on average, I trust they are in dictionaries (so finding a group is constant time op), and renaming the group has exactly the same performance overhead even with the current implementation (since all entries belonging to the group must be updated when the group is renamed). So this sounds doable.
But, as said, maybe a "lazy" solution proposed in the above PR is just good enough? If so we can have it as a semi-permanent solution and move on. At my site, many other things preclude migration to JabRef 5.4 (from 5.2), for example timestamp issue (https://github.com/JabRef/jabref/issues/7527), file parsing issue (https://github.com/JabRef/jabref/issues/7553).
* Once this is done, we can think about adding a "symlink" group that acts as a copy of a different group.
Currently entry assignments to groups works as hard links, and this seems to be enough. If the full power of tree hierarchy is implemented (i.e. we can assign whatever names we want to groups in different tree branches), then symlinks no longer seem necessary. Probably they will be an overkill, IMHO.
What do you think? Help on this of course very much appreciated.
I'll be more than happy to give any help I can, in implementation or otherwise; just my time resources are not always abundant (but I understand that everyone's time is precious :), so I'll do my best).
As a way forward, I would propose the following:
Allow to have groups with the same name in different positions. I.e. properly fix #1495. Warning: this is might be very complicated.
One more thought about a possible fix:
@Comment{jabref-meta: grouping:
. This field shall be unique within the BibTeX file/database, and shall be assigned by JabRef;E.g.:
-----------------vvv----------- The group ID
1 StaticGroup:sg\;1\;2\;1\;0x8a8a8aff\;\;\;;
2 StaticGroup:Bibliometrics\;2\;0\;1\;\;\;\;;
2 StaticGroup:Biochemistry\;3\;0\;1\;\;\;\;;
3 StaticGroup:Carbonic anhydrases\;4\;0\;1\;\;\;\;;
... and so on
group = {...}
list shall contain group names with IDs appended;
E.g.:group = {Carbonic anhydrases(4), Biochemistry(3)},
In this way, we will have a unique group name(id)
string identifying each group for each entry, and the current group implementation remains as it is. From the user's perspective, however, groups can be given any names, even overlapping with other names in the hierarchy. Each entry will be assigned to precisely one unique group, and no ambiguity (or multiple assignments) will emerge.
Since the ID is always added as a fixed format numeric suffix, no ambiguity ever arises. For instance, if a user names her group "slaughterhouse(5)", and JabRef assigns ID 1024 to this group, then the JabRef entries will identify this group as slaughterhouse(5)(1024)
, and only the last braced number will be interpreted as an ID.
When groups are moved around the tree, or even renamed, no changes need to be done to the BibTeX entries, since groups IDs do not change. Group names and/or IDs no longer need to reflect group's position in the group hierarchy.
There will be hardly more than 10000 groups in a typical database, so the IDs will be comfortable 1 to 4 digit numbers, manageable also by humans.
What is your opinion about such design?
The id solution would be the best from an implementation point of view (since then we indeed have an unique identifier). However, the idea was that users can also edit the groups directly by changing the groups
field. That's no longer possible (without looking up the id of the group...).
The only way I see for this is to use the complete path from the root to the group and save this in the groups field. But then you have to worry about migration...
The id solution would be the best from an implementation point of view (since then we indeed have an unique identifier). However, the idea was that users can also edit the groups directly by changing the
groups
field. That's no longer possible (without looking up the id of the group...).
Thought about that. Editing a BibTeX file is in fact possible, at least as good as it is now:
In short, when editing, the ID number will be a natural extension of the group's name, and can handled together with the name. In JabRef internally ID will be the sole unique index identifying the group.
There is a topic to think – whether just the ID needs to be unique or the combination 'name(ID)', and if just ID is unique then what should JabRef do when it encounters the same ID on the other group... But this is solvable, I believe.
The only way I see for this is to use the complete path from the root to the group and save this in the groups field. But then you have to worry about migration...
Indeed, if you have the full path in the group name, then you need to update it each time the group is moved in the tree. You also need to cope with errors when incorrect path is edited in manually. I think this is a complex solution, and complexity arises from severe denormalisation (in database sense).
In case you have just an ID, you only need to update entries when you rename groups – which you also have to do in the current implementation, and I believe it is working (need to check ;). Moving groups in a tree around does not affect the entries. I would thus favour the 'name(ID)' solution.
But finding the name is easy: I just look at the left side bar for the group, and put exactly what I see in the groups
field and it works. That's harder for the id-based solution (you could add a tooltip but yeah...).
Yes, the "complete path" solution also has disadvantages...
But finding the name is easy: I just look at the left side bar for the group, and put exactly what I see in the
groups
field and it works. That's harder for the id-based solution (you could add a tooltip but yeah...).
True, I somehow overlooked this. But then you can not simply copy that text with a mouse :) (or can you?).
But I agree that having no IDs is simpler. This brings us again to a simple implementation with the current group code that just adds possibility to create identically named groups. I have hacked also the delete group possibility, without removing entries from other groups that have the same name (see the updated PR #7558), and it works OK for "Delete group and subgroups" command. For just "Delete subgroups" command I have troubles (entries are left with the deleted group name), but this behaviour may be related to #7556.
Just as a comment from my side: As mentioned by @tobiasdiez you have to be very careful if you allow users to have duplicate groups. Problems I remember from the past (I am afraid, cannot find the respective bug reports anymore, except for https://github.com/JabRef/jabref/issues/1495):
1) In JabRef 3.8.2 all hell broke loose if you had two groups with the same name (inadvertently created!) at the same hierarchy level: when scrolling through the groups tree, you would reach these groups at some point - when that happened, the whole group tree disappeared/was no longer visible and you had to force-quit JabRef to make it usable again (and then manually edit the respective groups names in a text editor, to avoid this from happening again).
2) If you had duplicated group names, removing items from groups could have side-effects that were not obvious to the user. The user might have forgotten that they had these duplicate group names (especially at the time, when no (stable) group name check was implemented in JabRef). Removing the item from one group would also remove it from the group with the duplicated name - even though that was not the intention of the user.
Now, I am not saying, that this will happen with the current JabRef version as well. But I just want to raise awareness, that these decisions can have major implications downstream.
Dear @AEgit, thank you very much for the comment.
Just as a comment from my side: As mentioned by @tobiasdiez you have to be very careful if you allow users to have duplicate groups. Problems I remember from the past (I am afraid, cannot find the respective bug reports anymore, except for #1495):
In JabRef 3.8.2 all hell broke loose if you had two groups with the same name (inadvertently created!) at the same hierarchy level: when scrolling through the groups tree, you would reach these groups at some point - when that happened, the whole group tree disappeared/was no longer visible and you had to force-quit JabRef to make it usable again (and then manually edit the respective groups names in a text editor, to avoid this from happening again).
I have checked with my group tree, with all disambiguation prefixes removed: the problem is no longer there in the current master tip and in the suggested PR #7558. The groups, also with duplicated names, can be scrolled and used without problems.
If you had duplicated group names, removing items from groups could have side-effects that were not obvious to the user. The user might have forgotten that they had these duplicate group names (especially at the time, when no (stable) group name check was implemented in JabRef).
Sure, the user will not keep in mind all possible configurations and duplications of all groups...
Removing the item from one group would also remove it from the group with the duplicated name - even though that was not the intention of the user.
I agree. Therefore I suggest in the PR #7558 make a very simple check – the group is only removed from the group list in the entries if there are no more groups with that name left in the group tree.
In this way, as long as the deleted group exists somewhere in the tree (possibly in multiple places), the entries stay assigned to these groups.
I have just noticed that when I rename one of the dupliactes, its entries are reassigned to the new group and vanish from all old group duplicates. This is not what I (and probably most other users) would expect; the fix should be easy – if the group(s) with the original name exist somewhere in the tree, the old group name should be retained in the entries. The entries will thus belong to two groups if one of the duplicates is renamed. Only of the group name vanishes from the tree, it should be removed from the entries as well.
I'll amend the PR to fix this.
Now, I am not saying, that this will happen with the current JabRef version as well. But I just want to raise awareness, that these decisions can have major implications downstream.
My intuition is that as long as group objects inside JabRef are identified by, say, their index in an array or by their position in an internal tree structure, and not by their (external!) names, no big problems should arise.
I agree that the behaviour that emerges on the duplicate group names is somewhat unusual, but it is now quite clear to me, and as long as it is consistent and no group information loss arises when manipulating groups, this might actually be an advantage of Jabref :).
The most important issue is to preserve group information without information loss when importing old group files. As @swekia wrote in the recent issue #7442, creating a good grouping for yourself is "many, many hours of manual work" (which I can confirm from my side).
To import groups created in the pre-JabRef 3.x version, when the group structure changed to unique (and so duplicates where very likely present in the input databases), a following import strategy might help:
a) if a group name is unique leave its name as it is; b) if an old group name is a duplicate, postfix it with the position in the tree (e.g. in braces, 'Treatment (Medicine/Asthma)'); in this way a unique name is created with all old entries assigned to it (no info loss). A user can then rename or retain the names, as they wish;
If at some point it is decided that the groups in distinct parts of the tree should be distinct (either by appending group ID or by some other mechanism), importing from the current JabRef group tree (with possible duplicates, which, recall, can be made by manual editing of the file!) should assign the same entries to all new groups, make groups "synchronised" if syncrinisation is supported, and if not – issue a warning that this function is lost. IMHO this would very much ease the pain of upgrading and make work experience with JabRef much smoother :).
Note that import of previous databases is a separate issue from the current behaviour of the JabRef group system at any implementation point, and it should be addressed separately.
Seems like the road ahead is pretty save with these considerations... Or are there any dragons ahead which I have overlooked?
@sauliusg : Thank you for this extensive explanation. I think I would be happy if: 1) There is always a warning when a new group is created who's name is NOT unique (as proposed here before). 2) There is a warning when a (previous) JabRef database is imported which does not contain unique groups. It would be best, if the position of these groups in the group tree could be indicated. Note, that this is of relevance for users with many groups - I am using thousands of manually created groups (so, yes, I would agree that creating a good grouping takes "many, many hours of manual work", if not more ;)). 3) You suggested that when importing old group names the name of non-unique groups could be fixed with the position in the tree. While I agree, that this sounds like a reasonable quick fix/workaround, I wonder, what effect this will have on databases with thousands of groups and a large group tree depth. Again, in my example, I have a group tree depth of up to 26 - if my database were to contain any non-unique groups, your solution would result in groups with VERY VERY long names. I wonder, whether those names would then exceed some threshold for character limits in group names and what that effect could have (in either saving this modified group trees or the display of them).
I think I would be happy if:
There is always a warning when a new group is created who's name is NOT unique (as proposed here before).
Agreed; a possible implementation is in the suggested in the PR #7558 (and a screenshot is present demonstrating the look-n-feel). Please have a look at the branch/screenshot how you find it.
There is a warning when a (previous) JabRef database is imported which does not contain unique groups. It would be best, if the position of these groups in the group tree could be indicated. Note, that this is of relevance for users with many groups - I am using thousands of manually created groups (so, yes, I would agree that creating a good grouping takes "many, many hours of manual work", if not more ;)).
I'm very glad that you confirm the importance of grouping for the users of JabRef, and your info volumes are impressive :).
Several ideas how JabRef could highlight duplicated groups in a convenient way:
Which of the display features, or which combination you think are most usable?
I think, however, that it would be very inconvenient if JabRef would pop up a warning box each time a database with duplicated groups is read in. The duplicates should be accepted and dealt with as the user wishes.
You suggested that when importing old group names the name of non-unique groups could be fixed with the position in the tree. While I agree, that this sounds like a reasonable quick fix/workaround, I wonder, what effect this will have on databases with thousands of groups and a large group tree depth. Again, in my example, I have a group tree depth of up to 26 - if my database were to contain any non-unique groups, your solution would result in groups with VERY VERY long names. I wonder, whether those names would then exceed some threshold for character limits in group names and what that effect could have (in either saving this modified group trees or the display of them).
Indeed, deep trees like yours will have problems. A further refinement of the "tree prefix" method would be to use only as many levels of the tree as is needed for disambiguation. For instance, if two of your no-synonymous groups would be at level 26, but they would differ at the 24-th level, only two extra group prefixes would be needed ('Leaf Group (branch24/branch25)').
If even this is too long (one would set some limit, as you suggest), one could resort to middle character contraction (as in 'i18n') and/or to plain numbering ('i18n, 2nd group'). Of course this is only the import behaviour, the user can then rename the groups as one wishes.
Note that in my suggestion this would only be relevant for importing old group trees that gad synonymous but different groups. If your group has no synonyms, or if you synonyms follow the suggested behaviour (i.e. they are intended to always keep the same entries), no transformation of your tree would be necessary.
So, are we going for it? :)
NB. The PR #7558 does not contain import functionality, it only contains changes in new format group handling.
So far sounds this good to me. The import should not matter too much in my personal case (hopefully) since I tried to make sure, that all my group names were unique (I am still forced to use 3.8.2 since I cannot use JabRef before the following feature request is implemented: https://github.com/JabRef/jabref/issues/4237). But I thought, I should mention these issues, since other users might not have given that much attention to this.
So here my take on the import feature (which might be added later?): 1) The user should be able to turn on/off the warning when dealing with a database, that contains non-unique group names. I see your point that it can be inconvenient for users who have decided to accept the duplicates. But for ?most users (including myself) it is essential that it is clear, that the database contains non-unique group names that must be dealt with. Those should be able to see this every time they have to deal with such a database. Therefore the warning needs to be a feature that guarantees to find the duplicates and it needs to be available to those who require it. Therefore maybe a switch to turn on/off this warning message could be a solution (but maybe there is a better one)? 2) You suggest various ways on how JabRef could highlight duplicated groups. All of them seem reasonable to me, but they have one flaw. They do not work well, if you have very large group trees, i. e. thousands of groups. A user is not going to scroll through the whole group tree to find all the duplicates. Instead, JabRef should tell the user directly, where to find those groups. Two things come to my mind: a) Either the groups filtering can be used to display all non-unique groups AND/OR b) a separate list is shown, that tells the users exactly where to find the non-unique groups by giving the full group path for each affected group There might be a better way to do this, but these are the two options that come to my mind at the moment, that deal with the problem of non-unique groups within massive group trees.
Thanks for your help!
- The user should be able to turn on/off the warning when dealing with a database, that contains non-unique group names. I see your point that it can be inconvenient for users who have decided to accept the duplicates. But for ?most users (including myself) it is essential that it is clear, that the database contains non-unique group names that must be dealt with. Those should be able to see this every time they have to deal with such a database. Therefore the warning needs to be a feature that guarantees to find the duplicates and it needs to be available to those who require it. Therefore maybe a switch to turn on/off this warning message could be a solution (but maybe there is a better one)?
In the PR that I suggest groups with repeated names should become, once again, a normal situation in JabRef databases. Thus making a popup window each time you start JabRef, even if you can "configure it away", looks like an inconsistent behaviour (why should a program stop your workflow to warn about a feature which it created itself and which fully supports?) See below for a possible idea.
- You suggest various ways on how JabRef could highlight duplicated groups. All of them seem reasonable to me, but they have one flaw. They do not work well, if you have very large group trees, i. e. thousands of groups. A user is not going to scroll through the whole group tree to find all the duplicates. Instead, JabRef should tell the user directly, where to find those groups. Two things come to my mind: a) Either the groups filtering can be used to display all non-unique groups AND/OR b) a separate list is shown, that tells the users exactly where to find the non-unique groups by giving the full group path for each affected group There might be a better way to do this, but these are the two options that come to my mind at the moment, that deal with the problem of non-unique groups within massive group trees.
A quick indication of whether the current database contains groups with repeated names would be a dashboard-style indicator that says:
There are 8 groups with _repeated_ names in the database.
This announcement could be added right below the "Filter groups" input field, and above the "All entries" group. Optionally, to could contain number of all groups as well (and so it would be displayed always, not only when groups with repeated names are present). Also optionally, there could be 3-state switch next to the announcement: "show only repeated groups/show only unique groups/show all groups", so that you can locate such groups fast (and rename them if you wish).
"Optionally" in this context means that, IMHO, JabRef group interface both with and without these features would be very usable, so maintainers/community should consider whether the benefits of adding extra these features justifies extra work and code increase.
If the group count and duplicated groups are not displayed in JabRef itself, they are still possible to find quickly with some command-line magic :):
saulius@starta multipl-groups/ $ awk -F: '$1~/ StaticGroup$/{print $2}' c-standard-test.bib | awk -F'\' '{print $1}' | sort | uniq -c | sort -nr -k1,1 | awk '$1>1'
2 Ontologies
2 MOFs
2 Languages
2 Databases
2 Data
2 Crystallography
2 COD
2 CIF
saulius@starta multipl-groups/ $ awk -F: '$1~/ StaticGroup$/{print $2}' c-standard-test.bib | awk -F'\' '{print $1}' | sort | uniq -c | sort -nr -k1,1 | awk '$1>1' | wc -l
8
These bash
commands were run on Linux, but should work on MaOS and on Windows with CygWin as well; they can be defined as short commands such as 'bibtex-count-goups' or similar.
From a perspective of Unix principles as I understand them, a function that can be conveniently factored out into a separate, independent program should be factored out and not clutter the main tool. But this might be a different philosophy from what Windows/Mac users are used to... :)
A quick indication of whether the current database contains groups with repeated names would be a dashboard-style indicator that says:
There are 8 groups with _repeated_ names in the database.
This announcement could be added right below the "Filter groups" input field, and above the "All entries" group.
If the "repeated" names are mentioned in the announcement, then this would indeed already improve the situation. Note, that with the current JabRef version not having the full path is less of a problem, since you can easily filter the groups using the groups filter if you know, which groups are affected. With the old JabRef (e.g., 3.8.2) this was not possible, since there was no such filter. Yet, having the direct path or automatically filtering the respective groups (as you propose in the options) would be even better, since it is not intuitive to the user, that one has to find the duplicated groups by themselves.
Maybe I should explain further, why this can be a problem: If you have a database with thousands of groups, it might not be obvious when creating a new group, that it already exists somewhere in the database (you will not notice this immediately just by looking at the groups panel). If a user wants to avoid having unique groups (there are domain-specific reasons for this, which I won't go into), not knowing about the existence and the location of these non-unique group names can be an issue.
As for the command-line magic: Yes, that would be a workaround and I appreciate you sharing the respective scripts. But I would argue, that this is detrimental to the user experience. People will want to have a simple reference does-it-all-for-you tool. Further: If we were to go by a strict reading of Unix principles, then one shouldn't even use JabRef in the first place. It is definitely not a simple tool for a single use, but a quite complex piece of software. If you wanted to, you could also emulate most (all?) functions of JabRef using Emacs + various scripts (in fact I know people, that do that) - but for most people that would just be inconvenient and I do not think that is the goal of JabRef.
Thanks for the ongoing discussion. The twitter poll clearly showed that many users would like to have groups with the same name. However, it became also clear that it might be confusing if the entries are automatically shared. In the devcall we also discussed that there are also scenarios where the groups with the same name actually don't show the same entries (e.g. if one uses the hierarchical modifiers), potentially leading to even more confusion.
For this reason I would strongly propose to go the id-based solution:
groups
field and searching for the identifier of the other group.This unique identifier will also be very helpful for other scenarios. For example, we are planning a new feature in connection with JabRef Online, where users can share a group. For this a unique identifier is essential, since one needs to a way to identify the group even for different users.
@sauliusg your work here in this PR is very much appreciated. But I think the id-based system is the more universal way forward. Would you be interested in implementing it? We core developers help of course where we can!
@sauliusg your work here in this PR is very much appreciated. But I think the id-based system is the more universal way forward. Would you be interested in implementing it? We core developers help of course where we can!
Let me think if the PR can be extended, with reasonable effort, to handle uniquely identified groups.
I'm a bit worried that the full unique group ID solution, while I agree that it is cleaner, but is at the same time more complicated, will take more time to implement, may introduce new bugs; whereas we need badly to continue working with our bibliographies, and the limitation on group names makes this cumbersome...
For this reason I would strongly propose to go the id-based solution:
* Each group gets an unique identifier
I agree that this is a more universal and more usual solution.
But then, more questions arise:
Technically, (4) will probably be the easiest to handle both manually and in JabRef code, but will bloat the files.
Your ideas?
* The identifier is stored in the groups field (and no longer the name of the group) * Users can create completely independent groups with the same name (even in the same hierarchy level if they desire, although a warning should be shown in this case)
True.
* In case, they want to have a mirror group they can use an automatic keyword group pointing to the `groups` field and searching for the identifier of the other group.
Or create a duplicate manually, using a text editor :). The bottom line is that JabRef, IMHO, will have to deal with such situations anyway.
For example, we are planning a new feature in connection with JabRef Online, where users can share a group. For this a unique identifier is essential, since one needs to a way to identify the group even for different users.
In that case UUID or UUID+Group name seems the best suitable solution.
Maybe I should explain further, why this can be a problem: If you have a database with thousands of groups, it might not be obvious when creating a new group, that it already exists somewhere in the database (you will not notice this immediately just by looking at the groups panel). If a user wants to avoid having unique groups (there are domain-specific reasons for this, which I won't go into), not knowing about the existence and the location of these non-unique group names can be an issue.
This is actually handled in the proposed PR: when you try to create a group with the name that already exists in your tree, a yellow exclamation mark appears, and a tool-tip explains what would happen if you go for it. This is proposed instead of bluntly forbidding such groups. You can decide whether you want a duplicated group name, or whether you add more charters for disambiguation.
The identifier is stored in the groups field (and no longer the name of the group)
In the devcall we also discussed that there are also scenarios where the groups with the same name actually don't show the same entries (e.g. if one uses the hierarchical modifiers), potentially leading to even more confusion.
Since a user would be warned when creating groups with the repeated name, the confusion should be minimised.
I have experimented for a while with the code in the PR #7558 branch, and to my taste no confusion ensues; more confusion actually arises from the Issue #7576, IMHO... :)
But sure, I agree that the proposed behaviour would be unusual (how is it in German? Gewöhnungsbedürftig?)
Maybe I should explain further, why this can be a problem: If you have a database with thousands of groups, it might not be obvious when creating a new group, that it already exists somewhere in the database (you will not notice this immediately just by looking at the groups panel). If a user wants to avoid having unique groups (there are domain-specific reasons for this, which I won't go into), not knowing about the existence and the location of these non-unique group names can be an issue.
This is actually handled in the proposed PR: when you try to create a group with the name that already exists in your tree, a yellow exclamation mark appears, and a tool-tip explains what would happen if you go for it. This is proposed instead of bluntly forbidding such groups. You can decide whether yo want a duplicated group name, or whether you add more charters for disambiguation.
Ok, if this works rock-solid, then there should be no problems, except for (a) older databases, as discussed before, or (b) users that change their opinion in regards to unique groups. (a) So basically only someone, who has an older database, requires unique groups, but is not aware that the older database currently does not meet the requirements, would need help from JabRef in finding those or (b) people who later on realise that they prefer unique groups.
The identifier is stored in the groups field (and no longer the name of the group)
- Does this have any unwanted side effects when using the article search function, i. e. will "groups = Temperature" (assuming you have a group named "Temperature") still work? Or does the user now need to know the unique identifier?
This is a actually a good question: when I search for "Temperature", should all entries in all groups called "Temperature" be included?
- Similar question: Will the group filtering be unaffected by this change?
Hmmm... Probably groups should still be filtered by names, not by IDs?
This unique identifier will also be very helpful for other scenarios. For example, we are planning a new feature in connection with JabRef Online, where users can share a group. For this a unique identifier is essential, since one needs to a way to identify the group even for different users.
What if we look at the situation at the different angle (and this is probably a more correct description):
under the proposed PR #7558, the group names are still unique, but you are allowed to have the same group in different parts of the tree :). As a in the Unix hard link example.
This, for sure, dos not hold for non-leaf groups of the "union" type, but for me this is desired behaviour.
The identifier is stored in the groups field (and no longer the name of the group)
- Does this have any unwanted side effects when using the article search function, i. e. will "groups = Temperature" (assuming you have a group named "Temperature") still work? Or does the user now need to know the unique identifier?
This is a actually a good question: when I search for "Temperature", should all entries in groups called "Temperature" be included?
I think so, but since I intend to keep unique group names, I am not so sure about what people want with the actual user case. I reckon if you want both options, you could have one groups field and one ID field. People who want to find all entries in groups called "Temperature" search for groups, others use the ID (problem there is still that users will need to know the ID).
- Similar question: Will the group filtering be unaffected by this change?
Hmmm... Probably groups should still be filtered by names, not by IDs?
Agree, should be filtered by group names.
I suggest we accept that we loose manual editing and would go for uuid's. The user experience can be improved afterwards by using a tag-based interface, for easy management of groups from the entry editor.
I'm not sure why you would like to search for groups = ...
. In this case it's probably easier to just select the group in the tree... Let's not worry about this too much until users report they use this often for reason xyz. (Except if I'm missing an important use case)
Or create a duplicate manually, using a text editor :). The bottom line is that JabRef, IMHO, will have to deal with such situations anyway.
Just issue a warning in this case. The point is that groups have unique identifiers, if you mess with this then it's not Jabref's job to fix it for you.
I'm not sure why you would like to search for
groups = ...
. In this case it's probably easier to just select the group in the tree... Let's not worry about this too much until users report they use this often for reason xyz. (Except if I'm missing an important use case)
Well, I use this all the time, since finding the group in the tree takes much longer if you have to go through thousands of groups. So, this is a real use case.
I suggest we accept that we loose manual editing and would go for uuid's. The user experience can be improved afterwards by using a tag-based interface, for easy management of groups from the entry editor.
This does not sound good at all:
This unique identifier will also be very helpful for other scenarios. For example, we are planning a new feature in connection with JabRef Online, where users can share a group. For this a unique identifier is essential, since one needs to a way to identify the group even for different users.
A a side note: as far as group/file sharing goes, this problem we solved at my site once and for all, by simply using an SVN repo for storing bibliographies. This solution works perfectly since 2008, and Subversion gives numerous advantages: a) unlimited undo b) full history of changes c) synchronisation (with automatic merges) on different computers, for different users, and even in different directories of the same computer d) easy and informative diff's... and many more bonuses.
What makes it difficult to share data is namely the decision to forbid (in the GUI) groups with identical names. It is all that natural that I (as a user sg) will want to have group "Languages", and my colleague also wants to have group "Languages". Why not? If the same papers land into that group, this is on one hand a benefit for us, on the other hand easy to sort out afterwards if we decide to split the group.
This is actually handled in the proposed PR: when you try to create a group with the name that already exists in your tree, a yellow exclamation mark appears, and a tool-tip explains what would happen if you go for it. This is proposed instead of bluntly forbidding such groups. You can decide whether yo want a duplicated group name, or whether you add more charters for disambiguation.
Ok, if this works rock-solid, then there should be no problems,
Well, I can not promise that it is "rock-solid" ("no warranty" disclaimer, as always :); but I have now tested my branch for quite a while and things work as expected; I also do not see any big dangers ahead since, as I have mentioned, the proposed PR #7558 is a conservative extension, it does not change the current design of the groups, just allows to use more features in it.
except for (a) older databases, as discussed before, or (b) users that change their opinion in regards to unique groups. (a) So basically only someone, who has an older database, requires unique groups, but is not aware that the older database currently does not meet the requirements, would need help from JabRef in finding those or (b) people who later on realise that they prefer unique groups.
People who read old databases with unique groups will not be affected at all, since the extension leaves all old functions intact.
People who wish to maintain unique groups in their databases are in the position to do so easily; essentially they get the same message as before if they try to create group with a duplicated name. The only difference is that in the proposed PR #7558, you are no longer left with the disabled "OK" button, but you have a explanation and a choice. In addition, several small fixes take care that group renaming and deleting does not loose information when you have duplicated groups. Since you can have duplicated groups anyway (disabling "OK" button in the GUI does not ensure that program invariant – unique group names – is maintained in a database), the fixes are needed regardless to how GUI behaves.
People (like me) who want to reuse group names are in a position to do so conveniently.
People who later want to make groups unique can do so by renaming one or several groups; the code fixes in the PR #7558 take care that the entries stay assigned to the old group when a duplicated group is renamed, and to the new group if you select "assign all entries to the new group". So the system is very flexible in what you do.
Seems like everyone-wins situation, doesn't it?
@sauliusg your work here in this PR is very much appreciated. But I think the id-based system is the more universal way forward. Would you be interested in implementing it? We core developers help of course where we can!
@tobiasdiez These are hard choices. May I suggest the following roadmap for group interface:
With (1) and (2) in place, people (including me) can go on migrating to new JabRef (I would do this in my branch and start using it on the spot, and the switch to >= version 5.4 when this comes out; I can not use 5.2 since it still has #7548 PR not merged and I can not use 5.3 since it corrupts my database (issue #7010). Those who use unique group names within their databases, as @AEgit described it in detail, would continue to have full functionality available to them. Their databases would continue to work as before.
Unique shared groups and collaboration scenarios are nice to have; however, these features are trickier to implement and needs careful design and more time; they could come somewhat later. Their introduction would be compatible with changes in (1) if it is compatible with the current JabRef group implementation.
Would you (developers) agree on such process?
Update: actually, after reading discussion in #7010 I found a workaround for my database: setting "Resolve strings for all fields except: url;abstract;not" saves and reads the database correctly again, without touching the abstract field.
I suggest we accept that we loose manual editing and would go for uuid's. The user experience can be improved afterwards by using a tag-based interface, for easy management of groups from the entry editor.
On a third thought, it seems that there is a way to introduce UUIDs without loosing editability:
Languages(36592084-944f-11eb-b17a-5fe0541bb1b3)
);This would ensure smooth migration from the current group implementation and to the future eventual sharing of groups on a server.
https://github.com/JabRef/jabref/issues/7554#issuecomment-812814351 These are all valid concerns. But most of these disadvantages can be taken care of by a more advanced implementation. For example, search for groups = ...
can look up the groups by name, and then search for the entries that have the corresponding group-identifier. Similarly, before sharing (e.g. export) one could replace the group name by the identifier.
https://github.com/JabRef/jabref/issues/7554#issuecomment-812825202 The problem I see with this approach is that there is a time where people may create groups with duplicate names. This makes the migration a lot harder. Thus I would prefer to sort this out before.
#7554 (comment) The problem I see with this approach is that there is a time where people may create groups with duplicate names. This makes the migration a lot harder. Thus I would prefer to sort this out before.
If people will create groups with duplicate names, even after being warned, this means that they need such groups. Why not supporting this workflow?
For eventual migration to groups with IDs, I have outlined one of the possible migration strategies above; it seems to be simple and to maintain smooth transition. Sure one needs to be cautious, but there seems to no unsurmountable difficulties on this path.
#7554 (comment) These are all valid concerns. But most of these disadvantages can be taken care of by a more advanced implementation. For example, search for
groups = ...
can look up the groups by name, and then search for the entries that have the corresponding group-identifier. Similarly, before sharing (e.g. export) one could replace the group name by the identifier.
More advanced implementation inevitably means more complicated, has chances to more bugs, and takes more time to implement.
As you surely know: the cheapest, fastest and most reliable software components are those that are absent :)
If people will create groups with duplicate names, even after being warned, this means that they need such groups. Why not supporting this workflow?
I really would like people to have groups with duplicate names. It's just that I feel this should be properly implemented instead of using a temporary workaround that makes the proper implementation harder.
I really would like people to have groups with duplicate names. It's just that I feel this should be properly implemented instead of using a temporary workaround that makes the proper implementation harder.
My argument would be that, on one hand, the proper implementation will not be made (much) harder by the suggested change; on the other hand, a quick fix now would enable users to continue working on their databases, while at the same time developers could design and implement group update without hectic, in parallel, and have enough time to consider multiple possibilities and multiple scenarios.
@tobiasdiez These are hard choices. May I suggest the following roadmap for group interface:
From a user perspective this roadmap sounds like a good plan.
#7554 (comment) The problem I see with this approach is that there is a time where people may create groups with duplicate names. This makes the migration a lot harder. Thus I would prefer to sort this out before.
If people will create groups with duplicate names, even after being warned, this means that they need such groups. Why not supporting this workflow?
I reckon one problem here could be, that user initially (mistakenly) ignore the warning and only later on realize, that they actually wanted unique groups. Yes, that would be a user mistake - but it is something to think about. Fortunately, you already outlined an approach to deal with this issue: If the non-unique groups are flagged and the groups can be filtered to show the non-unique ones, then the user can easily solve this problem within JabRef.
#7554 (comment) The problem I see with this approach is that there is a time where people may create groups with duplicate names. This makes the migration a lot harder. Thus I would prefer to sort this out before.
If people will create groups with duplicate names, even after being warned, this means that they need such groups. Why not supporting this workflow?
I reckon one problem here could be, that user initially (mistakenly) ignore the warning and only later on realize, that they actually wanted unique groups. Yes, that would be a user mistake - but it is something to think about. Fortunately, you already outlined an approach to deal with this issue: If the non-unique groups are flagged and the groups can be filtered to show the non-unique ones, then the user can easily solve this problem within JabRef.
Correct, the non-unique groups should be indicated, and I would implement this in the PR #7558 if we get a go from developers :).
When discovered, duplicated groups can be renamed without loss of information – this already works in the current PR (tested in several different situations). Entries assigned to the original group stay in that group, and you have a choice whether to include the same entries into the newly renamed group or not. Here the existing JabRef functionality (a popup that asks whether the modified group should receive the entries of the original group) turned out very useful.
I really would like people to have groups with duplicate names. It's just that I feel this should be properly implemented instead of using a temporary workaround that makes the proper implementation harder.
My argument would be that, on one hand, the proper implementation will not be made (much) harder by the suggested change; on the other hand, a quick fix now would enable users to continue working on their databases, while at the same time developers could design and implement group update without hectic, in parallel, and have enough time to consider multiple possibilities and multiple scenarios.
I would agree with that - people, who already need this functionality, would have immediately access to it. Given, that as @sauliusg points out, this is technically already part of JabRef (it is just the GUI that forbids creating non-unique groups), the resulting changes to JabRef appear minor. At a later stage, an improved solution as proposed by @tobiasdiez could be implemented - but this might take some time during which the users which requires this feature are left without options. Note also, that by allowing this minor fix, experience can be gained regarding the pros and cons of allowing non-unique group names. Maybe there are some hidden drawbacks that are not obvious at the moment.
Finally, if that is not convincing. maybe a compromise could be a solution. Implement the fix as described by @sauliusg but add an additional setting option to the preferences, where the user can actively control whether non-unique groups should be allowed in the GUI or not. Set the default to not allowing the creation of non-unique groups in the GUI (the current default). Add a description/tooltip indicating, that the user has to know, what he is doing if he wishes to allow non-unique groups and maybe call this a BETA feature (given that a later point in time this implementation might be replaced by a more sophisticated one). I reckon this should address the possible concerns and hopefully not be too much more work to implement?
Duplicated groups in different branches of the group tree are not detected when a database that contains them is read in.
But on a second thought, and after playing a bit with JabRef 5.x master branch, I start thinking that this is not a bug but actually a very useful feature! I've opened a detailed discussion in https://discourse.jabref.org/t/hierarchical-groups-with-duplicated-names-are-actually-working-in-jabref-5-x/2619. So don't fix it!
Steps to reproduce the behavior:
Start a new JabRef process, create a new library, add an entry, save the database as 'new.bib' (see the attached 'biblio.zip');
Show the groups interface, create two groups 'Human' and 'Computer', create a subgroup 'Languages' in 'Human', save over the 'new.bib';
Assign the new entry to 'Human/Languages';
Copy 'new.bib' to 'new-duplicate-group.bib';
Open the 'new-duplicate-group.bib' with a text editor, and duplicate the 'Languages' group in the 'Computer' branch:
The resulting BibTeX files are attached in 'biblio.zip': biblio.zip
Load the edited file into JabRef; no error report is produced, both groups are shown (correctly?), even though the GUI would not allow me to create such situation; the entry is reported to belong to both groups:
The manually created duplicate is used only to reproduce a minimal example of this behaviour; in real life large number of duplicates emerged when porting previous group trees from the previous revisions of my database.
This behaviour actually seems to be very useful, since it allows me to create (once again!) groups with identical names in different places of the group hierarchy, and assignment to one such group adds the same entry to all such groups, which might be very useful (detailed discussion in https://discourse.jabref.org/t/hierarchical-groups-with-duplicated-names-are-actually-working-in-jabref-5-x/2619 ).
This is how it looks after some editing of the database:
The resulting bibtex file is added in 'experimets-with-group-hierarchy.zip'. experimets-with-group-hierarchy.zip
Log File
``` Paste an excerpt of your log file here ```