Closed woodwm closed 10 months ago
This requires a huge change of the database and the UI component. I’m afraid that I can’t implement it in a short time.
I think the feature is really necessary, support this idea!
Hi @woodwm @Mrz-zz @fkcptlst @aur3l14no
I recently had an idea for supporting nested tags/folders in an elegant way, and I'd like to hear whether you think it's really a good idea.
The basic design of Paperlib's tag system is to keep it simple, as I want to use the simplest way to organize papers. Previously we didn't support nested tags/folders because of atomicity: a tag/folder should be the smallest element in the system. For example, the tags of a paper related to 'semi-supervised image classification' should be semi-supervised and classification.
After reading some user feedback such as #237, I realised that combining atomic tags/folders to create a smartfilter is a good way to filter papers related to more than one topic such as the abovementioned one. The smartfilter has been introduced for a while and I personally love it.
Although a smartfilter is a good workaround for creating combined tags/folders, it is still far from nested tags/folders. Recently, after discussing with @charlieJ107, I came up with an idea for the tag/folder system:
Parse the rule of a smartfilter into a graph and show it in the UI.
Suppose we have a smartfilter like this:
(tags.name == "semi-supervised") AND (tags.name == "classification")
It finds the papers that have both tags.
This smartfilter query can be transformed into a graph:
A more complex query can be transformed into a graph as well:
(tags.name == "semi-supervised") AND (tags.name == "classification" OR tags.name == "segmentation")
If we walk from the Root Node, we get a view tree according to the nodes' depth:
// Tree of graph 1
| - Root Node
  | - Statement: tags.name == "semi-supervised"
  | - Statement: tags.name == "classification"
// Tree of graph 2
| - Root Node
  | - Statement: tags.name == "semi-supervised"
  | - Logic Node: OR
    | - Statement: tags.name == "classification"
    | - Statement: tags.name == "segmentation"
We can show this tree after you click the corresponding smartfilter like this:
Clicking a node in this tree filters the database by the corresponding filter statement.
By doing so, nested tags/folders can be achieved without changing the database structure, and we still keep atomicity.
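To make the walk concrete, here is a minimal Python sketch. It assumes the smartfilter rule has already been parsed into a small tree of nodes; the `Node` class and `view_tree` function are hypothetical names for illustration, not Paperlib code, and the Root Node is assumed to stand for the top-level AND.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of the (hypothetical) filter graph."""
    label: str  # 'Root Node', 'Logic Node: OR', or a filter statement
    children: List["Node"] = field(default_factory=list)


def view_tree(node: Node, depth: int = 0) -> List[str]:
    """Walk from the given node and return one indented line per node."""
    lines = ["  " * depth + "| - " + node.label]
    for child in node.children:
        lines.extend(view_tree(child, depth + 1))
    return lines


# Graph 2 from the example: semi-supervised AND (classification OR segmentation)
graph2 = Node("Root Node", [
    Node('Statement: tags.name == "semi-supervised"'),
    Node("Logic Node: OR", [
        Node('Statement: tags.name == "classification"'),
        Node('Statement: tags.name == "segmentation"'),
    ]),
])

print("\n".join(view_tree(graph2)))
```

Each printed line corresponds to one clickable node; parsing the query string itself is left out of this sketch.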
I appreciate all your responses. 👍
I wonder if logic OR is really necessary. It can be quite confusing when logic OR is in the hierarchy of folders.
For hierarchical nested folders, the underlying logic is that any leaf (file) would "inherit" the attributes of all parent nodes. The displayed solution is a syntax tree rather than a file tree. I believe these two trees are quite different (i.e. in the syntax tree the non-leaf nodes are operators, while in file trees the non-leaf nodes represent a line of succession).
@fkcptlst I agree that these two kinds of trees are different.
Showing a syntax tree is a compromise for nested tags/folders as I really don't want to discard the atomicity (and redesign the database structure😵).
When a logic OR node's children consist only of tags/folders, it is equivalent to a parent set of them. Renaming this logic node would be the best way, but I think it's hard to implement.
What about this solution:
For the example shown above, when we show the tree, the logic OR is rendered as segmentation / classification rather than Logic OR.
If the logic node is an AND, we render segmentation & classification.
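A small sketch of that labelling rule, assuming the logic node's children are all plain tags/folders. The function name `logic_node_label` and the `SEPARATORS` table are made up for illustration, and the label simply follows the order in which the child names are passed in.

```python
from typing import List

# Separator shown for each logic operator, following the proposal above.
SEPARATORS = {"OR": " / ", "AND": " & "}


def logic_node_label(operator: str, child_tags: List[str]) -> str:
    """Build a display label for a logic node whose children are all plain tags."""
    if operator not in SEPARATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return SEPARATORS[operator].join(child_tags)


print(logic_node_label("OR", ["segmentation", "classification"]))
print(logic_node_label("AND", ["segmentation", "classification"]))
```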
That would be a bottom-up rather than top-down approach. If I want to construct a hierarchy of tags/folders, I would first need to construct the very bottom leaves, then bind them via logic OR in a bottom-up fashion. I don’t think that’s the best practice since it isn’t what most people are accustomed to. I believe people tend to construct such a hierarchy in a top-down fashion.
It’s a difficult thing to balance the completeness of functionality and usability.
I think the logic operators are suitable for tags, but less suitable for nested folders. Indeed you can represent nested folders with tags equivalently in a mathematical sense, but it just seems odd to use.
I still think a clean way to implement nested folders is to use slashes in tags (as is done in URLs for S3 storage). Simply split tags by slashes, use longest-prefix matching to determine hierarchy and grouping, and render the tree structure in the UI. This shouldn’t affect the db design, as everything happens on the client side.
Example:
Paper A: cv/segmentation, ml/contrastive learning
Paper B: cv/tracking, multi-modal/contrastive learning
When constructing the rendered file hierarchy, tags are grouped by their slash-separated prefixes.
When doing Boolean operations on tags, the hierarchy within tags should be ignored: nested tags break down into multiple equal-level tags (for the atomicity of tags).
To elaborate,
Suppose a user has 2 papers with the following tag sets:
Paper A: cv/segmentation, ml/contrastive learning
Paper B: cv/tracking, multi-modal/contrastive learning
Tags
When dealing with tags, I agree that the atomicity requirement should be met, so the tags should be broken down and treated as equal-level tags.
Paper A: cv, segmentation, ml, contrastive learning
Paper B: cv, tracking, multi-modal, contrastive learning
Nested folders
When rendering them as a hierarchical file tree (if the user chooses to), the displayed structure can be as follows:
cv
- segmentation
- tracking
ml
- contrastive learning
multi-modal
- contrastive learning
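Both views can be derived from the same slash-separated names. The following is only a sketch under that assumption: `atomic_tags`, `folder_tree` and `render` are hypothetical helpers, and a nested dict stands in for the prefix matching mentioned earlier.

```python
from typing import Dict, Iterable, List, Set


def atomic_tags(tags: Iterable[str]) -> Set[str]:
    """Break hierarchical tags into equal-level atomic tags."""
    return {part.strip() for tag in tags for part in tag.split("/") if part.strip()}


def folder_tree(tags: Iterable[str]) -> Dict[str, dict]:
    """Group hierarchical tags into a nested dict keyed by path segment."""
    root: Dict[str, dict] = {}
    for tag in tags:
        level = root
        for part in tag.split("/"):
            part = part.strip()
            if part:
                level = level.setdefault(part, {})
    return root


def render(tree: Dict[str, dict], depth: int = 0) -> List[str]:
    """Render the nested dict as indented lines; children are prefixed with '- '."""
    lines: List[str] = []
    for name in sorted(tree):
        prefix = "  " * (depth - 1) + "- " if depth else ""
        lines.append(prefix + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


papers = {
    "Paper A": ["cv/segmentation", "ml/contrastive learning"],
    "Paper B": ["cv/tracking", "multi-modal/contrastive learning"],
}
all_tags = [tag for tags in papers.values() for tag in tags]

print(sorted(atomic_tags(papers["Paper A"])))
print("\n".join(render(folder_tree(all_tags))))
```

Nothing here touches how tags are stored; both the flat set and the tree are computed on the client from the tag names.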
That's interesting. I want to discuss more about your design:
Suppose I have a Paper C that is a multi-modal segmentation paper. It should then have a tag like:
Paper C: multi-modal/segmentation.
When dealing with tags, is the segmentation under multi-modal identical to the one under cv?
In the database, we store something like:
{
name: cv
...
}
{
name: cv/segmentation
...
}
or we store:
{
name: cv
...
}
{
name: segmentation
...
}
i.e., should atomicity be kept only in the UI, or in both the UI and the database?
@fkcptlst
I think, for folders, your solution is really good. Because two subfolders under two different parent folders should be treated as two identities even if they have exactly the same name.
But for tags, I think that because of atomicity they should not be treated as two identities. The segmentation tag should be considered the same whether it is under hyper-tag A (e.g., cv) or hyper-tag B (e.g., multi-modal). If we store the tags like this:
{
name: cv/segmentation
...
}
then when we want to find papers related only to segmentation, the database needs to return results with tags like:
{
name: segmentation ✅
...
}
{
name: cv/segmentation ✅
...
}
{
name: multi-modal/segmentation ✅
...
}
{
name: cv/classification ❌
...
}
Does that mean we have no atomicity in the database?
For the first question, I think the answer is yes: the segmentation under multi-modal is equivalent/identical to the one under cv. In other words, the / is only effective for "nested folder" operations (i.e. rendering). For tag operations, it is treated like a , separator (i.e. cv/segmentation -> cv, segmentation when doing tag operations).
The reasons are as follows:
- When a sub-tag (e.g. segmentation) is a keyword (i.e. a term or concept like segmentation or contrastive-learning), people tend to use it to retrieve relevant documents. Given tags like cv/segmentation and multi-modal/segmentation, it's natural to interpret segmentation as a reference to the same concept.
- When a sub-tag is not such a keyword (e.g. misc in notes/misc, supplement/misc), it may be less helpful when querying, since the two misc are not referring to the same concept or entity. But that's up to the user to decide: when they choose to arrange tags in this way, they have already given up querying with such sub-tags. (They won't use sub-tags like misc for tag queries since they already prefer hierarchical nested folders over tags. There's no need to consider whether the sub-tags refer to the same concept or not; we only have to consider the first scenario and suppose that they do refer to the same concept or entity.)
For db implementation, I think storing cv/segmentation would suffice (it is not so pretty since it sacrifices the atomicity of the data table, but it gets the job done).
Or you can create a new table to store the hierarchy alone. The advantage of this method is that it does not break the processing logic of the original tags, and the tags table keeps its atomicity.
I agree that by creating a new table to store complex tag or folder relationships, user-defined smart filters can also be converted into such tag relationship entries. When querying, first generate query conditions based on tag or folder relationships, and then use these query conditions to query the database. This can preserve the atomicity of storage in the database.
In fact, this is equivalent to abandoning the original concept of folders. Folders are regarded as a special label. The so-called nested relationship between folders and labels becomes an AND logic between labels. I'm not sure whether we should embody this concept in the UI from a user experience perspective. For example, we can simply remove the concept of folders.
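As a rough illustration of nesting turning into AND logic, the sketch below converts a hierarchical name into a query condition in the same shape as the smartfilter example earlier in the thread. `path_to_query` is a hypothetical helper, and it assumes tag names contain no double quotes that would need escaping.

```python
def path_to_query(path: str, field: str = "tags.name") -> str:
    """Turn a hierarchical name such as 'cv/segmentation' into an AND filter."""
    parts = [p.strip() for p in path.split("/") if p.strip()]
    if not parts:
        raise ValueError("empty path")
    # One equality statement per level, all joined with AND.
    return " AND ".join(f'({field} == "{p}")' for p in parts)


print(path_to_query("cv/segmentation"))
# prints: (tags.name == "cv") AND (tags.name == "segmentation")
```

With this kind of conversion the stored tags stay atomic, and the hierarchy only exists as relationship entries that are expanded into conditions at query time.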
@fkcptlst So now I understand your design:
- From the design/concept aspect, tags should be atomic.
For database:
- We store hierarchical names like cv/segmentation. If the user also has a top-level tag segmentation, or a child-tag under a different hyper-tag such as multi-modal/segmentation, we store all of them in the database. Consequently, in this case, we have 3 objects in the database, but from the top-level design aspect they indicate the same concept.
For UI:
- In the right details panel and the main table view, tags should be shown in an atomic way, like what we have now: cv, segmentation
- In the left sidebar panel, it depends on the user preference. We should have two modes:
  Flat Mode: just like what we have in the current version.
  Hierarchy Mode: show a tree rendered according to the hierarchy name.
For querying:
- In the Flat Mode, if a user wants to show all papers related to a tag such as segmentation, the filtered results should be those with tags segmentation or cv/segmentation or multi-modal/segmentation.
- In the Hierarchy Mode, we just show the results with the exact hierarchy tags.
Pls correct me if I'm wrong.
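The two querying modes could look roughly like this in Python. This is a sketch only: `matches` and `filter_papers` are hypothetical names, and tags are assumed to be stored as full hierarchical names such as cv/segmentation.

```python
from typing import Dict, List


def matches(tag: str, query: str, mode: str) -> bool:
    """Return True if a stored tag name satisfies the query in the given mode."""
    if mode == "hierarchy":
        return tag == query              # exact hierarchical name only
    if mode == "flat":
        return query in tag.split("/")   # any path segment equals the query
    raise ValueError(f"unknown mode: {mode}")


def filter_papers(papers: Dict[str, List[str]], query: str, mode: str) -> List[str]:
    """Return the names of papers that have at least one matching tag."""
    return [name for name, tags in papers.items()
            if any(matches(tag, query, mode) for tag in tags)]


papers = {
    "P1": ["segmentation"],
    "P2": ["cv/segmentation"],
    "P3": ["multi-modal/segmentation"],
    "P4": ["cv/classification"],
}

print(filter_papers(papers, "segmentation", "flat"))
print(filter_papers(papers, "cv/segmentation", "hierarchy"))
```

In Flat Mode the three segmentation variants are treated as the same concept, while Hierarchy Mode only returns the exact stored name.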
Yes, you’ve expressed it very clearly.
For database design, I don’t have a direct preference since from my perspective either way works.
How about converting this issue to a discussion, so we can listen to more users' opinions? @GeoffreyChen777 @fkcptlst
Thanks a lot for your response. For implementation, I will discuss it with @charlieJ107 later.
Obsidian supports nested tags using #inbox/to-read. Nested tags can replace folders to show hierarchical relationships.