DataJunction / dj

A metrics platform.
http://datajunction.io
MIT License

Proposal: Collections #1059

Open samredai opened 1 week ago

samredai commented 1 week ago

Currently, the UI Explore view is a top-level view that includes all nodes in the system. The search helps, but the hierarchical namespace explorer side tab becomes harder to navigate as you approach 1k+ nodes. This is a proposal for a lightweight metadata concept: a node "Collection".

Implementation

I don't think we need to overcomplicate the implementation. I'm open to suggestions, but I'm thinking of a collections table that contains high-level info on the collection (id, name, description, created_by, created_at, deleted_at) and a collection_nodes table that maps collection IDs to node IDs.
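To make the two-table idea concrete, here is a minimal sketch using Python's built-in sqlite3. The column names come from the proposal above; the column types, the `node` table, and the helper functions `create_schema` and `nodes_in_collection` are assumptions for illustration, not DJ's actual models.

```python
import sqlite3

# Hypothetical schema: column names follow the proposal, types are assumed.
SCHEMA = """
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    created_by TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TEXT  -- NULL until the collection is soft-deleted
);
CREATE TABLE collection_nodes (
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    node_id INTEGER NOT NULL REFERENCES node(id),
    PRIMARY KEY (collection_id, node_id)  -- a node appears once per collection
);
"""


def create_schema(conn):
    """Create the hypothetical tables on the given connection."""
    conn.executescript(SCHEMA)


def nodes_in_collection(conn, collection_id):
    """Return the node names in a collection, skipping soft-deleted collections."""
    rows = conn.execute(
        """
        SELECT n.name
        FROM node AS n
        JOIN collection_nodes AS cn ON cn.node_id = n.id
        JOIN collections AS c ON c.id = cn.collection_id
        WHERE c.id = ? AND c.deleted_at IS NULL
        ORDER BY n.name
        """,
        (collection_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because the mapping lives in its own table, adding or removing a node from a collection never touches the node itself, which keeps collections out of the functional graph.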

Where this is heading

This lightweight organization of nodes will let users section off parts of the system in a way that does not interfere with the functional parts of the system (the actual graph). We can eventually allow linking collections to docs, memos, Slack channels, etc. and use collection names and descriptions to inform a first pass by an LLM.

shangyian commented 1 week ago

Collections as a concept is definitely a good idea! Do you think we can reuse tags for saving collections in the backend though? The tag_type can just be collection.

It would be nice if we could auto-create a collection of each user's nodes, i.e., something like Yian's Nodes that encompasses all of the nodes that user created. Then we can just display that on their landing page by default. And of course a user can create other collections as well, which would also show up there, but this would remove a manual step for many of them.

samredai commented 1 week ago

I'm sure we can make it work using tags since they're pretty flexible, but I think this might make the metadata more messy in the long run since the relationship is reversed. A critical node could exist in 30 collections created by different users and having 30 tags on that node feels less intuitive than having each of those collections maintain their own pointer directly to that node. Also I suspect we'll want to continue to add more metadata around collections and I wouldn't want to shoehorn this into the tags construct. What do you think?

As for a "My Nodes" collection, I agree, I think that would be super useful. I don't know that it would even need to be implemented via a collection though, since it warrants a dedicated endpoint and maybe even a dedicated page in the UI. Would it be a cheap runtime query to pull all the nodes for the current user, since that's stored on the node's metadata?
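As a rough illustration of the runtime query in question, the sketch below assumes the creator is stored in a `created_by` column on a `node` table and that the column is indexed. The table layout, the index, and the names `make_demo_db` and `my_nodes` are all hypothetical, chosen only to show the shape of the lookup.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical node table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            created_by TEXT NOT NULL
        );
        -- An index on the creator keeps the per-user lookup cheap as nodes grow.
        CREATE INDEX idx_node_created_by ON node (created_by);
        """
    )
    conn.executemany(
        "INSERT INTO node (name, created_by) VALUES (?, ?)",
        [
            ("default.revenue", "yian"),
            ("default.orders", "yian"),
            ("default.users", "sam"),
        ],
    )
    return conn


def my_nodes(conn, username):
    """Return the names of all nodes created by the given user."""
    rows = conn.execute(
        "SELECT name FROM node WHERE created_by = ? ORDER BY name",
        (username,),
    ).fetchall()
    return [name for (name,) in rows]
```

Under these assumptions the view is a single indexed filter, so nothing needs to be stored or kept in sync for it.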

shangyian commented 1 week ago

fwiw, tags are actually implemented as pointers to nodes (basically a many-to-many relationship with a relation table), similar to how collections would be stored behind the scenes. They only appear at the node level in the sense that they're displayed in the UI that way, but that can be changed. In fact, the way we've implemented tags is ultra flexible and goes beyond what the vast majority of users do with tags. But you're right that with a separate table for collections, we could add more collection-specific metadata.
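The point about the relation table can be sketched generically: with a many-to-many table, neither side "owns" the link, and both directions are just queries. The tables `label` and `label_node` below are hypothetical stand-ins for either tags or collections and do not reflect DJ's actual tag tables.

```python
import sqlite3


def make_relation_db():
    """Build an in-memory database with a generic many-to-many relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE label_node (
            label_id INTEGER NOT NULL REFERENCES label(id),
            node_id INTEGER NOT NULL REFERENCES node(id),
            PRIMARY KEY (label_id, node_id)
        );
        """
    )
    return conn


def nodes_for_label(conn, label_name):
    """Label -> nodes: the 'collection contents' direction."""
    rows = conn.execute(
        """
        SELECT n.name FROM node AS n
        JOIN label_node AS ln ON ln.node_id = n.id
        JOIN label AS l ON l.id = ln.label_id
        WHERE l.name = ? ORDER BY n.name
        """,
        (label_name,),
    ).fetchall()
    return [name for (name,) in rows]


def labels_for_node(conn, node_name):
    """Node -> labels: the 'tags shown on a node' direction."""
    rows = conn.execute(
        """
        SELECT l.name FROM label AS l
        JOIN label_node AS ln ON ln.label_id = l.id
        JOIN node AS n ON n.id = ln.node_id
        WHERE n.name = ? ORDER BY l.name
        """,
        (node_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Whether a node shows 30 labels or 30 collections each list that node is then a display choice over the same rows, which is why the remaining argument for a separate table is the collection-specific metadata.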

For "My Nodes", I think it would be useful if this is a collection that users can edit. As in, we can have it include all nodes they create but also allow them to add more nodes, since sometimes they may be working with nodes that they didn't create and want to see it on their dashboard.

samredai commented 1 week ago

> For "My Nodes", I think it would be useful if this is a collection that users can edit. As in, we can have it include all nodes they create but also allow them to add more nodes, since sometimes they may be working with nodes that they didn't create and want to see it on their dashboard.

I see, that makes sense. Once they add nodes created by other people, though, that starts to make the "My Nodes" view a little ambiguous. What do you think about having a very straightforward copy-collection feature? "My Nodes" could be a virtual read-only collection, and a user could copy it to make a physical collection that they can edit to add/remove nodes.
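One way to picture the virtual-versus-physical split is the sketch below. The class names `VirtualCollection` and `Collection`, the functions `my_nodes_view` and `copy_collection`, and the plain dict of node creators are all hypothetical; they only illustrate the copy semantics, not how DJ would store anything.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualCollection:
    """Read-only view computed at request time; nothing is persisted."""
    name: str
    node_names: frozenset


@dataclass
class Collection:
    """Physical collection that a user owns and can edit."""
    name: str
    created_by: str
    node_names: set = field(default_factory=set)


def my_nodes_view(node_creators, username):
    """Build the virtual "My Nodes" view from a node -> creator mapping."""
    owned = frozenset(
        node for node, creator in node_creators.items() if creator == username
    )
    return VirtualCollection(name="My Nodes", node_names=owned)


def copy_collection(source, new_name, username):
    """Snapshot any collection's nodes into a new editable collection."""
    return Collection(
        name=new_name, created_by=username, node_names=set(source.node_names)
    )
```

A usage example under the same assumptions:

```python
creators = {"default.revenue": "yian", "default.orders": "yian", "default.users": "sam"}
view = my_nodes_view(creators, "yian")
working_set = copy_collection(view, "Yian's working set", "yian")
working_set.node_names.add("default.users")  # the copy is editable, the view is not
```

The copy is a snapshot, so "My Nodes" keeps meaning exactly "nodes I created" while the editable copy can drift freely.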