Open salmad3 opened 1 week ago
[23-09-2024]:
[!NOTE] TL;DR:
- Posts are being considered as the core primitive (one generic type).
- Boards are treated as a type of post with no parent to simplify the architecture.
- The system can use a slug hierarchy for efficient post organization and querying, allowing prefix-based searches to quickly identify relationships between posts and replies.
- AVL trees are implemented as the core data structure, ensuring balanced and efficient storage. However, concerns about title-based searches suggest supplementary structures might be necessary for non-hierarchical queries.
- While AVL trees perform well for insertions and lookups, managing everything in a single tree could lead to inefficiencies for complex queries. Multiple AVL trees or alternative structures may be used for different types of data (posts, comments).
- We will proceed with constructing the AVL tree based on the current Gno implementation and will assess its performance iteratively.
Discussions centered on how to structure boards and posts in a unified model, with posts as the base type for both. Treating boards as a type of post simplifies the architecture by allowing them to share common fields, such as slugs. Boards are differentiated by their parent-child relationships, where a board is simply a post without a parent.
In this model, everything is treated as a post, including boards. Boards do not require a separate structure, as they are identified by their lack of a parent. Managing boards, posts, and replies uniformly reduces complexity and avoids the need for multiple types. Comments and sub-posts are children of a parent post, and each post can be rendered differently depending on its label or type.
[!NOTE] This unified structure reduces the overhead of maintaining multiple types and adds flexibility at the UI level by allowing different post behaviors.
The slug system is key to organizing the hierarchy. A slug like 0_blog
could represent a board, while 0_blog:1_username1
represents the first post under that board. Additional nesting is handled by extending the slug (0_blog:1_username1:2_username2
). This slug hierarchy allows efficient traversal and retrieval of posts and their replies using prefix-based queries. For instance, all replies to 0_blog:1_username1
can be fetched by querying slugs that start with that prefix.
Using prefix searches allows efficient iteration through all replies or sub-posts. This method eliminates the need for complex joins or additional structures for parent-child queries, leveraging the slug structure to quickly identify related posts.
// Example Slug System:
// "0_blog" > board
// "0_blog:1_username1" > post under blog board
// "0_blog:1_username1r:2_username2" > reply to username1's post
When using plain slugs (no ID prefixes), we rely on string operations like HasPrefix
to manage traversal and filtering. This approach works but has limitations, especially in knowing when to stop the iteration.
start := "blog"
tree.Iterate(start, "", func(slug string, _ interface{}) bool {
if !strings.HasPrefix(slug, start) {
return true // Stop when slug no longer matches the prefix
}
println(" > " + slug) // Output: "blog", "blog/a", "blog/a/1", etc.
return false
})
Without a clear end point, we manually check if each item's slug still matches the prefix, which can be inefficient.
By adding ID prefixes to slugs (e.g., 1_blog
), we can identify iteration boundaries more effectively. This allows for simpler and more efficient traversal without needing string operations like HasPrefix
.
start := "1_" // ID is used as the start instead of slug
end := "2" // End is expressed as string(ID+1)
tree.Iterate(start, end, func(slug string, _ interface{}) bool {
println(" > " + slug) // Output: "1_blog", "1_blog/a", "1_blog/b", etc.
return false
})
Using ID prefixes allows iteration to be more efficient, as it avoids the need for string matching checks at each step.
The AVL tree is proposed as the primary data structure for storing posts, keyed by the slug. This ensures that all posts are stored in a balanced and efficient manner.
Although AVL trees offer O(log n) complexity for insertions and searches, concerns were raised about their efficiency for title-based searches. Searching for a post by its title would require traversing the entire tree, which is inefficient for large datasets.
[!WARNING] Global searches, particularly for non-slug fields like titles, might introduce inefficiencies especially for large datasets.
Title-based searches across the AVL tree are inefficient. Supplementary structures or inverted indexes are considered for fast lookups by title or other non-hierarchical fields. The slug structure remains useful for hierarchical queries, while additional indexing will handle broader searches.
Structurally, boards and posts are identical as posts. They can be distinguished by labels or type fields. For instance, boards can have the label "board"
, while posts might carry a different label. This distinction allows the UI to handle boards, threads, and comments differently, providing system behavior flexibility.
Efficient iteration is crucial, especially when accessing data chunks. All posts and sub-posts should be retrievable without multiple calls when iterating through a board. This is achievable by iterating over the slug hierarchy, using the prefix to access relevant posts quickly. AVL tree traversal allows for chunked access, ensuring that only relevant data is loaded, which optimizes performance.
Managing everything in a single AVL tree can lead to inefficiencies. Handling different aspects of the hierarchy (boards, posts, comments) within a single tree might not scale well, especially for complex queries like global title searches. The use of multiple AVL trees—one for each level of the hierarchy—or separate trees for posts and comments is being considered to improve performance.
[!TIP] Multiple AVL trees could be necessary to optimize data handling, improving performance by distributing the load across different trees for specific types of queries.
[!WARNING] This assumes the implementation in gno.land and needs checking
Operation | AVL Tree | With ID Prefixes | Global Search (Title) |
---|---|---|---|
Insertion | O(log n) | O(log n) | O(log n) |
Search (Slug) | O(log n) | O(log n) | O(n) |
Iteration | O(n * m) | O(n) | O(n) |
Structure | Space Complexity |
---|---|
AVL Tree | O(n) |
Slug Structure | O(n) |
For basic operations like insertion and search, we use an AVL tree as our core data structure. AVL trees are designed to remain balanced, which means their height is always logarithmic in proportion to the number of nodes. This is why insertion and search take O(log n) time. Every time a new post is added or searched for, the AVL tree needs to rebalance itself to ensure efficient future operations.
When iterating through posts using plain slugs (without ID prefixes), the system needs to compare each post's slug with a target prefix (using something like HasPrefix
). This adds a bit of overhead because it involves comparing strings, which can vary in length (denoted by m). So, if we're dealing with 1,000 posts and the slugs are 10 characters long on average, we're looking at O(n * m) time complexity.
The idea behind ID prefixes is to speed up this process. If every post’s slug has a unique numeric prefix (like 1_blog
, 2_blog
, etc.), the system can simply iterate through these posts in O(n) time, without needing to compare strings. It knows where to start and stop based on the IDs, making traversal faster.
Then, global searches, like looking for a post by its title, present more of a challenge with AVL trees. Since the tree is organized by slugs, not titles, you’d have to traverse the entire tree to check each title. This takes O(n) time. This inefficiency is why we mentioned possibly adding supplementary structures for fields like titles in the future. But, without them, a title search in an AVL tree is linear.
[26-09-2024]:
Originally shared internally
During the sync today, we spent some time catching up with Kirk and discussing previous topics.
Jerónimo has also shared additional sample work and notes:
Post
as the main type; potential structs for Board
, Post
, Poll
, Choice
, and Comment
and ways to manage multiple post types within a single Post
struct, considering options like interfaces, type fields, or inferring the type based on specific fields.blog/a/1
, blog/b/2:xxxxx/dddd
) to encode hierarchical relationships for easier data retrieval.We discussed whether posts should reference parent posts directly, and the impact on traversal and efficiency.
Another key challenge is naming and structuring the package, considering the unified "posts" structure, given its hierarchical nature. Simply calling it "posts" could be misleading and may not reflect its full functionality.
One approach to address package structure and naming could be to leverage the fact that the system organizes posts into AVL tree structures, defining relationships between boards, posts, and replies, effectively creating a new tree structure. Naming it something like
PostTree
,DiscussionTree
, orThreadTree
would better reflect this architecture. The package would be organized to reflect this form.Or multiple packages, so perhaps a package that defines the main posts type, and a "tree package" would use this to create boards. But generally it would drift away from the existing setup of boards v1 if adapting posts as the main type.
[Originally shared internally]:
Based on what Ilker shared with the help option that constructs a tx for function calls in the CLI—there could be something implemented in gno.land that resembles Actions and Blinks (in Solana), and used for /r/demo/boards/ V2
.
Blinks could be shareable links or URLs that let users initiate on-chain actions, like voting, posting, or commenting on boards. Someone could click a link in a message, on a board, or even on other social media, and it would open their wallet to approve the action without needing to directly interact with the dApp.
Actions could then handle the automation of common tasks like posting or moderating (across one or more boards). Instead of crafting a transaction from scratch, the user could trigger pre-built transactions that execute these tasks, making interactions with boards more intuitive, especially for those less familiar with blockchain.
An example of this is the ?help
option that Ilker mentioned, which can be added to any realm link to offer help commands (like this example). This could extend to triggering actions directly from realm links.
[30-09-2024]:
[!NOTE] TL;DR:
- Posts remain the core primitive in the system, represented by a generic
Post
struct. Boards are treated as level 0 posts with no parent, simplifying the overall architecture. All other entities, such as threads and comments, are managed hierarchically using levels.- Extensive discussions on forking mechanisms, particularly focusing on Copy-On-Write (COW) to ensure efficiency.
- To manage complexity and improve performance, multiple AVL trees may be used for different levels (e.g., boards, threads, comments) or content types.
Posts
(expanded):// Post represents a post in a hierarchical structure.
// The Level field determines the post's position in the hierarchy, and the Base field tracks if it is a fork of another post.
type Post struct {
ID string // Unique identifier for each post.
Level uint // Represents the depth in the hierarchy (e.g., 0 for boards, 1 for threads, etc.).
Children []*Post // Child posts, such as comments or replies.
Base *Post // The post this post was forked from. If nil, the post is original.
Forks []*Post // Stores references to posts that were forked from this post.
Content PostContent // The main content of the post, stored via an interface to allow for different content types.
CreatedAt time.Time // Timestamp when the post was created.
Creator std.Address // Address of the creator of the post.
}
// PostContent represents different types of content that a post can hold.
// Allows flexibility in the type of content associated with a post.
type PostContent interface {
Render() string // Returns a rendered representation of the content (e.g., HTML/Markdown).
}
// TextPostContent represents simple text content.
// This type holds text data and is rendered as plain text.
type TextPostContent struct {
Text string
}
// Render returns the raw text content.
// Used for displaying the content in a UI.
func (t TextPostContent) Render() string {
return t.Text
}
// PollPostContent represents a poll within a post.
// Stores a question, options for voting, and tracks the results.
type PollPostContent struct {
Question string // The question being asked in the poll.
Options []string // A list of possible options for voting.
Results PollResult // The results of the poll, tracking votes cast by users.
}
// PollResult tracks the votes in a poll.
// Maps each voter's address to their chosen option.
type PollResult struct {
Votes map[std.Address]string
}
// Numbers calculates the total votes for each option in the poll.
// Iterates over the votes and counts how many times each option was selected.
func (r PollResult) Numbers() map[string]int {
tally := make(map[string]int)
for _, option := range r.Votes {
tally[option]++
}
return tally
}
// AnyPostContent represents arbitrary or custom content in raw byte form.
// This type is used for content that doesn't fit predefined structures like text or polls.
type AnyPostContent []byte
// Render converts the raw byte content into a string representation.
// This allows raw content to be displayed as a string in the UI.
func (a AnyPostContent) Render() string {
return string(a)
}
[!WARNING] The results may vary depending on final implementation details and should be validated further.
Operation | Node Reference | Full Subtree Duplication | Copy-On-Write (COW) |
---|---|---|---|
Fork Creation | O(1) (reference only) | O(k) (k = size of subtree) | O(1) (reference only) |
Fork Modification | O(log n) | O(log n) | O(log n) |
Insertion (Post/Thread) | O(log n) | O(log n) | O(log n) |
Search (Slug) | O(log n) | O(log n) | O(log n) |
Iteration (All Posts) | O(n) | O(n) | O(n) |
Iteration by Level (Prefix) | O(k + log n) | O(k + log n) | O(k + log n) |
Global Search (Title) | O(n) | O(n) | O(n) |
Operation | Node Reference | Full Subtree Duplication | Copy-On-Write (COW) |
---|---|---|---|
Fork Creation | O(1) | O(k) | O(1) |
Fork Modification | O(log n) | O(log n) | O(n + k) (forked changes) |
AVL Tree | O(n) | O(n + k) | O(n + m × log n) |
Slug Structure | O(n) | O(n + k) | O(n + m × log n) |
Three distinct strategies were explored, each presenting different trade-offs in terms of time and space complexity:
Node Reference: Fork creation in Node Reference only involves referencing the original post without duplicating any data, resulting in O(1) time complexity. No data is copied until a modification occurs, which triggers O(log n) complexity for updating the AVL tree and copying the necessary nodes. This method is most effective when frequent forks are created that remain largely unchanged.
Full Subtree Duplication: In this strategy, the entire subtree is copied during the fork, incurring O(k) time complexity, where k
represents the size of the subtree. This ensures the forked post is independent from the original, but at a high initial cost. Post-fork modifications, however, follow standard AVL tree operations at O(log n). Full Subtree Duplication is best suited for forks that will undergo extensive independent modifications.
Copy-On-Write (COW): Initially, COW behaves similarly to Node Reference, with O(1) fork creation as no data is copied at first. When modifications are made, affected nodes are duplicated on demand, leading to O(log n) complexity for updates. While the initial cost is low, the complexity of handling deep changes mirrors that of Full Subtree Duplication. COW works well in scenarios where forks are frequent but modifications are rare or shallow.
For basic operations such as insertion and search using slugs, AVL trees maintain O(log n) complexity. However, iteration over all posts requires visiting each node, resulting in O(n) complexity. When iterating by hierarchical level (prefix-based), the complexity becomes O(k + log n), where k
accounts for the size of the subtree at that level.
The use of multiple AVL trees, where each level (boards, threads, comments) is separated into its own tree, localizes operations. For example, inserting a comment does not affect the tree managing boards, keeping operations efficient within the appropriate scope. This approach reduces unnecessary traversal and allows for more targeted interactions within specific levels of the hierarchy.
Global searches, particularly those based on titles, are more challenging. Since the AVL trees planned are keyed by slugs, title-based searches require traversing the entire tree, leading to O(n) complexity. Without supplementary indexing structures for non-slug fields like titles, global searches become inefficient in large datasets.
Forking strategies that reduce upfront costs, like Node Reference and COW, shift the complexity to the modification phase. Full Subtree Duplication avoids this, ensuring immediate independence for the forked posts, though at a significant initial cost. Iteration complexity remains O(n) across all strategies, with efficiencies gained when operations are scoped to specific trees in the multiple AVL tree model.
Fork creation with Node Reference incurs minimal gas costs, as no data is copied and only a reference to the original post is stored.
When modifications are made, gas costs increase due to the need to copy affected nodes and update the AVL tree. The costs scale with the depth of the modification, as each change requires writing to storage.
This method works well for frequent forks with few expected modifications, keeping the initial gas costs low.
Fork creation using Full Subtree Duplication incurs higher gas costs because the entire subtree, including all nodes and children, is duplicated and written to storage.
Once the fork is created, further modifications have lower gas costs, as the forked data is independent and does not require further copying.
This approach is suitable when the forked data needs to be modified significantly and independently from the original.
COW has low gas costs during fork creation, as no data is copied initially—only references to the original nodes are created.
When modifications are made, gas costs increase due to the need to duplicate the affected nodes and write the changes to storage. These costs scale with the depth of the modifications.
COW is useful when forks are frequent but are not expected to undergo extensive modifications.
[30-09-2024]:
Action Items:
The following suggestions focus on expanding the architecture and enabling potential functionality for developers building and using a "board" as a realm. Some ideas may be better suited for future versions (e.g., "v3") but are outlined here for consideration:
Concurrency and Parallelism Considerations:
Managing concurrent interactions with posts is essential for ensuring data integrity in realms with high activity. Implementing concurrency control mechanisms, such as locks or transactions, would prevent conflicts during simultaneous modifications or insertions, while maintaining performance and minimizing gas costs.
Cache Mechanisms for Performance:
Introducing caching mechanisms could improve performance for frequently accessed posts or global searches. This would help reduce the computational load on the AVL trees, especially for high-traffic boards, enabling faster read operations.
Boards Referencing Each Other:
Boards could reference one another across different realms using the slugs or unique identifiers. This allows content from one board to be cited or linked within another, much like hyperlinks. This functionality would enable discussions to span multiple boards and realms.
Boards Interacting with Each Other:
Facilitating interaction between boards through inter-realm messaging can enable boards to dynamically pull in content from other boards. For example, a governance board might incorporate discussions from related realms without duplicating the data. This interaction would be governed by the rules of each board.
Pinning Posts Within Boards:
Pinning posts within boards could be implemented by adding metadata to designate specific posts as "pinned", or by a Post
behavior (isPinned
). Pinned posts would appear at the top of the board's display until unpinned.
Role-Based Access Control (e.g., Moderation):
Role-based access control (RBAC) logic could be introduced to define roles such as admin, moderator, or user within a realm. RBAC would enforce permissions based on roles, determining who can post, comment, or moderate. Additional access controls, such as allowlists or private boards, could be implemented to restrict access to specific users or groups.
[02-02-2024]:
We will use a feature branch on the gno core repo to leverage existing CI.
Here is an overview of common platforms and their respective content structures and interaction models:
Platform | Primary Content Format | Content Types | Data Structure | Threading Model | Interaction Depth | Primary Use Case |
---|---|---|---|---|---|---|
Text, links, images, video | Posts, comments, replies | Tree-based (nested posts) | Deeply nested (hierarchical) | Deep (multi-level conversations) | Community-driven discussions | |
X/Twitter | Short text, media | Tweets, retweets, replies | Flat posts with links | Limited flat threading (chains) | Shallow (quick replies, retweets) | Broadcast + real-time commentary |
Mixed media | Posts, comments, reactions | Flat posts, shallow threads | Limited shallow threading (comments) | Moderate (comment chains) | Social networking + sharing | |
Discord | Text, voice, media | Messages, replies, threads | Flat posts in channels | Shallow threading (replies within channels) | Deep (live, threaded interactions) | Real-time communication (text/voice) |
Quora | Text (Q&A) | Questions, answers, comments | Tree-based (nested Q&A) | Nested threads (answers/comments) | Deep (threaded question responses) | Knowledge-sharing, Q&A |
Media (images, video) | Posts, comments | Flat posts | Shallow threading (flat comments) | Shallow (quick comment replies) | Visual content sharing | |
Stack Overflow | Text (Q&A, code) | Questions, answers, comments | Tree-based (nested Q&A) | Nested threads (answers/comments) | Deep (threaded Q&A discussions) | Technical Q&A |
Hacker News | Text (links, articles) | Posts, comments | Tree-based (nested comments) | Nested threads (hierarchical) | Deep (nested comment discussions) | Tech-centric discussions and news |
Clubhouse | Live voice conversations | Live sessions, conversations | Flat live sessions | No traditional threading (live) | Deep (live, verbal interaction) | Real-time conversations |
Tumblr | Media, text | Posts, reblogs, comments | Flat posts with linked chains | Limited thread-like (reblogs) | Moderate (notes, reblogs) | Blogging + creative sharing |
Images (Pins) | Pins, comments | Flat posts (Pins) | No threading (optional comments) | Shallow (limited interactions) | Visual inspiration, content curation | |
Mastodon | Short text, media | Toots, replies, boosts | Flat posts with links | Limited flat threading (chains) | Shallow (quick replies, boosts) | Decentralized microblogging |
- Concurrency and Parallelism Considerations: Managing concurrent interactions with posts is essential for ensuring data integrity in realms with high activity. Implementing concurrency control mechanisms, such as locks or transactions, would prevent conflicts during simultaneous modifications or insertions, while maintaining performance and minimizing gas costs.
As I understand it, a transaction is processed as a single thread, and multiple transactions are processed in sequence. (This is necessary for determinism.) I don't think gno.land has an issue with "simultaneous modifications" or need for locks. Is that right?
Context:
Tracking issue for specification efforts for boards v2.
Related Notes & Links:
/r/demo/boards/
v2