Boards v2 Specs - Githubissues

Context:

Tracking issue for specification efforts for boards v2.

Related Notes & Links:

[23-09-2024]:

Initial Thoughts for Specs

[!NOTE] TL;DR:

Posts are being considered as the core primitive (one generic type).

Boards are treated as a type of post with no parent to simplify the architecture.

The system can use a slug hierarchy for efficient post organization and querying, allowing prefix-based searches to quickly identify relationships between posts and replies.

AVL trees are implemented as the core data structure, ensuring balanced and efficient storage. However, concerns about title-based searches suggest supplementary structures might be necessary for non-hierarchical queries.

While AVL trees perform well for insertions and lookups, managing everything in a single tree could lead to inefficiencies for complex queries. Multiple AVL trees or alternative structures may be used for different types of data (posts, comments).

We will proceed with constructing the AVL tree based on the current Gno implementation and will assess its performance iteratively.

Context:

Discussions centered on how to structure boards and posts in a unified model, with posts as the base type for both. Treating boards as a type of post simplifies the architecture by allowing them to share common fields, such as slugs. Boards are differentiated by their parent-child relationships, where a board is simply a post without a parent.

Posts as a Universal Structure:

In this model, everything is treated as a post, including boards. Boards do not require a separate structure, as they are identified by their lack of a parent. Managing boards, posts, and replies uniformly reduces complexity and avoids the need for multiple types. Comments and sub-posts are children of a parent post, and each post can be rendered differently depending on its label or type.

[!NOTE] This unified structure reduces the overhead of maintaining multiple types and adds flexibility at the UI level by allowing different post behaviors.

Hierarchical Slug Design

The slug system is key to organizing the hierarchy. A slug like 0_blog could represent a board, while 0_blog:1_username1 represents the first post under that board. Additional nesting is handled by extending the slug (0_blog:1_username1:2_username2). This slug hierarchy allows efficient traversal and retrieval of posts and their replies using prefix-based queries. For instance, all replies to 0_blog:1_username1 can be fetched by querying slugs that start with that prefix.

Using prefix searches allows efficient iteration through all replies or sub-posts. This method eliminates the need for complex joins or additional structures for parent-child queries, leveraging the slug structure to quickly identify related posts.

// Example Slug System:
//     "0_blog"          > board
//     "0_blog:1_username1"  > post under blog board
//     "0_blog:1_username1r:2_username2"  > reply to username1's post

Iterating Through Posts Using Slugs

Example 1: Iteration Using Plain Slugs (courtesy of @jeronimoalbi)

When using plain slugs (no ID prefixes), we rely on string operations like HasPrefix to manage traversal and filtering. This approach works but has limitations, especially in knowing when to stop the iteration.

start := "blog"
tree.Iterate(start, "", func(slug string, _ interface{}) bool {
    if !strings.HasPrefix(slug, start) { 
        return true // Stop when slug no longer matches the prefix
    }
    println("  > " + slug) // Output: "blog", "blog/a", "blog/a/1", etc.
    return false
})

Without a clear end point, we manually check if each item's slug still matches the prefix, which can be inefficient.

Example 2: Iteration Using ID Prefixes (courtesy of @jeronimoalbi)

By adding ID prefixes to slugs (e.g., 1_blog), we can identify iteration boundaries more effectively. This allows for simpler and more efficient traversal without needing string operations like HasPrefix.

start := "1_" // ID is used as the start instead of slug
end := "2"   // End is expressed as string(ID+1)
tree.Iterate(start, end, func(slug string, _ interface{}) bool {
    println("  > " + slug) // Output: "1_blog", "1_blog/a", "1_blog/b", etc.
    return false
})

Using ID prefixes allows iteration to be more efficient, as it avoids the need for string matching checks at each step.

AVL Tree as the Core Data Structure

The AVL tree is proposed as the primary data structure for storing posts, keyed by the slug. This ensures that all posts are stored in a balanced and efficient manner.

Although AVL trees offer O(log n) complexity for insertions and searches, concerns were raised about their efficiency for title-based searches. Searching for a post by its title would require traversing the entire tree, which is inefficient for large datasets.

[!WARNING] Global searches, particularly for non-slug fields like titles, might introduce inefficiencies especially for large datasets.

Handling Global Search

Title-based searches across the AVL tree are inefficient. Supplementary structures or inverted indexes are considered for fast lookups by title or other non-hierarchical fields. The slug structure remains useful for hierarchical queries, while additional indexing will handle broader searches.

Differentiating Boards and Posts in Practice

Structurally, boards and posts are identical as posts. They can be distinguished by labels or type fields. For instance, boards can have the label "board", while posts might carry a different label. This distinction allows the UI to handle boards, threads, and comments differently, providing system behavior flexibility.

Iteration & Data Access

Efficient iteration is crucial, especially when accessing data chunks. All posts and sub-posts should be retrievable without multiple calls when iterating through a board. This is achievable by iterating over the slug hierarchy, using the prefix to access relevant posts quickly. AVL tree traversal allows for chunked access, ensuring that only relevant data is loaded, which optimizes performance.

Constraints of a Single AVL Tree

Managing everything in a single AVL tree can lead to inefficiencies. Handling different aspects of the hierarchy (boards, posts, comments) within a single tree might not scale well, especially for complex queries like global title searches. The use of multiple AVL trees—one for each level of the hierarchy—or separate trees for posts and comments is being considered to improve performance.

[!TIP] Multiple AVL trees could be necessary to optimize data handling, improving performance by distributing the load across different trees for specific types of queries.

Complexity Analysis:

[!WARNING] This assumes the implementation in gno.land and needs checking

Summary of Time Complexity

Operation	AVL Tree	With ID Prefixes	Global Search (Title)
Insertion	O(log n)	O(log n)	O(log n)
Search (Slug)	O(log n)	O(log n)	O(n)
Iteration	*O(n m)**	O(n)	O(n)

Summary of Space Complexity

Structure	Space Complexity
AVL Tree	O(n)
Slug Structure	O(n)

Context:

For basic operations like insertion and search, we use an AVL tree as our core data structure. AVL trees are designed to remain balanced, which means their height is always logarithmic in proportion to the number of nodes. This is why insertion and search take O(log n) time. Every time a new post is added or searched for, the AVL tree needs to rebalance itself to ensure efficient future operations.

When iterating through posts using plain slugs (without ID prefixes), the system needs to compare each post's slug with a target prefix (using something like HasPrefix). This adds a bit of overhead because it involves comparing strings, which can vary in length (denoted by m). So, if we're dealing with 1,000 posts and the slugs are 10 characters long on average, we're looking at O(n * m) time complexity.

The idea behind ID prefixes is to speed up this process. If every post’s slug has a unique numeric prefix (like 1_blog, 2_blog, etc.), the system can simply iterate through these posts in O(n) time, without needing to compare strings. It knows where to start and stop based on the IDs, making traversal faster.

Then, global searches, like looking for a post by its title, present more of a challenge with AVL trees. Since the tree is organized by slugs, not titles, you’d have to traverse the entire tree to check each title. This takes O(n) time. This inefficiency is why we mentioned possibly adding supplementary structures for fields like titles in the future. But, without them, a title search in an AVL tree is linear.

[26-09-2024]:

Originally shared internally

During the sync today, we spent some time catching up with Kirk and discussing previous topics.

Jerónimo has also shared additional sample work and notes:

Provides perspective on what type differentiation may look like using Post as the main type; potential structs for Board, Post, Poll, Choice, and Comment and ways to manage multiple post types within a single Post struct, considering options like interfaces, type fields, or inferring the type based on specific fields.
More depth in slug formats (e.g., blog/a/1, blog/b/2:xxxxx/dddd) to encode hierarchical relationships for easier data retrieval.
Suggests that each board maintain AVL trees to index posts by slug and tags.

We discussed whether posts should reference parent posts directly, and the impact on traversal and efficiency.

Another key challenge is naming and structuring the package, considering the unified "posts" structure, given its hierarchical nature. Simply calling it "posts" could be misleading and may not reflect its full functionality.

One approach to address package structure and naming could be to leverage the fact that the system organizes posts into AVL tree structures, defining relationships between boards, posts, and replies, effectively creating a new tree structure. Naming it something like PostTree, DiscussionTree, or ThreadTree would better reflect this architecture. The package would be organized to reflect this form.

Or multiple packages, so perhaps a package that defines the main posts type, and a "tree package" would use this to create boards. But generally it would drift away from the existing setup of boards v1 if adapting posts as the main type.

[Originally shared internally]:

Based on what Ilker shared with the help option that constructs a tx for function calls in the CLI—there could be something implemented in gno.land that resembles Actions and Blinks (in Solana), and used for /r/demo/boards/ V2.

Blinks could be shareable links or URLs that let users initiate on-chain actions, like voting, posting, or commenting on boards. Someone could click a link in a message, on a board, or even on other social media, and it would open their wallet to approve the action without needing to directly interact with the dApp.

Actions could then handle the automation of common tasks like posting or moderating (across one or more boards). Instead of crafting a transaction from scratch, the user could trigger pre-built transactions that execute these tasks, making interactions with boards more intuitive, especially for those less familiar with blockchain.

An example of this is the ?help option that Ilker mentioned, which can be added to any realm link to offer help commands (like this example). This could extend to triggering actions directly from realm links.

[30-09-2024]:

[!NOTE] TL;DR:

Posts remain the core primitive in the system, represented by a generic Post struct. Boards are treated as level 0 posts with no parent, simplifying the overall architecture. All other entities, such as threads and comments, are managed hierarchically using levels.

Extensive discussions on forking mechanisms, particularly focusing on Copy-On-Write (COW) to ensure efficiency.

To manage complexity and improve performance, multiple AVL trees may be used for different levels (e.g., boards, threads, comments) or content types.

Definition Proposal for Generic `Posts` (expanded):

// Post represents a post in a hierarchical structure.
// The Level field determines the post's position in the hierarchy, and the Base field tracks if it is a fork of another post.
type Post struct {
    ID        string        // Unique identifier for each post.
    Level     uint          // Represents the depth in the hierarchy (e.g., 0 for boards, 1 for threads, etc.).
    Children  []*Post       // Child posts, such as comments or replies.
    Base      *Post         // The post this post was forked from. If nil, the post is original.
    Forks     []*Post       // Stores references to posts that were forked from this post.
    Content   PostContent   // The main content of the post, stored via an interface to allow for different content types.
    CreatedAt time.Time     // Timestamp when the post was created.
    Creator   std.Address   // Address of the creator of the post.
}

// PostContent represents different types of content that a post can hold.
// Allows flexibility in the type of content associated with a post.
type PostContent interface {
    Render() string // Returns a rendered representation of the content (e.g., HTML/Markdown).
}

// TextPostContent represents simple text content.
// This type holds text data and is rendered as plain text.
type TextPostContent struct {
    Text string
}

// Render returns the raw text content.
// Used for displaying the content in a UI.
func (t TextPostContent) Render() string {
    return t.Text
}

// PollPostContent represents a poll within a post.
// Stores a question, options for voting, and tracks the results.
type PollPostContent struct {
    Question string     // The question being asked in the poll.
    Options  []string   // A list of possible options for voting.
    Results  PollResult // The results of the poll, tracking votes cast by users.
}

// PollResult tracks the votes in a poll.
// Maps each voter's address to their chosen option.
type PollResult struct {
    Votes map[std.Address]string
}

// Numbers calculates the total votes for each option in the poll.
// Iterates over the votes and counts how many times each option was selected.
func (r PollResult) Numbers() map[string]int {
    tally := make(map[string]int)
    for _, option := range r.Votes {
        tally[option]++
    }
    return tally
}

// AnyPostContent represents arbitrary or custom content in raw byte form.
// This type is used for content that doesn't fit predefined structures like text or polls.
type AnyPostContent []byte

// Render converts the raw byte content into a string representation.
// This allows raw content to be displayed as a string in the UI.
func (a AnyPostContent) Render() string {
    return string(a)
}

Forking Complexity Analysis:

[!WARNING] The results may vary depending on final implementation details and should be validated further.

Summary of Time Complexity:

Operation	Node Reference	Full Subtree Duplication	Copy-On-Write (COW)
Fork Creation	O(1) (reference only)	O(k) (k = size of subtree)	O(1) (reference only)
Fork Modification	O(log n)	O(log n)	O(log n)
Insertion (Post/Thread)	O(log n)	O(log n)	O(log n)
Search (Slug)	O(log n)	O(log n)	O(log n)
Iteration (All Posts)	O(n)	O(n)	O(n)
Iteration by Level (Prefix)	O(k + log n)	O(k + log n)	O(k + log n)
Global Search (Title)	O(n)	O(n)	O(n)

Summary of Space Complexity:

Operation	Node Reference	Full Subtree Duplication	Copy-On-Write (COW)
Fork Creation	O(1)	O(k)	O(1)
Fork Modification	O(log n)	O(log n)	O(n + k) (forked changes)
AVL Tree	O(n)	O(n + k)	O(n + m × log n)
Slug Structure	O(n)	O(n + k)	O(n + m × log n)

Context:

Three distinct strategies were explored, each presenting different trade-offs in terms of time and space complexity:

Node Reference: Fork creation in Node Reference only involves referencing the original post without duplicating any data, resulting in O(1) time complexity. No data is copied until a modification occurs, which triggers O(log n) complexity for updating the AVL tree and copying the necessary nodes. This method is most effective when frequent forks are created that remain largely unchanged.
Full Subtree Duplication: In this strategy, the entire subtree is copied during the fork, incurring O(k) time complexity, where k represents the size of the subtree. This ensures the forked post is independent from the original, but at a high initial cost. Post-fork modifications, however, follow standard AVL tree operations at O(log n). Full Subtree Duplication is best suited for forks that will undergo extensive independent modifications.
Copy-On-Write (COW): Initially, COW behaves similarly to Node Reference, with O(1) fork creation as no data is copied at first. When modifications are made, affected nodes are duplicated on demand, leading to O(log n) complexity for updates. While the initial cost is low, the complexity of handling deep changes mirrors that of Full Subtree Duplication. COW works well in scenarios where forks are frequent but modifications are rare or shallow.

For basic operations such as insertion and search using slugs, AVL trees maintain O(log n) complexity. However, iteration over all posts requires visiting each node, resulting in O(n) complexity. When iterating by hierarchical level (prefix-based), the complexity becomes O(k + log n), where k accounts for the size of the subtree at that level.

The use of multiple AVL trees, where each level (boards, threads, comments) is separated into its own tree, localizes operations. For example, inserting a comment does not affect the tree managing boards, keeping operations efficient within the appropriate scope. This approach reduces unnecessary traversal and allows for more targeted interactions within specific levels of the hierarchy.

Global searches, particularly those based on titles, are more challenging. Since the AVL trees planned are keyed by slugs, title-based searches require traversing the entire tree, leading to O(n) complexity. Without supplementary indexing structures for non-slug fields like titles, global searches become inefficient in large datasets.

Forking strategies that reduce upfront costs, like Node Reference and COW, shift the complexity to the modification phase. Full Subtree Duplication avoids this, ensuring immediate independence for the forked posts, though at a significant initial cost. Iteration complexity remains O(n) across all strategies, with efficiencies gained when operations are scoped to specific trees in the multiple AVL tree model.

Gas Considerations:

Node Reference:

Fork creation with Node Reference incurs minimal gas costs, as no data is copied and only a reference to the original post is stored.

When modifications are made, gas costs increase due to the need to copy affected nodes and update the AVL tree. The costs scale with the depth of the modification, as each change requires writing to storage.

This method works well for frequent forks with few expected modifications, keeping the initial gas costs low.

Full Subtree Duplication:

Fork creation using Full Subtree Duplication incurs higher gas costs because the entire subtree, including all nodes and children, is duplicated and written to storage.

Once the fork is created, further modifications have lower gas costs, as the forked data is independent and does not require further copying.

This approach is suitable when the forked data needs to be modified significantly and independently from the original.

Copy-On-Write (COW):

COW has low gas costs during fork creation, as no data is copied initially—only references to the original nodes are created.

When modifications are made, gas costs increase due to the need to duplicate the affected nodes and write the changes to storage. These costs scale with the depth of the modifications.

COW is useful when forks are frequent but are not expected to undergo extensive modifications.

[30-09-2024]:

Action Items:

@wyhaines will prototype different forking strategies to validate discussion points and theoretical complexity analysis.
@jeronimoalbi will refine the base structure for posts, ensuring the hierarchical model supports forking, iteration, and multiple content types (TextPostContent, PollPostContent, etc.).
@wyhaines and @jeronimoalbi will collaborate on implementing a multiple AVL tree model to test the efficiency of localized operations for boards, threads, and comments.
@wyhaines and @jeronimoalbi will optimize gas cost estimations based on forking strategy tests.

Additional Suggestions and Actions:

The following suggestions focus on expanding the architecture and enabling potential functionality for developers building and using a "board" as a realm. Some ideas may be better suited for future versions (e.g., "v3") but are outlined here for consideration:

Concurrency and Parallelism Considerations:
Managing concurrent interactions with posts is essential for ensuring data integrity in realms with high activity. Implementing concurrency control mechanisms, such as locks or transactions, would prevent conflicts during simultaneous modifications or insertions, while maintaining performance and minimizing gas costs.
Cache Mechanisms for Performance:
Introducing caching mechanisms could improve performance for frequently accessed posts or global searches. This would help reduce the computational load on the AVL trees, especially for high-traffic boards, enabling faster read operations.
Boards Referencing Each Other:
Boards could reference one another across different realms using the slugs or unique identifiers. This allows content from one board to be cited or linked within another, much like hyperlinks. This functionality would enable discussions to span multiple boards and realms.
Boards Interacting with Each Other:
Facilitating interaction between boards through inter-realm messaging can enable boards to dynamically pull in content from other boards. For example, a governance board might incorporate discussions from related realms without duplicating the data. This interaction would be governed by the rules of each board.
Pinning Posts Within Boards:
Pinning posts within boards could be implemented by adding metadata to designate specific posts as "pinned", or by a Post behavior (isPinned). Pinned posts would appear at the top of the board's display until unpinned.
Role-Based Access Control (e.g., Moderation):
Role-based access control (RBAC) logic could be introduced to define roles such as admin, moderator, or user within a realm. RBAC would enforce permissions based on roles, determining who can post, comment, or moderate. Additional access controls, such as allowlists or private boards, could be implemented to restrict access to specific users or groups.

[02-02-2024]:

We will use a feature branch on the gno core repo to leverage existing CI.

How to Name?

Common Platform Concepts:

Flat Posts: Posts are standalone entities, with replies displayed in a single layer, without deep nesting.
Tree-based Structure (Hierarchical): Content is organized in a parent-child relationship, where posts have nested comments and replies.
Threading:
- Deep Threading: Multi-level replies that support detailed, branched discussions.
- Shallow Threading: Limited reply depth, usually one or two levels deep.
- Flat Threading: Replies are listed sequentially with no nesting.
Shallow Threading: Allows for one or two levels of replies, with comments and replies directly attached to posts or comments.
Deep Threading: Supports multiple levels of replies nested under each previous reply.
User-driven Discovery: Users actively search, subscribe, or navigate predefined categories or communities to find content.
Algorithmic Discovery: Content is automatically recommended based on user behavior, preferences, or engagement metrics.
Live Sessions: Real-time interactions, often audio or video-based, that lack persistent comment chains or threading.
Content Types:
- Posts: General content such as text, images, or links.
- Tweets: Short text posts with optional media attachments.
- Pins: Visual bookmarks saved and organized by users.
- Toots: Short-form posts used in decentralized platforms like Mastodon.

Overview of Platform Content Structure:

Here is an overview of common platforms and their respective content structures and interaction models:

Platform	Primary Content Format	Content Types	Data Structure	Threading Model	Interaction Depth	Primary Use Case
Reddit	Text, links, images, video	Posts, comments, replies	Tree-based (nested posts)	Deeply nested (hierarchical)	Deep (multi-level conversations)	Community-driven discussions
X/Twitter	Short text, media	Tweets, retweets, replies	Flat posts with links	Limited flat threading (chains)	Shallow (quick replies, retweets)	Broadcast + real-time commentary
Facebook	Mixed media	Posts, comments, reactions	Flat posts, shallow threads	Limited shallow threading (comments)	Moderate (comment chains)	Social networking + sharing
Discord	Text, voice, media	Messages, replies, threads	Flat posts in channels	Shallow threading (replies within channels)	Deep (live, threaded interactions)	Real-time communication (text/voice)
Quora	Text (Q&A)	Questions, answers, comments	Tree-based (nested Q&A)	Nested threads (answers/comments)	Deep (threaded question responses)	Knowledge-sharing, Q&A
Instagram	Media (images, video)	Posts, comments	Flat posts	Shallow threading (flat comments)	Shallow (quick comment replies)	Visual content sharing
Stack Overflow	Text (Q&A, code)	Questions, answers, comments	Tree-based (nested Q&A)	Nested threads (answers/comments)	Deep (threaded Q&A discussions)	Technical Q&A
Hacker News	Text (links, articles)	Posts, comments	Tree-based (nested comments)	Nested threads (hierarchical)	Deep (nested comment discussions)	Tech-centric discussions and news
Clubhouse	Live voice conversations	Live sessions, conversations	Flat live sessions	No traditional threading (live)	Deep (live, verbal interaction)	Real-time conversations
Tumblr	Media, text	Posts, reblogs, comments	Flat posts with linked chains	Limited thread-like (reblogs)	Moderate (notes, reblogs)	Blogging + creative sharing
Pinterest	Images (Pins)	Pins, comments	Flat posts (Pins)	No threading (optional comments)	Shallow (limited interactions)	Visual inspiration, content curation
Mastodon	Short text, media	Toots, replies, boosts	Flat posts with links	Limited flat threading (chains)	Shallow (quick replies, boosts)	Decentralized microblogging

Concurrency and Parallelism Considerations: Managing concurrent interactions with posts is essential for ensuring data integrity in realms with high activity. Implementing concurrency control mechanisms, such as locks or transactions, would prevent conflicts during simultaneous modifications or insertions, while maintaining performance and minimizing gas costs.

As I understand it, a transaction is processed as a single thread, and multiple transactions are processed in sequence. (This is necessary for determinism.) I don't think gno.land has an issue with "simultaneous modifications" or need for locks. Is that right?

gnostudio / gno

Boards v2 Specs #1

Context:

Related Notes & Links:

Initial Thoughts for Specs

Context:

Posts as a Universal Structure:

Hierarchical Slug Design

Iterating Through Posts Using Slugs

Example 1: Iteration Using Plain Slugs (courtesy of @jeronimoalbi)

Example 2: Iteration Using ID Prefixes (courtesy of @jeronimoalbi)

AVL Tree as the Core Data Structure

Handling Global Search

Differentiating Boards and Posts in Practice

Iteration & Data Access

Constraints of a Single AVL Tree

Complexity Analysis:

Summary of Time Complexity

Summary of Space Complexity

Context:

Definition Proposal for Generic `Posts` (expanded):

Forking Complexity Analysis:

Summary of Time Complexity:

Summary of Space Complexity:

Context:

Gas Considerations:

Node Reference:

Full Subtree Duplication:

Copy-On-Write (COW):

Additional Suggestions and Actions:

How to Name?

Common Platform Concepts:

Overview of Platform Content Structure:

gnostudio / gno

Boards v2 Specs #1

Context:

Related Notes & Links:

Initial Thoughts for Specs

Context:

Posts as a Universal Structure:

Hierarchical Slug Design

Iterating Through Posts Using Slugs

Example 1: Iteration Using Plain Slugs (courtesy of @jeronimoalbi)

Example 2: Iteration Using ID Prefixes (courtesy of @jeronimoalbi)

AVL Tree as the Core Data Structure

Handling Global Search

Differentiating Boards and Posts in Practice

Iteration & Data Access

Constraints of a Single AVL Tree

Complexity Analysis:

Summary of Time Complexity

Summary of Space Complexity

Context:

Definition Proposal for Generic Posts (expanded):

Forking Complexity Analysis:

Summary of Time Complexity:

Summary of Space Complexity:

Context:

Gas Considerations:

Node Reference:

Full Subtree Duplication:

Copy-On-Write (COW):

Additional Suggestions and Actions:

How to Name?

Common Platform Concepts:

Overview of Platform Content Structure:

Definition Proposal for Generic `Posts` (expanded):