gnostudio / gno

Gno: An interpreted, stack-based Go virtual machine to build succinct and composable apps + Gno.land: a blockchain for timeless code and fair open-source
https://gno.land/
Other
0 stars 0 forks source link

Boards v2 Specs #1

Open salmad3 opened 1 week ago

salmad3 commented 1 week ago

Context:

Tracking issue for specification efforts for boards v2.

Related Notes & Links:

salmad3 commented 1 week ago

[23-09-2024]:

Initial Thoughts for Specs

[!NOTE] TL;DR:

  • Posts are being considered as the core primitive (one generic type).
  • Boards are treated as a type of post with no parent to simplify the architecture.
  • The system can use a slug hierarchy for efficient post organization and querying, allowing prefix-based searches to quickly identify relationships between posts and replies.
  • AVL trees are implemented as the core data structure, ensuring balanced and efficient storage. However, concerns about title-based searches suggest supplementary structures might be necessary for non-hierarchical queries.
  • While AVL trees perform well for insertions and lookups, managing everything in a single tree could lead to inefficiencies for complex queries. Multiple AVL trees or alternative structures may be used for different types of data (posts, comments).
  • We will proceed with constructing the AVL tree based on the current Gno implementation and will assess its performance iteratively.

Context:

Discussions centered on how to structure boards and posts in a unified model, with posts as the base type for both. Treating boards as a type of post simplifies the architecture by allowing them to share common fields, such as slugs. Boards are differentiated by their parent-child relationships, where a board is simply a post without a parent.

Posts as a Universal Structure:

In this model, everything is treated as a post, including boards. Boards do not require a separate structure, as they are identified by their lack of a parent. Managing boards, posts, and replies uniformly reduces complexity and avoids the need for multiple types. Comments and sub-posts are children of a parent post, and each post can be rendered differently depending on its label or type.

[!NOTE] This unified structure reduces the overhead of maintaining multiple types and adds flexibility at the UI level by allowing different post behaviors.

Hierarchical Slug Design

The slug system is key to organizing the hierarchy. A slug like 0_blog could represent a board, while 0_blog:1_username1 represents the first post under that board. Additional nesting is handled by extending the slug (0_blog:1_username1:2_username2). This slug hierarchy allows efficient traversal and retrieval of posts and their replies using prefix-based queries. For instance, all replies to 0_blog:1_username1 can be fetched by querying slugs that start with that prefix.

Using prefix searches allows efficient iteration through all replies or sub-posts. This method eliminates the need for complex joins or additional structures for parent-child queries, leveraging the slug structure to quickly identify related posts.

// Example Slug System:
//     "0_blog"          > board
//     "0_blog:1_username1"  > post under blog board
//     "0_blog:1_username1r:2_username2"  > reply to username1's post

Iterating Through Posts Using Slugs

Example 1: Iteration Using Plain Slugs (courtesy of @jeronimoalbi)

When using plain slugs (no ID prefixes), we rely on string operations like HasPrefix to manage traversal and filtering. This approach works but has limitations, especially in knowing when to stop the iteration.

start := "blog"
tree.Iterate(start, "", func(slug string, _ interface{}) bool {
    if !strings.HasPrefix(slug, start) { 
        return true // Stop when slug no longer matches the prefix
    }
    println("  > " + slug) // Output: "blog", "blog/a", "blog/a/1", etc.
    return false
})

Without a clear end point, we manually check if each item's slug still matches the prefix, which can be inefficient.

Example 2: Iteration Using ID Prefixes (courtesy of @jeronimoalbi)

By adding ID prefixes to slugs (e.g., 1_blog), we can identify iteration boundaries more effectively. This allows for simpler and more efficient traversal without needing string operations like HasPrefix.

start := "1_" // ID is used as the start instead of slug
end := "2"   // End is expressed as string(ID+1)
tree.Iterate(start, end, func(slug string, _ interface{}) bool {
    println("  > " + slug) // Output: "1_blog", "1_blog/a", "1_blog/b", etc.
    return false
})

Using ID prefixes allows iteration to be more efficient, as it avoids the need for string matching checks at each step.

AVL Tree as the Core Data Structure

The AVL tree is proposed as the primary data structure for storing posts, keyed by the slug. This ensures that all posts are stored in a balanced and efficient manner.

Although AVL trees offer O(log n) complexity for insertions and searches, concerns were raised about their efficiency for title-based searches. Searching for a post by its title would require traversing the entire tree, which is inefficient for large datasets.

[!WARNING] Global searches, particularly for non-slug fields like titles, might introduce inefficiencies especially for large datasets.

Handling Global Search

Title-based searches across the AVL tree are inefficient. Supplementary structures or inverted indexes are considered for fast lookups by title or other non-hierarchical fields. The slug structure remains useful for hierarchical queries, while additional indexing will handle broader searches.

Differentiating Boards and Posts in Practice

Structurally, boards and posts are identical as posts. They can be distinguished by labels or type fields. For instance, boards can have the label "board", while posts might carry a different label. This distinction allows the UI to handle boards, threads, and comments differently, providing system behavior flexibility.

Iteration & Data Access

Efficient iteration is crucial, especially when accessing data chunks. All posts and sub-posts should be retrievable without multiple calls when iterating through a board. This is achievable by iterating over the slug hierarchy, using the prefix to access relevant posts quickly. AVL tree traversal allows for chunked access, ensuring that only relevant data is loaded, which optimizes performance.

Constraints of a Single AVL Tree

Managing everything in a single AVL tree can lead to inefficiencies. Handling different aspects of the hierarchy (boards, posts, comments) within a single tree might not scale well, especially for complex queries like global title searches. The use of multiple AVL trees—one for each level of the hierarchy—or separate trees for posts and comments is being considered to improve performance.

[!TIP] Multiple AVL trees could be necessary to optimize data handling, improving performance by distributing the load across different trees for specific types of queries.


Complexity Analysis:

[!WARNING] This assumes the implementation in gno.land and needs checking

Summary of Time Complexity

Operation AVL Tree With ID Prefixes Global Search (Title)
Insertion O(log n) O(log n) O(log n)
Search (Slug) O(log n) O(log n) O(n)
Iteration O(n * m) O(n) O(n)

Summary of Space Complexity

Structure Space Complexity
AVL Tree O(n)
Slug Structure O(n)

Context:

For basic operations like insertion and search, we use an AVL tree as our core data structure. AVL trees are designed to remain balanced, which means their height is always logarithmic in proportion to the number of nodes. This is why insertion and search take O(log n) time. Every time a new post is added or searched for, the AVL tree needs to rebalance itself to ensure efficient future operations.

When iterating through posts using plain slugs (without ID prefixes), the system needs to compare each post's slug with a target prefix (using something like HasPrefix). This adds a bit of overhead because it involves comparing strings, which can vary in length (denoted by m). So, if we're dealing with 1,000 posts and the slugs are 10 characters long on average, we're looking at O(n * m) time complexity.

The idea behind ID prefixes is to speed up this process. If every post’s slug has a unique numeric prefix (like 1_blog, 2_blog, etc.), the system can simply iterate through these posts in O(n) time, without needing to compare strings. It knows where to start and stop based on the IDs, making traversal faster.

Then, global searches, like looking for a post by its title, present more of a challenge with AVL trees. Since the tree is organized by slugs, not titles, you’d have to traverse the entire tree to check each title. This takes O(n) time. This inefficiency is why we mentioned possibly adding supplementary structures for fields like titles in the future. But, without them, a title search in an AVL tree is linear.

salmad3 commented 5 days ago

[26-09-2024]:

Originally shared internally

During the sync today, we spent some time catching up with Kirk and discussing previous topics.

Jerónimo has also shared additional sample work and notes:

We discussed whether posts should reference parent posts directly, and the impact on traversal and efficiency.

Another key challenge is naming and structuring the package, considering the unified "posts" structure, given its hierarchical nature. Simply calling it "posts" could be misleading and may not reflect its full functionality.

One approach to address package structure and naming could be to leverage the fact that the system organizes posts into AVL tree structures, defining relationships between boards, posts, and replies, effectively creating a new tree structure. Naming it something like PostTree, DiscussionTree, or ThreadTree would better reflect this architecture. The package would be organized to reflect this form.

Or multiple packages, so perhaps a package that defines the main posts type, and a "tree package" would use this to create boards. But generally it would drift away from the existing setup of boards v1 if adapting posts as the main type.

salmad3 commented 5 days ago

[Originally shared internally]:

Based on what Ilker shared with the help option that constructs a tx for function calls in the CLI—there could be something implemented in gno.land that resembles Actions and Blinks (in Solana), and used for /r/demo/boards/ V2.

Blinks could be shareable links or URLs that let users initiate on-chain actions, like voting, posting, or commenting on boards. Someone could click a link in a message, on a board, or even on other social media, and it would open their wallet to approve the action without needing to directly interact with the dApp.

Actions could then handle the automation of common tasks like posting or moderating (across one or more boards). Instead of crafting a transaction from scratch, the user could trigger pre-built transactions that execute these tasks, making interactions with boards more intuitive, especially for those less familiar with blockchain.

An example of this is the ?help option that Ilker mentioned, which can be added to any realm link to offer help commands (like this example). This could extend to triggering actions directly from realm links.

salmad3 commented 3 days ago

[30-09-2024]:

[!NOTE] TL;DR:

  • Posts remain the core primitive in the system, represented by a generic Post struct. Boards are treated as level 0 posts with no parent, simplifying the overall architecture. All other entities, such as threads and comments, are managed hierarchically using levels.
  • Extensive discussions on forking mechanisms, particularly focusing on Copy-On-Write (COW) to ensure efficiency.
  • To manage complexity and improve performance, multiple AVL trees may be used for different levels (e.g., boards, threads, comments) or content types.

Definition Proposal for Generic Posts (expanded):

// Post represents a post in a hierarchical structure.
// The Level field determines the post's position in the hierarchy, and the Base field tracks if it is a fork of another post.
type Post struct {
    ID        string        // Unique identifier for each post.
    Level     uint          // Represents the depth in the hierarchy (e.g., 0 for boards, 1 for threads, etc.).
    Children  []*Post       // Child posts, such as comments or replies.
    Base      *Post         // The post this post was forked from. If nil, the post is original.
    Forks     []*Post       // Stores references to posts that were forked from this post.
    Content   PostContent   // The main content of the post, stored via an interface to allow for different content types.
    CreatedAt time.Time     // Timestamp when the post was created.
    Creator   std.Address   // Address of the creator of the post.
}

// PostContent represents different types of content that a post can hold.
// Allows flexibility in the type of content associated with a post.
type PostContent interface {
    Render() string // Returns a rendered representation of the content (e.g., HTML/Markdown).
}

// TextPostContent represents simple text content.
// This type holds text data and is rendered as plain text.
type TextPostContent struct {
    Text string
}

// Render returns the raw text content.
// Used for displaying the content in a UI.
func (t TextPostContent) Render() string {
    return t.Text
}

// PollPostContent represents a poll within a post.
// Stores a question, options for voting, and tracks the results.
type PollPostContent struct {
    Question string     // The question being asked in the poll.
    Options  []string   // A list of possible options for voting.
    Results  PollResult // The results of the poll, tracking votes cast by users.
}

// PollResult tracks the votes in a poll.
// Maps each voter's address to their chosen option.
type PollResult struct {
    Votes map[std.Address]string
}

// Numbers calculates the total votes for each option in the poll.
// Iterates over the votes and counts how many times each option was selected.
func (r PollResult) Numbers() map[string]int {
    tally := make(map[string]int)
    for _, option := range r.Votes {
        tally[option]++
    }
    return tally
}

// AnyPostContent represents arbitrary or custom content in raw byte form.
// This type is used for content that doesn't fit predefined structures like text or polls.
type AnyPostContent []byte

// Render converts the raw byte content into a string representation.
// This allows raw content to be displayed as a string in the UI.
func (a AnyPostContent) Render() string {
    return string(a)
}

Forking Complexity Analysis:

[!WARNING] The results may vary depending on final implementation details and should be validated further.

Summary of Time Complexity:

Operation Node Reference Full Subtree Duplication Copy-On-Write (COW)
Fork Creation O(1) (reference only) O(k) (k = size of subtree) O(1) (reference only)
Fork Modification O(log n) O(log n) O(log n)
Insertion (Post/Thread) O(log n) O(log n) O(log n)
Search (Slug) O(log n) O(log n) O(log n)
Iteration (All Posts) O(n) O(n) O(n)
Iteration by Level (Prefix) O(k + log n) O(k + log n) O(k + log n)
Global Search (Title) O(n) O(n) O(n)

Summary of Space Complexity:

Operation Node Reference Full Subtree Duplication Copy-On-Write (COW)
Fork Creation O(1) O(k) O(1)
Fork Modification O(log n) O(log n) O(n + k) (forked changes)
AVL Tree O(n) O(n + k) O(n + m × log n)
Slug Structure O(n) O(n + k) O(n + m × log n)

Context:

Three distinct strategies were explored, each presenting different trade-offs in terms of time and space complexity:

  1. Node Reference: Fork creation in Node Reference only involves referencing the original post without duplicating any data, resulting in O(1) time complexity. No data is copied until a modification occurs, which triggers O(log n) complexity for updating the AVL tree and copying the necessary nodes. This method is most effective when frequent forks are created that remain largely unchanged.

  2. Full Subtree Duplication: In this strategy, the entire subtree is copied during the fork, incurring O(k) time complexity, where k represents the size of the subtree. This ensures the forked post is independent from the original, but at a high initial cost. Post-fork modifications, however, follow standard AVL tree operations at O(log n). Full Subtree Duplication is best suited for forks that will undergo extensive independent modifications.

  3. Copy-On-Write (COW): Initially, COW behaves similarly to Node Reference, with O(1) fork creation as no data is copied at first. When modifications are made, affected nodes are duplicated on demand, leading to O(log n) complexity for updates. While the initial cost is low, the complexity of handling deep changes mirrors that of Full Subtree Duplication. COW works well in scenarios where forks are frequent but modifications are rare or shallow.

For basic operations such as insertion and search using slugs, AVL trees maintain O(log n) complexity. However, iteration over all posts requires visiting each node, resulting in O(n) complexity. When iterating by hierarchical level (prefix-based), the complexity becomes O(k + log n), where k accounts for the size of the subtree at that level.

The use of multiple AVL trees, where each level (boards, threads, comments) is separated into its own tree, localizes operations. For example, inserting a comment does not affect the tree managing boards, keeping operations efficient within the appropriate scope. This approach reduces unnecessary traversal and allows for more targeted interactions within specific levels of the hierarchy.

Global searches, particularly those based on titles, are more challenging. Since the AVL trees planned are keyed by slugs, title-based searches require traversing the entire tree, leading to O(n) complexity. Without supplementary indexing structures for non-slug fields like titles, global searches become inefficient in large datasets.

Forking strategies that reduce upfront costs, like Node Reference and COW, shift the complexity to the modification phase. Full Subtree Duplication avoids this, ensuring immediate independence for the forked posts, though at a significant initial cost. Iteration complexity remains O(n) across all strategies, with efficiencies gained when operations are scoped to specific trees in the multiple AVL tree model.

Gas Considerations:

Node Reference:

Fork creation with Node Reference incurs minimal gas costs, as no data is copied and only a reference to the original post is stored.

When modifications are made, gas costs increase due to the need to copy affected nodes and update the AVL tree. The costs scale with the depth of the modification, as each change requires writing to storage.

This method works well for frequent forks with few expected modifications, keeping the initial gas costs low.

Full Subtree Duplication:

Fork creation using Full Subtree Duplication incurs higher gas costs because the entire subtree, including all nodes and children, is duplicated and written to storage.

Once the fork is created, further modifications have lower gas costs, as the forked data is independent and does not require further copying.

This approach is suitable when the forked data needs to be modified significantly and independently from the original.

Copy-On-Write (COW):

COW has low gas costs during fork creation, as no data is copied initially—only references to the original nodes are created.

When modifications are made, gas costs increase due to the need to duplicate the affected nodes and write the changes to storage. These costs scale with the depth of the modifications.

COW is useful when forks are frequent but are not expected to undergo extensive modifications.

salmad3 commented 3 days ago

[30-09-2024]:

Action Items:

salmad3 commented 3 days ago

Additional Suggestions and Actions:

The following suggestions focus on expanding the architecture and enabling potential functionality for developers building and using a "board" as a realm. Some ideas may be better suited for future versions (e.g., "v3") but are outlined here for consideration:

salmad3 commented 20 hours ago

[02-02-2024]:

We will use a feature branch on the gno core repo to leverage existing CI.

salmad3 commented 20 hours ago

How to Name?

Common Platform Concepts:

Overview of Platform Content Structure:

Here is an overview of common platforms and their respective content structures and interaction models:

Platform Primary Content Format Content Types Data Structure Threading Model Interaction Depth Primary Use Case
Reddit Text, links, images, video Posts, comments, replies Tree-based (nested posts) Deeply nested (hierarchical) Deep (multi-level conversations) Community-driven discussions
X/Twitter Short text, media Tweets, retweets, replies Flat posts with links Limited flat threading (chains) Shallow (quick replies, retweets) Broadcast + real-time commentary
Facebook Mixed media Posts, comments, reactions Flat posts, shallow threads Limited shallow threading (comments) Moderate (comment chains) Social networking + sharing
Discord Text, voice, media Messages, replies, threads Flat posts in channels Shallow threading (replies within channels) Deep (live, threaded interactions) Real-time communication (text/voice)
Quora Text (Q&A) Questions, answers, comments Tree-based (nested Q&A) Nested threads (answers/comments) Deep (threaded question responses) Knowledge-sharing, Q&A
Instagram Media (images, video) Posts, comments Flat posts Shallow threading (flat comments) Shallow (quick comment replies) Visual content sharing
Stack Overflow Text (Q&A, code) Questions, answers, comments Tree-based (nested Q&A) Nested threads (answers/comments) Deep (threaded Q&A discussions) Technical Q&A
Hacker News Text (links, articles) Posts, comments Tree-based (nested comments) Nested threads (hierarchical) Deep (nested comment discussions) Tech-centric discussions and news
Clubhouse Live voice conversations Live sessions, conversations Flat live sessions No traditional threading (live) Deep (live, verbal interaction) Real-time conversations
Tumblr Media, text Posts, reblogs, comments Flat posts with linked chains Limited thread-like (reblogs) Moderate (notes, reblogs) Blogging + creative sharing
Pinterest Images (Pins) Pins, comments Flat posts (Pins) No threading (optional comments) Shallow (limited interactions) Visual inspiration, content curation
Mastodon Short text, media Toots, replies, boosts Flat posts with links Limited flat threading (chains) Shallow (quick replies, boosts) Decentralized microblogging
jefft0 commented 1 hour ago
  • Concurrency and Parallelism Considerations: Managing concurrent interactions with posts is essential for ensuring data integrity in realms with high activity. Implementing concurrency control mechanisms, such as locks or transactions, would prevent conflicts during simultaneous modifications or insertions, while maintaining performance and minimizing gas costs.

As I understand it, a transaction is processed as a single thread, and multiple transactions are processed in sequence. (This is necessary for determinism.) I don't think gno.land has an issue with "simultaneous modifications" or need for locks. Is that right?