dao-xyz / peerbit-examples

Example library for learning and fun
Apache License 2.0

Syncing reliability and expectations #7

Open Azaeres opened 9 months ago

Azaeres commented 9 months ago

Over at https://lab.etherion.app/experiment4 I've adapted your "many-chat-rooms" P2P chat app to my framework. Apart from changes to JSX sections and fixes to various TypeScript errors, the code is as close to the example as I could get.

All the files are located at https://github.com/Azaeres/etherion-lab/tree/main/src/components/scenes/Experiment4

After poking at it for some time, I noticed there are times where the database isn't consistent. For example, occasionally a room will be missing from the list. Messages are even less consistent: sometimes some of them will populate the room, but often none will.

I haven't really gotten a bead on what's causing the inconsistency, how to interpret what I'm seeing, or what kind of reliability to expect. How can we tell if these issues arise from poor network conditions as opposed to buggy code?

Also, is there some kind of persistence layer being used by default? Sometimes data will come through that I had figured shouldn't still be around.

As I'm trying to build a physics-based space shooter game using Peerbit to sync players together, there will be a need for fast, semi-reliable updates. How good is Peerbit at providing this under a range of conditions?

Basically, I've got some gaps in my understanding, and I'm not sure how to fill them, so I thought I'd ask for some help. Any advice?

marcus-pousette commented 9 months ago

Hey!

I first want to check whether these issues appear only in your repo or also in the peerbit-examples project, starting from your repo. Here are two things I did to analyse the problem, and strategies for improving it:

import { field, variant } from '@dao-xyz/borsh'
import { Program } from '@peerbit/program'
import { Documents, PutOperation, DeleteOperation, Role } from '@peerbit/document'
import { v4 as uuid } from 'uuid'
import { PublicSignKey, randomBytes, sha256Sync } from '@peerbit/crypto'
import { SyncFilter } from '@peerbit/shared-log'
import { concat } from 'uint8arrays'

@variant(0) // for versioning purposes, we can do @variant(1) when we create a new post type version
export class Post {
  @field({ type: 'string' })
  id: string

  @field({ type: PublicSignKey })
  from: PublicSignKey

  @field({ type: 'string' })
  message: string

  constructor(properties: { from: PublicSignKey; message: string }) {
    this.id = uuid()
    this.from = properties.from
    this.message = properties.message
  }
}
type Args = { role?: Role; sync?: SyncFilter }

@variant('room')
export class RoomDB extends Program<Args> {
  @field({ type: 'string' })
  name: string

  @field({ type: Documents })
  messages: Documents<Post>

  constructor(properties: { name: string; messages?: Documents<Post> }) {
    super()
    this.name = properties.name
    this.messages =
      properties.messages ||
      new Documents({
        id: sha256Sync(
          concat([new TextEncoder().encode('room'), new TextEncoder().encode(this.name)])
        ),
      })
  }

  get id() {
    return this.name
  }

  // Setup lifecycle, will be invoked on 'open'
  async open(args?: Args): Promise<void> {
    await this.messages.open({
      type: Post,
      canPerform: async (operation, context) => {
        if (operation instanceof PutOperation) {
          const post = operation.value
          if (!context.entry.signatures.find((x) => x.publicKey.equals(post!.from))) {
            return false
          }
          return true
        } else if (operation instanceof DeleteOperation) {
          const get = await this.messages.index.get(operation.key)
          if (!get || !context.entry.signatures.find((x) => x.publicKey.equals(get.from))) {
            return false
          }
          return true
        }
        return false
      },
      replicas: {
        min: 0xffffffff, // max u32 (make everyone a replicator, disable sharding)
      },
      index: {
        canRead: async () => {
          return true // Anyone can query
        },
      },
      role: args?.role,
      sync: args?.sync,
    })
  }
}

@variant('lobby')
export class LobbyDB extends Program<Args> {
  @field({ type: Uint8Array })
  id: Uint8Array

  @field({ type: Documents })
  rooms: Documents<RoomDB>

  constructor(properties: { id?: Uint8Array }) {
    super()
    this.id = properties.id || randomBytes(32)
    this.rooms = new Documents<RoomDB>({ id: this.id })
  }

  // Setup lifecycle, will be invoked on 'open'
  async open(args?: Args): Promise<void> {
    await this.rooms.open({
      type: RoomDB,

      canPerform: () => {
        return Promise.resolve(true) // Anyone can create rooms
      },

      replicas: {
        min: 0xffffffff, // max u32 (make everyone a replicator, disable sharding)
      },
      index: {
        key: 'name',

        canRead: () => {
          return Promise.resolve(true) // Anyone can search for rooms
        },
      },
      canOpen: () => {
        // Control whether someone can create a "room", which itself is a program with replication.
        // Even if anyone could do "rooms.put(new Room())", that new entry has to be analyzed. If it turns out
        // that the new entry represents a program, it has to be handled in a special way (replication etc.).
        // This extra functionality requires peers to consider this additional security boundary.
        return Promise.resolve(true)
      },
      role: args?.role,
      sync: args?.sync,
    })
  }
}
marcus-pousette commented 9 months ago

As I'm trying to build a physics-based space shooter game using Peerbit to sync players together, there will be a need for fast, semi-reliable updates. How good is Peerbit at providing this under a range of conditions?

This should just work. Setting { replicas: { min: 10000 } } makes all changes, such as ship movements and actions, propagate to all peers. All peers should then open the document store as a replicator and use documents.events.addEventListener('change', () => { /* render changes */ }).
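A minimal sketch of the "render changes" step, assuming the 'change' event's detail carries optional added and removed arrays of documents with an id field (the Room component code later in this thread reads e.detail.added and e.detail.removed the same way). The Ship type and the applyChange helper are hypothetical; the idea is simply to keep a local map keyed by document id, so that putting a document again with the same id replaces the previous version.

```typescript
// Hypothetical document shape for a ship's state; not a Peerbit type.
interface Ship {
  id: string;
  x: number;
  y: number;
}

// Assumed shape of the detail of a Documents 'change' event.
interface ChangeDetail<T> {
  added?: T[];
  removed?: T[];
}

// Apply one change event to a local render state keyed by document id.
// A document whose id already exists replaces the previous version.
function applyChange<T extends { id: string }>(
  state: Map<string, T>,
  change: ChangeDetail<T>,
): Map<string, T> {
  for (const doc of change.added ?? []) {
    state.set(doc.id, doc);
  }
  for (const doc of change.removed ?? []) {
    state.delete(doc.id);
  }
  return state;
}

// Usage sketch: call applyChange(ships, e.detail) inside the 'change' listener.
const ships = new Map<string, Ship>();
applyChange(ships, { added: [{ id: "ship-1", x: 0, y: 0 }] });
applyChange(ships, { added: [{ id: "ship-1", x: 5, y: 2 }] });
```

Rendering then only needs to iterate over the map's values.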

marcus-pousette commented 9 months ago

You can put anti-cheating logic inside the canPerform hook 😆
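A minimal sketch of such a check, assuming each put carries a position and a client-reported time. PositionUpdate, isPlausibleMove and the maxSpeed limit are hypothetical names for illustration. Inside canPerform you would look up the ship's previous position (for example from the index) and return false when the check fails. Client clocks can be wrong (see the timestamp discussion later in this thread), so this is only a rough plausibility filter.

```typescript
// Hypothetical position update put by a player; not a Peerbit type.
interface PositionUpdate {
  x: number;
  y: number;
  timeMs: number; // client-reported time of the update, in milliseconds
}

// Reject updates that move a ship faster than the game's speed limit.
// maxSpeed is in distance units per second (a game-specific assumption).
function isPlausibleMove(
  prev: PositionUpdate,
  next: PositionUpdate,
  maxSpeed: number,
): boolean {
  const dtSeconds = (next.timeMs - prev.timeMs) / 1000;
  if (dtSeconds <= 0) {
    return false; // time must move forward between two updates
  }
  const distance = Math.hypot(next.x - prev.x, next.y - prev.y);
  return distance / dtSeconds <= maxSpeed;
}
```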

Azaeres commented 9 months ago

Alright, using your recommendation of setting min replicas to a high value has gotten me to a point where I tend to get what I expect, I think. :+1:

Can you explain why the lobby seems to always have the complete set of rooms, and yet there are often no messages in a room when the user enters? They don't appear different at the level of the database classes. What am I missing? And at other times there are messages populating a room when the user enters. What explains this inconsistency?

Also, I was considering, as an exercise, to add a human-readable datetime-stamp to each message. I might need to add a field to the Post class, but syncing them across peers is not so trivial. A datetime on one device may be incorrect on another, for several reasons. What's the best way to think about this, and how might I go about displaying a timestamp on each message for each peer?

marcus-pousette commented 9 months ago

Can you explain why the lobby seems to always have the complete set of rooms, and yet there are often no messages in a room when the user enters? They don't appear different at the level of the database classes. What am I missing? And at other times there are messages populating a room when the user enters. What explains this inconsistency?

They are separate databases (programs). See the code in my last message. We have

RoomDB extends Program

and

LobbyDB extends Program

though LobbyDB has this line

@field({ type: Documents }) rooms: Documents<RoomDB>

This does not necessarily mean that, if you have documents/sub-databases inside this store, you are actually opening and replicating them.

Everyone who joins the lobby will replicate all rooms. This does not necessarily mean everyone will open every single room they see.

It is only when you enter a room that you actually start replicating the messages you see there.

For example, look at the Room React component (which only runs while you are inside a room). It is only at this point that you actually open the Room and start replicating its messages:

https://github.com/dao-xyz/peerbit-examples/blob/1256ffbc0d407e1a2904f3ddebc5078bbe580649/packages/many-chat-rooms/frontend/src/Room.tsx#L101

marcus-pousette commented 9 months ago

However, the example above contains these lines:

canOpen: () => {
  // Control whether someone can create a "room", which itself is a program with replication.
  // Even if anyone could do "rooms.put(new Room())", that new entry has to be analyzed. If it turns out
  // that the new entry represents a program, it has to be handled in a special way (replication etc.).
  // This extra functionality requires peers to consider this additional security boundary.
  return Promise.resolve(true)
},

This basically says: if I receive, sync or create a RoomDB in my Documents<RoomDB> database, I will open it automatically if I am a replicator, and replicate the content inside it (recursive database replication).

My rough guess is that the rooms were created earlier, when min replicas was set to 2 instead of a large value, meaning data got lost. Or did you observe this unexpected behaviour after updating the min replicas setting for both LobbyDB and RoomDB?

Azaeres commented 9 months ago

Yeah, these rooms I've been testing in were created when min replicas was set to 2.

marcus-pousette commented 9 months ago

Also, I was considering, as an exercise, to add a human-readable datetime-stamp to each message. I might need to add a field to the Post class, but syncing them across peers is not so trivial. A datetime on one device may be incorrect on another, for several reasons. What's the best way to think about this, and how might I go about displaying a timestamp on each message for each peer?

You can actually do an index "transformation" that indexes documents together with their timestamps, so you can display them and use them when searching. Search (Ctrl + F) for "timestamp" on https://peerbit.org/#/modules/program/document-store/

There you will find some examples of how to index timestamps based on the document commit timestamps, and then use them in various ways when aggregating documents.

If you want to resolve the timestamp of a particular document, you can find its head commit and use that commit's timestamp.

It is not documented yet, but you can use

const timestampOfPostX = (await posts.documents.index.getDetailed(postId))[0].results[0].context.modified // postId: the id of the document; use .created for the creation time
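To show such a timestamp as a human-readable date on each peer, something has to convert the raw value. A minimal sketch, assuming the value is a bigint wall time counting nanoseconds since the Unix epoch (like entry.meta.clock.timestamp.wallTime in the sorting code later in this thread); the unit is an assumption worth checking against the @peerbit/log version you use, and wallTimeToDate / formatWallTime are hypothetical helpers.

```typescript
// Assumption: wall times are bigints counting nanoseconds since the Unix epoch.
const NANOS_PER_MILLI = 1_000_000n;

// Convert a nanosecond wall time to a JavaScript Date (millisecond precision).
function wallTimeToDate(wallTimeNanos: bigint): Date {
  return new Date(Number(wallTimeNanos / NANOS_PER_MILLI));
}

// Format for display in the viewer's own locale and time zone.
function formatWallTime(wallTimeNanos: bigint, locale?: string): string {
  return wallTimeToDate(wallTimeNanos).toLocaleString(locale);
}
```

Every peer formats the same stored value in its own locale, so the display differs only in presentation. The value itself is still only as accurate as the clock of the peer that created the commit.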
marcus-pousette commented 9 months ago

You can also make a dedicated field for a timestamp. But since the commits are already timestamped, it is nice to use them, because you also get a "modified" timestamp that is bumped on every change. The current API to retrieve them could be simplified, though. I could add alias functions in the future so you could do something like

const modifiedAt = await posts.documents.getLastModified(postId) // proposed, not yet available
const createdAt = await posts.documents.getCreated(postId) // proposed, not yet available
marcus-pousette commented 9 months ago

A datetime on one device may be incorrect on another, for several reasons

This is a hard problem. I did some exploration of a network time service that uses the Peerbit RPC class to talk to a centralized service, which signs documents if their timestamps are set correctly. In theory, you could also have a service to which posts are sent before being put, so that they are timestamped and signed by a centralized party.

See https://github.com/dao-xyz/peerbit/tree/master/packages/programs/clock-service for more info. But I would not focus on this issue too much at the moment, because the rabbit hole is very deep here.
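A much simpler, purely local heuristic (not what the clock-service package does) is for each peer to flag timestamps that are implausibly far from its own clock. A minimal sketch, where checkTimestamp and ClockPolicy are hypothetical names and the thresholds are game-specific assumptions. It only bounds how far clocks may disagree; it cannot tell which clock is right.

```typescript
type TimestampVerdict = "ok" | "future" | "stale";

// Hypothetical tolerances, in milliseconds.
interface ClockPolicy {
  maxFutureSkewMs: number; // how far ahead of our clock a timestamp may be
  maxAgeMs: number; // how far behind our clock (e.g. delivery delay) it may be
}

// Classify a claimed timestamp relative to the local clock.
// All times are milliseconds since the Unix epoch.
function checkTimestamp(
  claimedMs: number,
  localNowMs: number,
  policy: ClockPolicy,
): TimestampVerdict {
  if (claimedMs - localNowMs > policy.maxFutureSkewMs) {
    return "future";
  }
  if (localNowMs - claimedMs > policy.maxAgeMs) {
    return "stale";
  }
  return "ok";
}
```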

Azaeres commented 9 months ago

The peer count in the many-chat-room example... Is that the number of peers that opened the top-level Lobby database, or something else?

Are we able to get a count of peers who opened a given RoomDB?

marcus-pousette commented 9 months ago

There is actually a peer counter in the Room React component, but I think it is not displayed yet.

Inside the Room React component you can see

https://github.com/dao-xyz/peerbit-examples/blob/1256ffbc0d407e1a2904f3ddebc5078bbe580649/packages/many-chat-rooms/frontend/src/Room.tsx#L101

where we open the RoomDB (the actual room),

then later

  r.events.addEventListener("join", (e) => {
      r.getReady().then((set) => setPeerCounter(set.size + 1));
  });

  r.events.addEventListener("leave", (e) => {
      r.getReady().then((set) => setPeerCounter(set.size + 1));
  });

(where r is the open room)

These events will only trigger for this specific db.

The lobby has the same kind of code to trigger redraws and update the peer counter on changes. It basically tracks everyone who is in the lobby:

https://github.com/dao-xyz/peerbit-examples/blob/1256ffbc0d407e1a2904f3ddebc5078bbe580649/packages/many-chat-rooms/frontend/src/Lobby.tsx#L77

If you don't want to work with the events, you can also resolve the current number of subscribers/online peers for a specific database by just calling

roomInstance.getReady()

at an interval,

though I would not recommend this approach, since it polls. The event-based approach triggers the moment you learn about new peers.

marcus-pousette commented 9 months ago

Actually, the example code provided above is not that good. What you should be able to do is just:


let counter = 1; // (me)

r.events.addEventListener("join", (e) => {
    counter++; // 1 peer joined
});

r.events.addEventListener("leave", (e) => {
    counter--; // 1 peer left
});
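One caveat with a plain counter: if a join or leave event is ever delivered twice, or a leave arrives for a peer you never saw join, the number drifts. A minimal sketch that counts distinct peers instead, assuming each event identifies the peer by a string id (for example its public key's hashcode(), as the usePeerList hook below uses). PeerSet is a hypothetical helper, not a Peerbit class.

```typescript
// Tracks distinct remote peers by id, so duplicate events cannot skew the count.
class PeerSet {
  private readonly peers = new Set<string>();

  join(peerId: string): void {
    this.peers.add(peerId); // adding an id that is already present is a no-op
  }

  leave(peerId: string): void {
    this.peers.delete(peerId); // removing an unknown id is a no-op
  }

  // Number of peers in the database, including ourselves.
  get countIncludingSelf(): number {
    return this.peers.size + 1;
  }
}
```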
Azaeres commented 9 months ago

I'm still having trouble getting the peer count to meet my expectations. For example, let's say I have 3 peers connected, and quite often it says "1", and sometimes it says "3".

To make it easier to reason about, I've isolated the peer count logic into a custom hook here: https://github.com/Azaeres/etherion-lab/blob/main/src/components/scenes/Experiment4/hooks/usePeerList.ts#L6

import { usePeer } from '@peerbit/react'
import { PublicSignKey } from '@peerbit/crypto'
import { useCallback, useEffect, useState } from 'react'
import { Program } from '@peerbit/program'

export default function usePeerList(database?: Program) {
  const { peer, loading: loadingPeer } = usePeer()
  const [peers, setPeers] = useState<Record<string, PublicSignKey>>({})

  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const join = useCallback((event: any) => {
    const { detail } = event
    console.log('database rcvd join event  > event:', event)
    setPeers((oldValue) => {
      return {
        ...oldValue,
        [detail.hashcode()]: detail,
      }
    })
  }, [])

  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const leave = useCallback((event: any) => {
    const { detail } = event
    console.log('database rcvd leave event  > event:', event)
    setPeers((oldValue) => {
      const copy = { ...oldValue }
      delete copy[detail.hashcode()]
      return copy
    })
  }, [])

  useEffect(() => {
    database?.events.addEventListener('join', join)
    database?.events.addEventListener('leave', leave)
    return () => {
      database?.events.removeEventListener('join', join)
      database?.events.removeEventListener('leave', leave)
    }
  }, [database?.events, join, leave])

  useEffect(() => {
    if (!loadingPeer) {
      const publicKey = peer?.identity.publicKey
      const defaultPeers = publicKey
        ? {
            [publicKey.hashcode()]: peer?.identity.publicKey,
          }
        : {}
      setPeers(defaultPeers)
    }
  }, [peer, loadingPeer])

  return {
    peerCount: getPeerCount(peers),
    peerList: peers,
  }
}

function getPeerCount(peers: Record<string, unknown>) {
  return Object.keys(peers).length
}

It looks like I'm just often not receiving "join" events. Why is that? What am I doing wrong?

To see a live demo of the problem, visit: https://lab.etherion.app/experiment4?c=2463303d-349f-4ae5-9791-615c95a4ed65

Peer counts should be the same across all peers connected to the room, but I'm seeing wildly different values.

marcus-pousette commented 9 months ago

Thanks for giving me something easy to work from. Will look into this tonight or tomorrow!

marcus-pousette commented 9 months ago

@Azaeres Hello! I am not able to reproduce the issue you are having; the demo works well for me. Do you have a specific flow that yields the wrong behaviour with high probability?

Related: I am currently also working on rebuilding the pubsub protocol, and "get online peers" might turn into a more poll-based approach that will scale better for large networks than the push-based approach currently in place. https://github.com/dao-xyz/peerbit/issues/212

Azaeres commented 9 months ago

Let's run through a step-by-step:

  1. Visit https://lab.etherion.app/experiment4 in one tab.
  2. Visit https://lab.etherion.app/experiment4 in another tab.

For me right now, I'm seeing this:

[screenshot]

So far so good.

  3. Continuing, let's click a room in one tab.

[screenshot]

I'm not exactly sure what to expect here, but at first I had guessed the "peers in lobby" count in the right instance was going to drop by one. It didn't. Maybe the LobbyDB has to be explicitly closed when the view is unmounted in order to see it drop in this case? Anyway, I suppose that's not necessary.

  4. Onwards! We'll click the same room in the other tab.

[screenshot]

This is where I expect the "peers in room" count to be 2 for both instances. It's 1 for all.

  5. Maybe they're not connected? Let's test that. I'll post a message with the left instance.

[screenshot]

Well, no, they're definitely connected! Message successfully received by the right instance. (That's super cool, btw. Looking forward to stress testing it.)

  6. Just to see what happens, I'll have the left instance navigate to the lobby (using the blue left arrow button), then re-enter the same room.

[screenshot]

"Peers in room" is still 1 for both instances. But I had expected the "test message" to still load in the left instance. I figured it'd locally pull it out of cache (like the list of rooms in the lobby view), or at least replicate it from the right instance. Is this expectation incorrect?

  7. Now let's hit the browser's refresh button on the left instance.

[screenshot]

Interesting! The "peers in room" value is now the expected 2. But the right instance still shows the unexpected 1.

  8. I'll have the left instance post another test message.

[screenshot]

Another successfully sent message!

  9. Let's try having the left instance leave and re-enter again (same as step 6 above).

[screenshot]

Back to the unexpected 1 on both instances. And the messages were dropped again in the left instance.

  10. Refresh again (same as step 7 above).

[screenshot]

So when I refresh the page, I'm getting the required "join" event... but NOT when I navigate to the room. I'd expect to get exactly one "join" event for every peer except for self, whether I refresh the page or navigate to it.

At this point, I think either my expectations aren't correct, or my usePeerList() hook needs fixing.

Azaeres commented 9 months ago

I'm wondering if the "join" events are firing when I don't (yet) have an event listener attached. Is there at present a way to get (poll) the current count of peers? Then I could use that as the initial value, and the events that roll in from there will keep the value up-to-date.

marcus-pousette commented 9 months ago

I am answering this in parts:

I'm not exactly sure what to expect here, but at first I had guessed the "peers in lobby" count in the right instance was going to drop by one. It didn't. Maybe the LobbyDB has to be explicitly closed when the view is unmounted in order to see it drop in this case? Anyway, I suppose that's not necessary.

This is correct. The database is opened by the peer client, which is resolved from

const { peer } = usePeer()

which means that all loaded databases live on through navigations. So I guess there is a "bug" here: if you go in and out of many rooms, it will look as if you are inside many rooms from the database's perspective, while the view only shows one.

marcus-pousette commented 9 months ago

"Peers in room" is still 1 for both instances. But I had expected the "test message" to still load in the left instance. I figured it'd locally pull it out of cache (like the list of rooms in the lobby view), or at least replicate it from the right instance. Is this expectation incorrect?

Yes, this is funky behaviour. The messages should definitely load. What I realise now is that if I open the URL of a room in a completely fresh environment (like another browser), the messages load. There could be some kind of bug where resolving fails when data is expected to exist locally. But it is kind of weird, since there is a lot of testing around this.

marcus-pousette commented 9 months ago

Doing something like this

r.events.addEventListener("join", (e) => {
    r.getReady().then((set) => setPeerCounter(set.size + 1));
});

r.events.addEventListener("leave", (e) => {
    r.getReady().then((set) => setPeerCounter(set.size + 1));
});

// This line: seed the counter once, in case join events fired before the listeners were registered
r.getReady().then((set) => setPeerCounter(set.size + 1));

might fix the peer counter even if we missed events.
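If you keep your own peer list (as the usePeerList hook above does), the same idea works best when the listeners are registered first and the getReady() snapshot is merged into what they have already recorded, rather than overwriting it. Otherwise a join or leave that arrives while getReady() is pending can be lost again. A minimal sketch of that merge as a pure function, assuming the snapshot can be iterated as peer id strings (for example public key hashcodes) and that events seen during the fetch are at least as recent as the snapshot; PeerEvent and mergeSnapshot are hypothetical names.

```typescript
type PeerEvent = "join" | "leave";

// Merge a snapshot of ready peers with the last event seen for each peer while
// the snapshot was being fetched. Events win over the snapshot, on the
// assumption that they are at least as recent.
function mergeSnapshot(
  snapshot: Iterable<string>,
  lastEvent: Map<string, PeerEvent>,
): Set<string> {
  const merged = new Set<string>(snapshot);
  for (const [peerId, event] of lastEvent) {
    if (event === "join") {
      merged.add(peerId);
    } else {
      merged.delete(peerId);
    }
  }
  return merged;
}

// While the snapshot is pending, listeners record only the latest event per peer:
// lastEvent.set(peerId, "join") or lastEvent.set(peerId, "leave").
```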

marcus-pousette commented 9 months ago

These lines of code might fix the case where events were missed because the event listener was registered too late:

// Requires SearchRequest, e.g. import { SearchRequest } from "@peerbit/document"
room.current = r;

// Sort posts by the wall time of the commit (log entry) that last touched them
const sortPosts = async () => {
    const wallTimes = new Map<string, bigint>();
    const entries = await Promise.all(
        posts.current.map(async (x) => ({
            post: x,
            entry: await room.current.messages.log.log.get(
                room.current.messages.index.index.get(x.id).context.head
            ),
        }))
    );
    entries.forEach(({ post, entry }) => {
        if (entry) {
            wallTimes.set(post.id, entry.meta.clock.timestamp.wallTime);
        }
    });
    // Compare the bigints directly: subtracting possibly-undefined values does not type-check
    posts.current.sort((a, b) => {
        const ta = wallTimes.get(a.id) ?? 0n;
        const tb = wallTimes.get(b.id) ?? 0n;
        return ta < tb ? -1 : ta > tb ? 1 : 0;
    });
};

r.messages.events.addEventListener("change", async (e) => {
    e.detail.added?.forEach((p) => {
        const ix = posts.current.findIndex((x) => x.id === p.id);
        if (ix === -1) {
            posts.current.push(p);
        } else {
            posts.current[ix] = p;
        }
    });
    e.detail.removed?.forEach((p) => {
        const ix = posts.current.findIndex((x) => x.id === p.id);
        if (ix !== -1) {
            posts.current.splice(ix, 1);
        }
    });

    // Sort by time, and only re-render once sorting has finished
    await sortPosts();
    forceUpdate();
});

// Handle missed events by manually retrieving all posts and setting current posts to the ones we find
posts.current = await r.messages.index.search(new SearchRequest());
await sortPosts();
forceUpdate();
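The ordering rule is the part worth isolating. A minimal, self-contained sketch of sorting by bigint wall time without converting to Number, with one extra design choice that the example does not make: ties are broken by id, so every peer renders posts with equal wall times in the same order. PostLike and sortByWallTime are hypothetical names; posts without a known wall time are treated as 0n and sort first.

```typescript
// Minimal post shape for this sketch; the real Post class has more fields.
interface PostLike {
  id: string;
}

// Compare two bigints without converting to Number (no precision loss).
function compareBigInt(a: bigint, b: bigint): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Return a new array sorted by commit wall time, breaking ties by id so that
// all peers agree on the order of posts with equal wall times.
function sortByWallTime<T extends PostLike>(
  posts: T[],
  wallTimes: Map<string, bigint>,
): T[] {
  return [...posts].sort((a, b) => {
    const byTime = compareBigInt(wallTimes.get(a.id) ?? 0n, wallTimes.get(b.id) ?? 0n);
    if (byTime !== 0) {
      return byTime;
    }
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
}
```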
marcus-pousette commented 9 months ago

(inside open)

Azaeres commented 9 months ago

I remember you saying that there's no peer.close(), but that the database stays open in the background? What's the recommended way to clean up, especially if the user enters many rooms over time?

marcus-pousette commented 9 months ago

I remember you saying that there's no peer.close(), but that the database stays open in the background? What's the recommended way to clean up, especially if the user enters many rooms over time?

I think there are perhaps two ways to do this.

1.

I have been working a bit on a React hook that lets you open and manage databases.

https://github.com/dao-xyz/peerbit-examples/blob/master/packages/react-utils/src/useProgram.tsx

It is not quite done yet, but inside the first useEffect you see there, it would be possible to return a cleanup function, () => programRef.current.close(). This means that whenever you use this hook inside a component, the database is closed when the component is destroyed/unmounted.

In the future, the hook could also export a "peer" counter state that one could easily use.

In the end it would look something like:

export const Room = () => {
    const { room, peerCounter } = useProgram(new RoomDB(...), args)
    // or
    // const { room, peerCounter } = useProgram<RoomDB>("string address", args)

    // when this component is destroyed, room will automatically close

    return <>{peerCounter}</>
}

Another solution would be to create a context, a kind of handler that manages what should be open or not, depending on the use case. For the many-chat-rooms example, you might want to close a room one minute after leaving it. For that case you can have a manager that creates this timeout for you (and aborts the close timeout if you go back into the same room).
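A minimal sketch of such a manager, kept framework-agnostic so it could sit behind a React context. DelayedCloser, Closable and TimerScheduler are hypothetical names; the only thing assumed about an opened room is that it has a close() method (as the programRef.current.close() idea above suggests). Timers are injected so the logic can be tested without waiting.

```typescript
// Anything that can be closed, e.g. an opened program (assumed interface).
interface Closable {
  close(): unknown;
}

// Injected timer functions, so the manager can be tested with fake timers.
interface TimerScheduler {
  setTimeout(fn: () => void, ms: number): unknown;
  clearTimeout(handle: unknown): void;
}

// Closes a resource some time after it is released, unless it is reacquired first.
class DelayedCloser<K> {
  private readonly pending = new Map<K, unknown>();
  private readonly delayMs: number;
  private readonly scheduler: TimerScheduler;

  constructor(delayMs: number, scheduler: TimerScheduler) {
    this.delayMs = delayMs;
    this.scheduler = scheduler;
  }

  // Call when the user leaves a room: schedule the close.
  release(key: K, resource: Closable): void {
    this.cancel(key); // never stack two timers for the same key
    const handle = this.scheduler.setTimeout(() => {
      this.pending.delete(key);
      void resource.close(); // errors from close() are not handled in this sketch
    }, this.delayMs);
    this.pending.set(key, handle);
  }

  // Call when the user re-enters: abort a pending close. Returns true if one was aborted.
  acquire(key: K): boolean {
    return this.cancel(key);
  }

  private cancel(key: K): boolean {
    const handle = this.pending.get(key);
    if (handle === undefined) {
      return false;
    }
    this.scheduler.clearTimeout(handle);
    this.pending.delete(key);
    return true;
  }
}

// Scheduler backed by the real timers, e.g. new DelayedCloser<string>(60_000, realScheduler).
const realScheduler: TimerScheduler = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};
```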

Azaeres commented 9 months ago

The first way is how I imagined it. That'd be nice to have.

I found a DocumentStore put method, and a delete method. What's the recommended way to update an existing document?

Azaeres commented 9 months ago

Regarding updates to a DocumentStore, this looks relevant: https://peerbit.org/#/modules/program/document-store/?id=converting-existing-documents

Is deleting the old document and inserting a modified version of the old document the recommended way to make small changes to an existing document? Are individual documents immutable?

For the design I have in mind, I'll need to put document updates on a firehose blast, so it will greatly benefit from the most efficient approach.

marcus-pousette commented 9 months ago

The first way is how I imagined it. That'd be nice to have.

I found a DocumentStore put method, and a delete method. What's the recommended way to update an existing document?

I created this issue in the meantime to track this: https://github.com/dao-xyz/peerbit-examples/issues/8

Is deleting the old document and inserting a modified version of the old document the recommended way to make small changes to an existing document? Are individual documents immutable?

That section is mainly for migrating from one schema to another (adding or removing fields). If you just want to update the value of an existing field, the way to do it at the moment is to put a document with the same id when you want to track changes to one particular thing, like the movement of a spaceship, and to use unique ids for distinct things, like blasts.

In more detail:

For blasts, you would insert a new document with a new id for every blast, and all players would subscribe to this database to see whether any new blast has occurred. This should just work (!)

If you are tracking the spaceship with a "location" document { id: string, x: number, y: number }, then for every movement you would insert a new document with the same id as the last location document for that ship, so that whenever you want the latest location you can just call documents.index.get(id).

When you insert a document with the same id, the changes will be linked to each other (in a log). If you want to remove all history you can do the following


import { EntryType } from '@peerbit/log'
movement.put(newLocation, { meta: { type: EntryType.CUT } });

This basically cuts the history log at this new insertion, so it behaves like a delete + put operation.

(But perhaps you are just storing keyboard events and calculating the positions from those instead? That might make anti-cheating easier.)
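The keyboard-event idea can be sketched as a deterministic simulation: peers put only input events, and every peer replays them with the same fixed-step rules, so positions never need to be sent and a canPerform check only has to validate inputs. Everything here (ShipInput, ShipState, simulate, the per-tick thrust model) is a hypothetical illustration, not a Peerbit API; a real game would also have to handle inputs that arrive late over the network.

```typescript
// Hypothetical input a player could put instead of raw positions.
interface ShipInput {
  tick: number; // simulation tick at which the input takes effect
  thrustX: -1 | 0 | 1;
  thrustY: -1 | 0 | 1;
}

interface ShipState {
  x: number;
  y: number;
  vx: number;
  vy: number;
}

// Replay inputs with one fixed step per tick. Peers running this with the same
// start state and inputs compute the same final state.
function simulate(
  start: ShipState,
  inputs: ShipInput[],
  ticks: number,
  accel: number,
): ShipState {
  const byTick = new Map<number, ShipInput>();
  for (const input of inputs) {
    byTick.set(input.tick, input); // a later input for the same tick wins
  }
  const state: ShipState = { ...start };
  let thrustX = 0;
  let thrustY = 0;
  for (let tick = 0; tick < ticks; tick++) {
    const input = byTick.get(tick);
    if (input) {
      // Thrust stays in effect until the next input changes it.
      thrustX = input.thrustX;
      thrustY = input.thrustY;
    }
    state.vx += thrustX * accel;
    state.vy += thrustY * accel;
    state.x += state.vx;
    state.y += state.vy;
  }
  return state;
}
```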

marcus-pousette commented 9 months ago

There is currently no way to do partial updates on a document, but it is on the agenda. I don't think you would benefit from it greatly, though, since the documents you will be putting should be very small.

marcus-pousette commented 9 months ago

The document store should be good enough from a throughput and latency perspective: I have used it before for video streaming, which is much more data-intensive and requires many transactions per second to be broadcast at very low latency.

marcus-pousette commented 5 months ago

Hello @Azaeres !

There has been a lot of work on improving the syncing and reliability of Peerbit.

Persistence now uses OPFS (when available) instead of IndexedDB, which is much more reliable.

Networking has been improved with fault tolerance, more robust discovery, and support for direct connections!

There is also a @peerbit/react hook now that lets you open programs easily. See this chat example: https://github.com/dao-xyz/peerbit-examples/blob/68a310998370d1b33b1cc045992d322eb65ab39f/packages/one-chat-room/frontend/src/Room.tsx#L77

I was thinking about the space shooter you were working on (your latest experiment: https://github.com/Azaeres/etherion-lab/tree/main/src/components/scenes/Experiment5). It would be really cool to make it work. I was thinking about a space-shooter version of the "eat and get larger" type of games, where you shoot other spacecraft and, by collecting their debris, eventually unlock new weapon systems and get more powerful, at the cost of becoming slower and less manoeuvrable. On top of this, you could add audio/voice for Star Trek-like "hailing" features.

I would gladly help you out with programming the networking and state management, and with the actual game development if I can be of any help. I am also curious to see how many players this kind of system can support before running into limits.