cameri / nostream

A Nostr Relay written in TypeScript
MIT License
741 stars, 190 forks

Implement mirroring #41

Open cameri opened 1 year ago

cameri commented 1 year ago

- Add a setting to specify a list of relays to mirror.
- Allow the direction of the mirror to be configured.
- Allow the number of days to backfill to be configured.
- Keep track of the most recent event for each relay.
- Keep track of which relay each event came from?
- Allow mirroring to be throttled and a max queue size to be specified per relay.
- Allow a whitelist/blacklist of pubkeys and kinds for each mirrored relay.
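As a rough illustration of what the settings listed above might look like, here is a minimal TypeScript sketch. Every name in it (`MirrorSettings`, `MirrorFilter`, `shouldMirror`, `backfillDays`, the example relay URL, and so on) is hypothetical and is not taken from nostream's actual settings schema.

```typescript
// Hypothetical shape for per-relay mirroring settings (not nostream's real schema).
type MirrorDirection = "pull" | "push" | "both";

interface MirrorFilter {
  pubkeyWhitelist?: string[];
  pubkeyBlacklist?: string[];
  kindWhitelist?: number[];
  kindBlacklist?: number[];
}

interface MirrorSettings {
  url: string;                // relay to mirror
  direction: MirrorDirection; // which way events flow
  backfillDays: number;       // how far back to fetch on first sync
  maxEventsPerSecond: number; // throttle
  maxQueueSize: number;       // per-relay queue bound
  filter: MirrorFilter;
}

interface MirroredEvent {
  id: string;
  pubkey: string;
  kind: number;
  created_at: number;
}

// Blacklists always reject; a whitelist, when present, must contain the value.
function shouldMirror(event: MirroredEvent, filter: MirrorFilter): boolean {
  if (filter.pubkeyBlacklist?.includes(event.pubkey)) return false;
  if (filter.kindBlacklist?.includes(event.kind)) return false;
  if (filter.pubkeyWhitelist && !filter.pubkeyWhitelist.includes(event.pubkey)) return false;
  if (filter.kindWhitelist && !filter.kindWhitelist.includes(event.kind)) return false;
  return true;
}

// Example entry; the URL and all values are placeholders.
const exampleMirror: MirrorSettings = {
  url: "wss://relay.example.com",
  direction: "pull",
  backfillDays: 7,
  maxEventsPerSecond: 50,
  maxQueueSize: 10000,
  filter: { kindWhitelist: [0, 1, 3], pubkeyBlacklist: [] },
};
```

Tracking the most recent event per relay and the source relay of each event would need extra persisted state, which this sketch leaves out.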

brainspacer commented 1 year ago

My two-bob's worth about nostr mirroring/replication, fwiw:

The items listed in the issue body above likely satisfy a (hidden?) underlying requirement, but I observe they are quite narrowly focused(*).

It may be worth considering a more generalised approach, one which removes the need for the relay to contain the 'mirroring/replication logic'.

I mention this to possibly address a few issues (for example, relay load balancing, relay fault-tolerance) but the real issue that drives this note is censorship threats. Relays/datasets that are easily replicated are more censorship resistant. (and I'd say the censorship threats are just warming up)

If a relay can be asked questions such as:

1a. "give me the list of commitments you have that are not in this commitment + query I am supplying you"
1b. "give me the list of events you have that are not in this commitment + query I am supplying you"

then a custom client can do the mirroring/replication.

The commitment could be a hash table (i.e. a long list of event hashes), but would better yet be an RSA Accumulator (i.e. a single number that represents a commitment to all relevant events). (The commitment could also be a Merkle tree, but this is even less network-efficient than a hash table.)
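To make the "single number" idea a little more concrete, here is a toy TypeScript sketch of an RSA accumulator over event ids. It rests on loudly stated assumptions: the modulus is built from two known Mersenne primes (so it is not secure, since a real accumulator needs a large modulus whose factorisation nobody knows), the base 3 is arbitrary, and `idToPrime` only looks at the first 32 bits of an id. All function names are made up for illustration.

```typescript
// TOY parameters (assumption): factors are known, so this offers no security.
const P = (1n << 61n) - 1n; // Mersenne prime 2^61 - 1
const Q = (1n << 89n) - 1n; // Mersenne prime 2^89 - 1
const N = P * Q;
const G = 3n; // arbitrary base for the sketch

// Square-and-multiply modular exponentiation.
function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  let b = base % mod;
  let e = exp;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = (result * b) % mod;
    b = (b * b) % mod;
    e >>= 1n;
  }
  return result;
}

// Trial division; only adequate for the small candidates used here.
function isPrime(n: bigint): boolean {
  if (n < 2n) return false;
  if (n % 2n === 0n) return n === 2n;
  for (let d = 3n; d * d <= n; d += 2n) {
    if (n % d === 0n) return false;
  }
  return true;
}

// Toy mapping from a hex event id to a prime: first 32 bits, then the next odd prime.
function idToPrime(eventId: string): bigint {
  let candidate = BigInt("0x" + eventId.slice(0, 8)) | 1n;
  while (!isPrime(candidate)) candidate += 2n;
  return candidate;
}

// The accumulator is G raised to the product of all member primes, mod N.
function accumulate(eventIds: string[]): bigint {
  return eventIds.reduce((acc, id) => modPow(acc, idToPrime(id), N), G);
}

// A membership witness is the accumulator of every other element.
function witnessFor(eventIds: string[], member: string): bigint {
  return accumulate(eventIds.filter((id) => id !== member));
}

// Raising the witness to the member's prime must reproduce the accumulator.
function verifyMember(acc: bigint, member: string, witness: bigint): boolean {
  return modPow(witness, idToPrime(member), N) === acc;
}
```

The accumulator value does not depend on the order in which events are added, which is what lets two relays holding the same set arrive at the same number independently.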

The custom client can first ask relayA for a commitment based on a query (using 1a). We'll call this setA. Then the custom client can ask relayB for the commitment of the relative complement of that setA + query (using 1a). Then the custom client can mirror just that relative complement, i.e. read it from relayB (using 1b) and write it to relayA. And vice-versa. Repeated for any number of relays.

The net result is that only the symmetric differences are mirrored between relays, which is nicely network-efficient and reduces the processing + network load on the relays. And, apart from the commitment handling (1a and 1b above), relays do not need to have any other mirroring/replication logic.
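The exchange described above can be sketched with the simplest commitment mentioned earlier, a plain list of event ids. This is an in-memory illustration only: `InMemoryRelay`, `idsNotIn`, `eventsNotIn` and `replicate` are hypothetical names, no such relay messages exist in the current NIPs, and a query is modelled as a simple predicate.

```typescript
type NostrEvent = { id: string; kind: number; content: string };
type Query = (event: NostrEvent) => boolean;

// Stand-in for a relay that can answer questions 1a and 1b.
class InMemoryRelay {
  private readonly events = new Map<string, NostrEvent>();

  constructor(initial: NostrEvent[] = []) {
    initial.forEach((event) => this.publish(event));
  }

  publish(event: NostrEvent): void {
    this.events.set(event.id, event);
  }

  private matching(query: Query): NostrEvent[] {
    return Array.from(this.events.values()).filter(query);
  }

  // 1a: ids matching the query that are absent from the supplied commitment.
  idsNotIn(commitment: string[], query: Query): string[] {
    const known = new Set(commitment);
    return this.matching(query).map((e) => e.id).filter((id) => !known.has(id));
  }

  // 1b: full events matching the query that are absent from the supplied commitment.
  eventsNotIn(commitment: string[], query: Query): NostrEvent[] {
    const known = new Set(commitment);
    return this.matching(query).filter((e) => !known.has(e.id));
  }
}

// Custom replication client: copies only the symmetric difference, in both directions.
// Returns the number of events transferred.
function replicate(relayA: InMemoryRelay, relayB: InMemoryRelay, query: Query): number {
  const setA = relayA.idsNotIn([], query); // an empty commitment yields the full matching set
  const setB = relayB.idsNotIn([], query);
  const missingFromA = relayB.eventsNotIn(setA, query);
  const missingFromB = relayA.eventsNotIn(setB, query);
  missingFromA.forEach((event) => relayA.publish(event));
  missingFromB.forEach((event) => relayB.publish(event));
  return missingFromA.length + missingFromB.length;
}
```

With an id list the commitment grows linearly with the dataset; swapping in a more compact commitment would change what is sent over the wire but not the shape of the client logic.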

And given the "+ query" I've peppered above, this enables a more tailored way to incentivize relays, because they can be funded specifically to host certain datasets that the community/communities deem valuable. i.e. entire datasets don't need to be replicated (so this is not wholesale 'mirroring' but more selective replication). And this organically addresses issues of relay bloat, because as data becomes very stale, it simply won't be actively replicated (though, ofc, one can take a backup so nothing is ever lost).

Sorry there's no tldr here :/

-- I have some other semi-related ideas (particularly in relation to queries) but I've banged on long enough here on this point so will raise them some other time.

(*) a narrow focus may paint the protocol/design into a corner and lock out various future options.

cameri commented 1 year ago

Do you have any literature on these topics? Would love to read and learn more on RSA Accumulators and commitments. Have you checked the Nostr protocol to see if this is supported with the current NIPs?

brainspacer commented 1 year ago

As a start: I think I may have supplied this link earlier in the nostr tg chat, but I didn't indicate any timestamps. My bad. The presentation commences at 58:00 in https://www.youtube.com/watch?v=IMzLa9B1_3E&ab_channel=ScalingBitcoin

"A Scalable Drop-in Replacement for Merkle Trees"
Presenters: Benedikt Bünz, Benjamin Fisch and Dan Boneh (Stanford University)

Key additional timestamps:
1:02:00 The problem with Merkles
1:02:45 RSA Accumulator intro slides

He presents this in the context of a set of BTC UTXOs, but in relation to nostr, the RSA accumulator would be a digest of a set of events instead (according to a query handed to the relay). The term 'commitment' just refers to the digest in this case. i.e. the relay 'commits' to holding the digested dataset.

Neither the current nostr NIPs nor the various discussions on the forums support this kind of replication in this generalised manner.

Some links that discuss RSA accumulators -

More info coming: I've reached out to a few cryptographers to see if there is an efficient and relevant js implementation of RSA accumulators that could be plugged-in for this purpose. Will advise my findings.

Something I didn't mention in my post above is that this could also solve other problems, such as client networking load. A client would not be forced to connect to every disparate relay that holds a set of events it is interested in, because there would now be a neat way to efficiently replicate discrete sets of events amongst interested relays.


As nostr proponents no doubt agree, nostr's decentralised social media protocol is super important for the world to migrate away from the current censorable, centralised social media paradigm. But nostr's censorship-resistant federated datastore can similarly benefit use cases outside of social media. My take is that there is exponential opportunity for nostr adoption in that wider set of use cases, so long as its ongoing 'improvements' are generalised and do not paint it into a corner of narrowly satisfying just social media use cases. The ability to selectively and efficiently replicate datasets is one example of fully supporting nostr's current social media focus while avoiding locking out future use cases. Another area that would avoid locking out other use cases is a more generalised query capability, including support for at least "OR" operators on tag queries. But more on this later.

brainspacer commented 1 year ago

Regarding blacklisting spammers: The approach above can be used to replicate any subset of nostr messages, including blacklists. When a relay detects spam (via any evolving criteria) and blacklists a spammer (by IP or pubkey, say), that relay's blacklist can be selectively replicated. And because custom replication clients replicate only between relays they choose (those of interest to them, which they deem to have a high reputation, or which they trust), this avoids the issue of 'blacklist spam'. That is, where a malicious relay maliciously blacklists a user, that blacklist will not be propagated, because the network will not trust the malicious blacklisting relay.
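A minimal TypeScript sketch of that trust rule, under the assumption that each relay's blacklist is available as a list of pubkeys: only lists coming from relays the replication client has chosen to trust are merged. The type `RelayBlacklist`, the function `mergeTrustedBlacklists` and the example URLs are all hypothetical.

```typescript
// Hypothetical representation of one relay's published blacklist.
type RelayBlacklist = { relayUrl: string; blockedPubkeys: string[] };

// Merge blacklists, ignoring any list whose source relay is not trusted.
function mergeTrustedBlacklists(
  lists: RelayBlacklist[],
  trustedRelays: Set<string>,
): Set<string> {
  const merged = new Set<string>();
  for (const list of lists) {
    if (!trustedRelays.has(list.relayUrl)) continue; // untrusted source: do not propagate
    list.blockedPubkeys.forEach((pubkey) => merged.add(pubkey));
  }
  return merged;
}
```

How a client decides which relays belong in the trusted set is left open here, as it is in the discussion above.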