Flockingbird / roost

Proof of Concept for Eventsourced backend
https://flockingbird.social
MIT License

RFC Discover People - searching in my network. #20

Open: berkes opened this issue 3 years ago

berkes commented 3 years ago

Summary

A search feature that lets me discover people in my professional social network.

Basic example

As a member of an instance
When I am on the search page
And I search for "Carpenter"
Then I get a list of all profiles with "carpenter" in an attribute that is visible to me
And that list contains my contacts, anyone on my instance, and anyone who is a contact of one or more of my contacts
So that I can search for people in my network.
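The scenario above can be sketched as follows. This is a minimal, in-memory illustration and not how roost implements search. The `Attribute`/`Profile` data model, the `visible_to` field (None meaning public), and the `network_of`/`search` helpers are all hypothetical. The sketch computes the search scope (contacts, members of my instance, contacts of contacts) and then matches the term only against attributes visible to the searcher.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    # Hypothetical visibility model: None means public; otherwise only the
    # listed member ids may see (and therefore search) this attribute.
    visible_to: frozenset[str] | None = None

    def is_visible_to(self, viewer_id: str) -> bool:
        return self.visible_to is None or viewer_id in self.visible_to


@dataclass
class Profile:
    id: str
    instance: str
    contacts: set[str] = field(default_factory=set)
    attributes: list[Attribute] = field(default_factory=list)


def network_of(viewer: Profile, profiles: dict[str, Profile]) -> set[str]:
    """Ids of the viewer's contacts, members of the viewer's instance and
    contacts of contacts; the viewer is excluded."""
    direct = set(viewer.contacts)
    same_instance = {p.id for p in profiles.values() if p.instance == viewer.instance}
    second_degree = {
        cc for c in direct if c in profiles for cc in profiles[c].contacts
    }
    return (direct | same_instance | second_degree) - {viewer.id}


def search(viewer: Profile, term: str, profiles: dict[str, Profile]) -> list[tuple[str, list[str]]]:
    """Return (profile id, names of matching visible attributes), sorted by id."""
    needle = term.casefold()
    results = []
    for pid in sorted(network_of(viewer, profiles)):
        profile = profiles.get(pid)
        if profile is None:  # e.g. a remote profile that is not known locally
            continue
        matched = [
            a.name
            for a in profile.attributes
            if a.is_visible_to(viewer.id) and needle in a.value.casefold()
        ]
        if matched:
            results.append((pid, matched))
    return results
```

The sketch returns the names of the matching attributes along with each hit, so the UI can show why a profile was returned. An attribute that is visible only to someone else never matches, even when the profile is in my network.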


See the Figma sketches for more details.

Motivation

Why are we doing this? What use cases does it support? What is the expected outcome?

Detailed design

Please read the blog post explaining the feature [1].

TODO: should we copy-paste and/or rephrase that blog post here?

Drawbacks

Searching "contacts" requires the search index to contain every contact, regardless of the instance they use.

Searching only "attributes visible to me" means an index per member. My indexed attributes differ from yours even if our indexes contain exactly the same people, because the visibility, and therefore the searchability, of attributes may differ between us.

Searching "contacts of contacts" requires this index to hold not just my contacts but their contacts too.

Pushing changes to attributes and profiles into indexes over AP (ActivityPub) requires your instance to push any change you make to your profile to mine. This is to be expected. However, when you have Anne as a contact and Anne changes an attribute, Anne's instance will push those changes to your instance, as expected, and your instance will then push them on to my index too. That second hop may surprise Anne, but since it applies only to public attributes, the result should not be unexpected.

Pushing changes to profiles over AP causes load on the receiving server, and a larger network graph makes that effect larger.
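The per-member index described above can be sketched as an inverted index that applies pushed attribute-change events. This is a sketch under assumptions, not roost's design. The `MemberIndex` class, the event shape `(profile_id, attribute, value, visible_to)`, and the whitespace tokenisation are all hypothetical. It shows two things: the same event produces different indexes for different members, and a removal or a switch to private only takes effect if the event actually arrives and is applied.

```python
from collections import defaultdict


class MemberIndex:
    """Hypothetical per-member inverted index: token -> {(profile_id, attribute)}."""

    def __init__(self, owner_id):
        self.owner_id = owner_id
        self.postings = defaultdict(set)
        self.indexed = {}  # (profile_id, attribute) -> tokens currently indexed

    def apply(self, profile_id, attribute, value, visible_to=None):
        """Apply one pushed attribute-change event to this member's index.

        value=None means the attribute was removed; visible_to=None means public.
        """
        key = (profile_id, attribute)
        # Forget whatever this attribute contributed to the index before.
        for token in self.indexed.pop(key, set()):
            self.postings[token].discard(key)
            if not self.postings[token]:
                del self.postings[token]
        if value is None:
            return
        if visible_to is not None and self.owner_id not in visible_to:
            return  # not visible to this member, so not searchable by them
        tokens = set(value.casefold().split())
        self.indexed[key] = tokens
        for token in tokens:
            self.postings[token].add(key)

    def lookup(self, term):
        """Return the sorted (profile_id, attribute) pairs indexed under term."""
        return sorted(self.postings.get(term.casefold(), set()))
```

When John's "carpenter" attribute is visible only to me, it is indexed in my index but not in yours, even though both indexes receive the same event. Each change therefore has to be processed once per receiving member index, which is part of the load described above.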

Limiting the number of indexed profiles helps keep indexes from growing large and changing fast, and so does limiting the number of attributes per profile. But both can cause unexpected results: people expect to see a certain (long-tail) result but don't, because the attribute or profile was pruned from the index.
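One way such limits could work is sketched below, purely as an illustration of the trade-off. The ranking by network distance (1 = contact, 2 = same instance, 3 = contact of a contact), the tuple layout, and the `prune` helper are all hypothetical. The RFC does not specify a pruning policy.

```python
def prune(candidates, max_profiles, max_attributes):
    """Hypothetical pruning policy for a bounded per-member index.

    candidates: list of (profile_id, distance, attributes), where distance is
    1 for a contact, 2 for someone on my instance, 3 for a contact of a contact,
    and attributes is an ordered list of (name, value) pairs.
    The closest profiles are kept first (ties broken by id), and each kept
    profile retains only its first max_attributes attributes.
    """
    ranked = sorted(candidates, key=lambda c: (c[1], c[0]))
    return {pid: attrs[:max_attributes] for pid, _, attrs in ranked[:max_profiles]}
```

With `max_profiles=2` and `max_attributes=1`, a contact-of-a-contact whose only relevant attribute is "hobby: carpentry" falls out of the index. A search for "carpent" would then miss them, which is exactly the long-tail surprise described above.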

Removing attributes, or changing attributes from public to private, will be pushed across the network. But instances may choose to ignore this and keep the old settings, or may fail to update them due to technical issues. With ActivityPub, we cannot guarantee that information that was once public is removed from all indexes.

Alternatives

  1. Designating central servers that index the whole known Flockingbird Fediverse. Downside 1: it cannot search or rank based on your personal network graph, since it does not know about "you". Downside 2: it can only index public data, not "data visible to you" that may be private to others. Downside 3: it centralises the gateway to discovery and thus becomes an (accidental) tool for censorship. Upside 1: the model is simpler. Upside 2: existing (OSS) tech or SaaS can be used.
  2. Indexing only public data. Downside: results will not match what people expect, because only searching "data visible to you" gives the expected results. E.g. when I see that John has "aspiration: carpenter" and search for "carpenter", it is unexpected that John does not show up just because he has made this tag visible only to me. Upside: no need for a per-member index; a simpler indexing model.
  3. Indexing only my instance. Downside: the ability to explore the network and discover new people is very limited. Upside: a simple model with one index per instance; no need to process changes from remote instances.

Adoption strategy

Basic feature. Should be in the initial release. Could be divided into phases, leaving the more complex parts (private fields, contacts-of-contacts) for a later phase.

How we teach this

Search is a main feature, reachable from an item on the primary menu. The difference between "my contacts", "contacts of my contacts" and "people on my instance" must be made very clear in the UX. The attributes that caused a profile to match must be highlighted, so it is clear why each hit was returned. It must be clear that search covers only profiles, not "updates" or other items. It must also be clear that results are based on "last known data": the data may have changed without the index and results having been updated (yet).
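Highlighting the part of an attribute that caused a hit could be as simple as wrapping every case-insensitive occurrence of the search term. This is a minimal sketch that assumes the client has both the attribute value and the term. The `highlight` helper and the `<mark>` markers are hypothetical choices, not part of roost.

```python
import re


def highlight(value: str, term: str, start: str = "<mark>", end: str = "</mark>") -> str:
    """Wrap every case-insensitive occurrence of term in value with markers."""
    if not term:
        return value
    # re.escape makes the term match literally, even if it contains "." or "*".
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"{start}{m.group(0)}{end}", value)
```

For example, `highlight("Aspiring Carpenter", "carpenter")` gives `"Aspiring <mark>Carpenter</mark>"`. The original casing of the matched text is preserved.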

Unresolved questions

Which tech should be used for indexing? Client-side: fuse.js etc. Server-side: PostgreSQL, Elasticsearch, MeiliSearch etc. Should this be a separate service and feature that instance admins can enable?


Footnotes and references

[1] https://fediverse.blog/~/Flockingbird/finding-people-with-flockingbird


This RFC template is modified from the React RFC template

zleap commented 3 years ago

If this needs any sort of analytics, I found Matomo (https://matomo.org/), which describes itself as a "Google Analytics alternative that protects your data and your customers' privacy".

I am not sure whether this is free software, though, so it could be more of an add-on option.