Flagsmith / flagsmith

Open Source Feature Flagging and Remote Config Service. Host on-prem or use our hosted version at https://flagsmith.com/
BSD 3-Clause "New" or "Revised" License

Improve search and display capabilities for identities in the Flagsmith UI #4016

Closed: matthewelwell closed this issue 1 month ago

matthewelwell commented 5 months ago

Currently, due to the large quantities of data involved in identity storage, and the way in which that data is stored in our SaaS platform to support the Edge API, searching and displaying additional data about identities can be very difficult.

Some of the main problems are:

  1. It is not possible to search on anything other than the identifier. This is particularly problematic when introducing non-engineering users to Flagsmith, since the identifier is often a unique key such as a UUID, which most users will not have access to.
  2. Similarly, it is not possible to tell at a glance which identity is which in the list of identities, because we only show the identifier.
  3. We do not show the total number of identities (only applicable to SaaS).

Note that this issue combines both #444 and #290.

matthewelwell commented 5 months ago

The key issue described above is (1). There are a few options that we can investigate here for a solution:

1. Add an alias function to our SDKs which will add a new, indexed parameter to our identities that can then be searched across.

We would implement something like:

flagsmith.alias(identifier="<uuid>", alias="matthew.elwell")

This could get stored against the identity and displayed alongside the identifier in the list, and the search could search across both the identifier and the alias.

Pros:

Cons:

Note that, as a temporary measure here, we could allow users to add an alias via the admin API. Customers could then either do this from the dashboard (so that once an identity has been found via its identifier, it can be aliased and found more easily next time), or iterate over their identities via the management API and alias them programmatically.
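
As a rough illustration of the second option, the iteration could look something like the sketch below. The actual admin/management API calls are injected as callables here because the real endpoint shapes aren't defined in this issue; `derive_alias` stands in for whatever customer-specific logic maps an identity to a human-readable alias.

```python
def alias_identities(list_identities, set_alias, derive_alias):
    """Bulk-alias identities via the management API (hypothetical sketch).

    list_identities: callable returning an iterable of identity dicts
    set_alias: callable(identifier, alias) performing the API update
    derive_alias: callable(identity) -> alias string, or None to skip
    """
    updated = []
    for identity in list_identities():
        alias = derive_alias(identity)
        if alias:
            set_alias(identity["identifier"], alias)
            updated.append(identity["identifier"])
    return updated
```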

2. Create a search index (where?) based on the traits for each identity

We could create a search index that looks something like:

trait_key_1:trait_value_1;trait_key_2:trait_value_2...

Then, in the search field (or a separate search input), we could add an option to choose a trait to search by and build the search query as a full text search across this field. The query would look something like trait_key:trait_value, to avoid matching multiple trait keys that happen to have similar values.

Note that we may want to have people define the traits that they want to be able to search on, rather than building the search index for all traits for all identities which might get unmanageably large.
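
A minimal sketch of the idea, assuming traits arrive as a plain key/value dict and that an optional allow-list restricts which traits get indexed (the function and parameter names here are illustrative, not from the codebase):

```python
def build_trait_index(traits, searchable_keys=None):
    """Flatten an identity's traits into a single indexable string.

    searchable_keys: optional allow-list, since indexing every trait for
    every identity could become unmanageably large.
    """
    items = sorted(traits.items())
    if searchable_keys is not None:
        items = [(k, v) for k, v in items if k in searchable_keys]
    return ";".join(f"{k}:{v}" for k, v in items)


def matches(index, trait_key, trait_value):
    # Match the exact "key:value" pair so a value shared by several
    # trait keys does not produce false positives.
    return f"{trait_key}:{trait_value}" in index.split(";")
```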

Pros:

Cons:

matthewelwell commented 5 months ago

I think for SaaS (more specifically the Edge API), we'd want to look into using DynamoDB streams to trigger a lambda which will update a new model in Django which we can use to search across to get the results, before hitting dynamodb.

This would be a significant undertaking, however: probably a few weeks of work and testing. We would also need to work out how to migrate the existing data into the postgres models in the first place.

For self hosted, we could probably add this functionality quite easily by just directly searching across the traits as the data for a self hosted install would not be as large as for our SaaS environment.
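
For self-hosted, the direct search reduces to something like the following (shown here over plain dicts; a real implementation would be an ORM queryset filter, e.g. a case-insensitive `icontains` lookup on the trait value, which is an assumption on my part):

```python
def search_identities_by_trait(identities, trait_key, query):
    """Case-insensitive substring search across one trait per identity.

    identities: iterable of dicts like {"identifier": ..., "traits": {...}}
    """
    query = query.lower()
    return [
        identity
        for identity in identities
        if query in str(identity.get("traits", {}).get(trait_key, "")).lower()
    ]
```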

novakzaballa commented 4 months ago

I think for SaaS (more specifically the Edge API), we'd want to look into using DynamoDB streams to trigger a lambda which will update a new model in Django which we can use to search across to get the results, before hitting dynamodb.

I love this idea for self-hosted; I remember it was also suggested by @dabeeeenster to handle identity overrides in local evaluation. For SaaS, I recommend using a cheaper and more efficient solution for large data sets. This type of use case is ideal for a Data-Lake/Data-Warehouse solution. As I have suggested several times, we could:

That will allow the customers to make queries like:

Another advantage is that in the future, we could offer data analysis ourselves if we want.

This would allow us to store all/any historical and non-operational data here and access it by any criteria. We could create materialized views for the most used access patterns, so our customers could access and analyze their information in any way they like, but that is out of scope for this particular issue.

matthewelwell commented 2 months ago

I've begun investigating this a little further. I have made a start on some PoC code for option 2 in my comment above (here). See the WIP PR here.

Some important notes:

  1. When using dynamo streams and global replication, it is sufficient to connect to a stream in a single region, all replicated writes will also trigger the stream.
  2. I've done some basic maths on the AWS pricing (although don't quote me on it) and it doesn't look like it will be expensive. See additional calculations / notes here.

Questions to answer:

  1. What service will actually consume the DDB stream? Probably Lambda? But then how do we eventually get the data into postgres? RDS proxy? An endpoint in the core API to queue a task?
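
If Lambda does end up consuming the stream, the transformation step might look roughly like this. The record shape below is the standard DynamoDB stream event format, but the attribute names ("identifier", "dashboard_alias") and the idea of returning a payload for the core API to queue are assumptions for illustration:

```python
def extract_identity_updates(event):
    """Pull identifier/alias pairs out of a DynamoDB stream event."""
    updates = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"].get("NewImage", {})
        identifier = image.get("identifier", {}).get("S")
        alias = image.get("dashboard_alias", {}).get("S")
        if identifier:
            updates.append({"identifier": identifier, "dashboard_alias": alias})
    return updates


def handler(event, context):
    # A real handler would push these updates to postgres, e.g. via an
    # RDS proxy or a core API endpoint that queues a task (both still
    # open questions above); here we just return the extracted payloads.
    return extract_identity_updates(event)
```
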

matthewelwell commented 1 month ago

@kyle-ssg In this PR #4569 I have added a new field to the edge identities called "dashboard_alias".

From a FE perspective we need to:

  1. Display it on the detail view of an identity
  2. Allow an option to update it via the detail view of an identity
  3. Add functionality to search by dashboard_alias (by simply searching for dashboard_alias:<alias>)
  4. Maybe tidy up my bad implementation of the dashboard alias in the list view?
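
The search syntax in point 3 amounts to splitting the query on a known prefix before deciding which field to search. A minimal sketch of that routing (the function name and the fallback-to-identifier behaviour are assumptions, not the FE implementation):

```python
def parse_identity_search(query):
    """Route a search box query to a field.

    "dashboard_alias:matthew" searches the alias field; any other
    query falls back to the existing identifier search.
    """
    prefix = "dashboard_alias:"
    if query.startswith(prefix):
        return "dashboard_alias", query[len(prefix):]
    return "identifier", query
```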