Closed matthewelwell closed 1 month ago
The key issue described above is (1). There are a few options that we can investigate here for a solution:
1. Add an alias function to our SDKs, which will add a new, indexed parameter to our identities that can then be searched across.
We would implement something like:
flagsmith.alias(identifier="<uuid>", alias="matthew.elwell")
This could get stored against the identity and displayed alongside the identifier in the list, and the search could search across both the identifier and the alias.
Pros:
Cons:
Note that as a temporary measure, we could allow users to add an alias via the admin API. Customers could then either do this from the dashboard, so that once an identity has been found via its identifier it can be aliased and found more easily next time, or they could iterate over their identities via the management API and alias them in bulk.
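A minimal sketch of the alias idea, assuming a hypothetical in-memory `IdentityStore` in place of the real identities storage. The `alias` method is a stand-in for the proposed SDK call, not an existing Flagsmith API. It shows the alias stored against the identity and a search that matches on either field:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Identity:
    identifier: str
    alias: Optional[str] = None  # hypothetical new, indexed field


class IdentityStore:
    """Hypothetical in-memory stand-in for the identities table."""

    def __init__(self) -> None:
        self._identities: Dict[str, Identity] = {}

    def add(self, identifier: str) -> Identity:
        identity = Identity(identifier=identifier)
        self._identities[identifier] = identity
        return identity

    def alias(self, identifier: str, alias: str) -> Identity:
        # Mirrors the proposed flagsmith.alias(identifier=..., alias=...) call.
        identity = self._identities[identifier]
        identity.alias = alias
        return identity

    def search(self, term: str) -> List[Identity]:
        # Case-insensitive substring match across identifier and alias.
        needle = term.lower()
        return [
            identity
            for identity in self._identities.values()
            if needle in identity.identifier.lower()
            or (identity.alias is not None and needle in identity.alias.lower())
        ]
```

With this shape, an identity that was created with only a UUID becomes findable by a human-readable name once it has been aliased, without changing the identifier itself.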
2. Create a search index (where?) based on the traits for each identity
We could create a search index that looks something like:
trait_key_1:trait_value_1;trait_key_2:trait_value_2...
Then in the search field (or a separate search input), we could add the option to choose a trait to search by, and then build the query as a full text search across this field. The query would look something like trait_key:trait_value, to avoid hitting multiple trait keys that have similar values, for example.
Note that we may want to have people define the traits that they want to be able to search on, rather than building the search index for all traits for all identities which might get unmanageably large.
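The index format and key-scoped query described above can be sketched as follows. This is an illustration only: `build_search_index` and `matches` are hypothetical helper names, and a real implementation would rely on the database's full text search rather than string splitting.

```python
from typing import Dict, Iterable, Optional


def build_search_index(
    traits: Dict[str, object],
    searchable_keys: Optional[Iterable[str]] = None,
) -> str:
    """Flatten traits into 'key:value;key:value'.

    If searchable_keys is given, only those traits are indexed, which keeps
    the index small. Assumes values do not contain ';'.
    """
    if searchable_keys is None:
        keys = sorted(traits)
    else:
        keys = [key for key in searchable_keys if key in traits]
    return ";".join(f"{key}:{traits[key]}" for key in keys)


def matches(index: str, trait_key: str, trait_value: str) -> bool:
    """True if the entry for trait_key has a value containing trait_value."""
    needle = trait_value.lower()
    for entry in index.split(";"):
        key, _, value = entry.partition(":")
        # Matching on the key first avoids hits on other traits
        # that happen to hold a similar value.
        if key == trait_key and needle in value.lower():
            return True
    return False
```

Scoping the match to a single key is what prevents a search for one trait from returning identities where a different trait holds a similar value.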
Pros:
Cons:
I think for SaaS (more specifically the Edge API), we'd want to look into using DynamoDB streams to trigger a lambda which will update a new model in Django which we can use to search across to get the results, before hitting dynamodb.
This would be a significant undertaking, however: probably a few weeks of work and testing, plus we would also need to work out how to migrate the existing data into the Postgres models in the first place.
For self-hosted, we could probably add this functionality quite easily by searching directly across the traits, as the data for a self-hosted install would not be as large as for our SaaS environment.
I think for SaaS (more specifically the Edge API), we'd want to look into using DynamoDB streams to trigger a lambda which will update a new model in Django which we can use to search across to get the results, before hitting dynamodb.
I love this idea for self-hosted; I remember it was also suggested by @dabeeeenster to handle identity overrides in local evaluation. For SaaS, I recommend using a cheaper and more efficient solution for large data sets. This type of use case is ideal for a data lake or data warehouse solution. As I have suggested several times, we could:
parquet files, which are binary files (usually compressed) organized in a columnar way to optimize access to large datasets. That would allow customers to make queries like:
Another advantage is that in the future, we could offer data analysis ourselves if we want.
This would allow us to store any historical and non-operational data here and access it by any criteria. We could create materialized views for the most-used access patterns so that our customers can access and analyze their information in any way, but that is out of scope for this particular issue.
I've begun investigating this a little further. I have made a start on some PoC code for option 2 in my comment above (here). See the WIP PR here.
Some important notes:
Questions to answer:
@kyle-ssg In PR #4569 I have added a new field to the edge identities called "dashboard_alias".
From a FE perspective we need to 1:
dashboard_alias:<alias>
Currently, due to the large quantities of data involved in identity storage, and the way in which that data is stored in our SaaS platform to support the Edge API, searching and displaying additional data about identities can be very difficult.
Some of the main problems are:
uuid or similar, which most users will not have access to.
Note that this issue combines both #444 and #290.