Open data-envoy opened 1 year ago
I haven't thought about it much yet, other than recognizing that we'll obviously have to support facets. Perhaps we can ask the user to create a Postgres view with the tsvector
and all facet values. Then we can simply query that view.
A view does make sense. That was one of my thoughts. Either that or let them write some part of the search query and supply it as a config param.
I wonder how result count would work, because I think counts can be slow in Postgres. Could we ask the user to provide a table for caching values?
I tend to put correctness before performance. Let's implement the feature first, then we can measure the queries and see if/how they can be improved.
Sure thing. I suppose I have my own use case in mind, which is 10M+ rows. Doing facet counts there is probably in the seconds range.
Anyway, we'll see how it goes.
Trying to reverse-engineer this demo: https://codesandbox.io/s/github/algolia/instantsearch/tree/master/examples/react-hooks/default-theme
It's quite tough. Here's my first bit of research.
{
requests: [
{
indexName: 'instant_search',
params:
'facets=%5B%22*%22%5D&highlightPostTag=__%2Fais-highlight__&highlightPreTag=__ais-highlight__&hitsPerPage=20&maxValuesPerFacet=20&page=0&query=&tagFilters=',
},
],
}
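The `params` value is an ordinary URL-encoded query string, so it can be decoded before thinking about SQL. Here is a minimal sketch of that; `decodeParams` is a hypothetical helper name, and the choice to JSON-parse only array-shaped values is an assumption on my part, not something the demo documents.

```javascript
// Decode an InstantSearch-style `params` string into a plain object.
// Assumption: only values that start with "[" are JSON arrays; everything
// else (e.g. facets=brand, hitsPerPage=20, an empty query) stays a string.
function decodeParams(params) {
  const decoded = {};
  for (const [key, value] of new URLSearchParams(params)) {
    // Never JSON-parse the user's query text, even if it starts with "[".
    const looksLikeArray = key !== "query" && value.startsWith("[");
    decoded[key] = looksLikeArray ? JSON.parse(value) : value;
  }
  return decoded;
}

const decoded = decodeParams(
  "facets=%5B%22*%22%5D&highlightPostTag=__%2Fais-highlight__&hitsPerPage=20&page=0&query=&tagFilters="
);
console.log(decoded.facets); // the decoded facets array
```

Numeric values such as `hitsPerPage` are left as strings here; converting them is a separate decision.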
{
results: [{
query: '',
facets: {
// ... see below
},
facets_stats: {
// ... see below
},
exhaustiveFacetsCount: boolean,
exhaustive: {
facetsCount: boolean,
// ... (NA)
},
params: 'facets=%5B%22*%22%5D& ... &query=&tagFilters=',
// facets=%5B%22*%22%5D decodes to facets=["*"]
facets_stats: {
price: {
min: 1,
max: 4999.99,
avg: 242.806,
sum: 5212810
},
rating: {
// same aggregate values
}
},
renderingContent: {
// ... see below
},
}]
}
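The `facets_stats` values look like plain aggregates over a numeric column, so on the Postgres side they could probably come from a single aggregate query. A rough sketch follows; `facetStatsSql` is a hypothetical helper, the `products` table name is made up for illustration, and it assumes the facet is a numeric column on the searched table.

```javascript
// Build an aggregate query that yields the four facets_stats fields.
// Assumption: `column` is a numeric column that lives directly on `table`.
function facetStatsSql(table, column) {
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  // Identifiers cannot be bound as query parameters, so validate them instead.
  if (!identifier.test(table) || !identifier.test(column)) {
    throw new Error("Unsupported table or column name");
  }
  return (
    `SELECT MIN("${column}") AS min, MAX("${column}") AS max, ` +
    `AVG("${column}") AS avg, SUM("${column}") AS sum FROM "${table}"`
  );
}

console.log(facetStatsSql("products", "price"));
```

The same WHERE clause used for the hits would presumably need to be appended so the stats reflect the active filters.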
On page load the query is an empty string (''), and a query is still made to the database. The response includes the facet values, which is how the filters are populated on page load.
These same params are updated for every search.
When no filters are selected, the results array has length 1, as above.
When a facet is used to filter, the requests are updated to:
{
requests: [
{
indexName: 'instant_search',
params:
'facetFilters=%5B%5B%22brand%3ASamsung%22%5D%5D&facets=%5B%22*%22%5D&hitsPerPage=20&maxValuesPerFacet=20&page=0&query=&tagFilters=',
},
{
indexName: 'instant_search',
params:
'analytics=false&clickAnalytics=false&facets=brand&hitsPerPage=0&maxValuesPerFacet=20&page=0&query=',
},
],
}
In the response a new object is added to the results array like:
{
results: [
{
query: '',
nbHits: 633 // actual number of results
facets: {
// ... see below
},
params: facetFilters=[["brand:Samsung"]]&facets=["*"]
},
{
query: '',
nbHits: 21469 // high number
facets: {
// ... see below
},
// no facets_stats
params: facets=brand...
},
];
}
Add another brand and filter by category. Requests are below. Notice:
{
requests: [
{
indexName: 'instant_search',
params:
'facetFilters=[["brand:Samsung","brand:Apple"],["categories:Cell Phones"]]&facets=["*"]...&hitsPerPage=20&&page=0&query=&ruleContexts=["ais-brand-Apple"]&tagFilters=',
},
{
indexName: 'instant_search',
params:
'analytics=false&clickAnalytics=false&facetFilters=[["categories:Cell Phones"]]&facets=brand...&hitsPerPage=0&&page=0&query=&ruleContexts=["ais-brand-Apple"]',
},
{
indexName: 'instant_search',
params:
'analytics=false&clickAnalytics=false&facetFilters=[["brand:Samsung","brand:Apple"]]&facets=["categories"]...&hitsPerPage=0&&page=0&query=&ruleContexts=["ais-brand-Apple"]',
},
],
}
Response
{
results: [
{
query: '',
nbHits: 331 // actual number of results
facets: {
// ... see below
},
params: // same as request index 0
},
{
query: '',
nbHits: 3291 // high number
facets: {
// ... see below
},
// no facets_stats
params: // same as request index 1
},
{
query: '',
nbHits: 1075 // high number
facets: {
// ... see below
},
// no facets_stats
params: // same as request index 2
},
];
}
Some questions still stand from my previous points, but after some searching I found:
Now I know how to translate facetFilters=[["brand:Samsung","brand:Apple"],["categories:Cell Phones"]] to SQL.
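For reference, a minimal sketch of that translation: the inner arrays are OR'ed together and the outer array is AND'ed. `facetFiltersToSql` is a hypothetical helper, and it assumes every facet is a scalar column on the searched table, which will not hold for array-valued or joined facets. Negated filters are not handled.

```javascript
// Turn decoded facetFilters into a parameterized WHERE fragment.
// Inner arrays become OR groups; the outer array joins the groups with AND.
function facetFiltersToSql(facetFilters) {
  const values = [];
  const groups = facetFilters.map((group) => {
    const filters = Array.isArray(group) ? group : [group];
    const alternatives = filters.map((filter) => {
      const separator = filter.indexOf(":");
      if (separator < 1) throw new Error(`Malformed facet filter: ${filter}`);
      const attribute = filter.slice(0, separator);
      // Column names cannot be bound as parameters, so validate them.
      if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(attribute)) {
        throw new Error(`Unsupported facet attribute: ${attribute}`);
      }
      values.push(filter.slice(separator + 1));
      return `"${attribute}" = $${values.length}`;
    });
    return `(${alternatives.join(" OR ")})`;
  });
  return { where: groups.join(" AND "), values };
}

const { where, values } = facetFiltersToSql([
  ["brand:Samsung", "brand:Apple"],
  ["categories:Cell Phones"],
]);
console.log(where, values);
```

The facet values travel as bound parameters (`$1`, `$2`, ...) rather than being interpolated into the SQL string.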
Still not really sure why there are multiple request and result objects.
I discovered the reason for the multiple request and result objects.
It's to show the right facet values and counts when a filter is active.
Imagine that someone clicks to filter by brand:Samsung. Request 0 is used for the search results with that filter active.
If there were only request 0, the brand filter on the left-hand side would have no information about the other brands' values or counts.
That is why there is request 1, without the brand:Samsung filter. It returns the other brand values and counts.
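To make the pattern concrete, here is a sketch that derives the extra requests from the active filters. `buildFacetRequests` is a hypothetical name, the requests are shown as decoded objects rather than encoded `params` strings, and it assumes each inner group filters on a single attribute.

```javascript
// Request 0 applies every filter and returns the hits.
// Each following request drops one attribute's own filter group so that
// attribute can still list its other values and counts.
function buildFacetRequests(facetFilters) {
  const attributeOf = (group) => group[0].slice(0, group[0].indexOf(":"));
  const main = { facets: ["*"], facetFilters, hitsPerPage: 20 };
  const perFacet = facetFilters.map((group, index) => ({
    facets: [attributeOf(group)],
    // Keep every other group; remove only this attribute's own group.
    facetFilters: facetFilters.filter((_, other) => other !== index),
    hitsPerPage: 0, // only facet counts are needed, not hits
  }));
  return [main, ...perFacet];
}

const requests = buildFacetRequests([
  ["brand:Samsung", "brand:Apple"],
  ["categories:Cell Phones"],
]);
console.log(JSON.stringify(requests, null, 2));
```

With the brand and category filters from the example above, this yields three requests in the same order as the captured ones: the filtered search, a brand-only facet request filtered by category, and a categories-only facet request filtered by brand.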
Perhaps we can ask the user to create a Postgres view
I've been thinking about this a lot recently, as I need to build something that will scale to 10-500M rows.
I don't think a view would be the most efficient solution: the view might need to be updated each time a facet is added, and I can't think of how we would index many-to-many relations well. With a JSON column I think we lose the benefits of Postgres being SQL.
Are you happy for me to present a solution that would work for the multi-table structure above https://github.com/dekimir/postgres-searchbox/issues/2#issue-1665926146 ? And be extendable to custom table structures?
I'm happy with any contribution, if it works for you. We can always improve later, depending on needs.
Did you have thoughts about facet filtering?
In case of an already established database, facets could be on the primary search table (in one of a few formats) or in different table(s) - or both.
Listing facet join table
Or, something else.
Let's say I want to filter on price and color, the solution isn't as obvious as with a 'noSQL' database where the facets are always flat and well defined at the database level. For example Typesense has a collection schema.