dekimir / postgres-searchbox

Enables react-instantsearch to work directly on PostgreSQL tables
Other
14 stars 1 forks source link

Discussion: Facet filtering #2

Open data-envoy opened 1 year ago

data-envoy commented 1 year ago

Did you have thoughts about facet filtering?

In case of an already established database, facets could be on the primary search table (in one of a few formats) or in different table(s) - or both.


Primary table title descripton price ?facetsJsonb? ?facetsHstore?
Football A kids football 4 { "color" : "blue" } "color"=>"blue"

Primary table id title descripton price
012 Football A kids football 4
Facet table id title
345 Color
Facet value table id faced_id title
678 123 blue

Listing facet join table

faced_value_id listing_id
678 012

Or, something else.

Let's say I want to filter on price and color, the solution isn't as obvious as with a 'noSQL' database where the facets are always flat and well defined at the database level. For example Typesense has a collection schema.

dekimir commented 1 year ago

I haven't thought about it much yet, other than recognizing that we'll obviously have to support facets. Perhaps we can ask the user to create a Postgres view with the tsvector and all facet values. Then we can simply query that view.

data-envoy commented 1 year ago

A view does make sense. That was one of my thoughts. Either that or let them write some part of the search query and supply it as a config param.

I wonder how result count would work, because I think counts can be slow in Postgres. Could we ask the user to provide a table for caching values?

dekimir commented 1 year ago

I wonder how result count would work, because I think counts can be slow in Postgres. Could we ask the user to provide a table for caching values?

I tend to put correctness before performance. Let's implement the feature first, then we can measure the queries and see if/how they can be improved.

data-envoy commented 1 year ago

Sure thing. I suppose I have my use case in mind which is 10M+ rows. Doing facet counts is in the probably in the seconds range.

Anyway, we'll se how it goes.

data-envoy commented 1 year ago

Facets reseach

Trying to reverse this demo https://codesandbox.io/s/github/algolia/instantsearch/tree/master/examples/react-hooks/default-theme

It's quite tough. Heres my first bit of research.

Initial request

{
  requests: [
    {
      indexName: 'instant_search',
      params:
        'facets=%5B%22*%22%5D&highlightPostTag=__%2Fais-highlight__&highlightPreTag=__ais-highlight__&hitsPerPage=20&maxValuesPerFacet=20&page=0&query=&tagFilters=',
    },
  ],
}

Initial response

{
  results: [{
    query: '',
    facets: {
      // ... see below
    },
    facets_stats: {
      // ... see below
    },
    exhaustiveFacetsCount: boolean,
    exhaustive: {
      facetsCount: boolean,
      // ... (NA)
    }
    params: 'facets=%5B%22*%22%5D& ... &query=&tagFilters=',
    // decodes to ["*"]
    facets_stats: {
      price: {
        min: 1,
        max: 4999.99,
        avg: 242.806,
        sum: 5212810
      },
      rating: {
        // same aggregate values
      }
    },
    renderingContent: {
      // ... see below
    },
  }]
}

facets

facets object ```js { facets: { brand: { 'Insignia™': 746, Samsung: 633, Metra: 591, HP: 530, // ... }, price: { '39.99': 1163, '49.99': 1083, '19.99': 1023, '29.99': 896, }, rating: { '0': 3422, '1': 156, '2': 194, '3': 1622, '4': 13924, '5': 2150, }, categories: { Appliances: 4306, 'Computers & Tablets': 3563, 'Cell Phones': 3291, 'Cell Phone Accessories': 2836, }, price_range: { '1 - 50': 9927, '50 - 100': 3471, '100 - 200': 2640, '200 - 500': 2628, '500 - 2000': 2467, '> 2000': 336, }, free_shipping: { true: 18013, false: 3456, }, 'hierarchicalCategories.lvl0': { Appliances: 4306, 'Computers & Tablets': 3563, // ... }, 'hierarchicalCategories.lvl1': { 'Cell Phones > Cell Phone Accessories': 2836, 'Appliances > Small Kitchen Appliances': 1510, }, 'hierarchicalCategories.lvl2': { 'Cell Phones > Cell Phone Accessories > Cell Phone Cases & Clips': 1195, 'Cell Phones > Cell Phone Accessories > iPhone Accessories': 607, }, 'hierarchicalCategories.lvl3': { 'Cell Phones > Cell Phone Accessories > iPhone Accessories > iPhone Cases & Clips': 517, 'Appliances > Small Kitchen Appliances > Coffee, Tea & Espresso > Coffee Makers': 133, }, 'hierarchicalCategories.lvl4': { 'Cell Phones > Cell Phone Accessories > Smartwatches & Accessories > Smartwatch Accessories > Smartwatch Bands': 86, 'Appliances > Small Kitchen Appliances > Coffee, Tea & Espresso > Coffee Pods & Beans > Coffee Pods': 82, }, }, } ```

facets_stats

facets_stats property ```js { facets_stats: { price: { min: 20.98, max: 50, avg: 37.1393, sum: 62951.1, }, rating: { min: 0, max: 5, avg: 3, sum: 5666, }, popularity: { min: 3, max: 21449, avg: 4888, sum: 8286386, }, }, } ```

renderingContent

renderingContent property ```js { renderingContent: { facetOrdering: { facets: { order: [ 'brand', 'hierarchicalCategories.lvl0', 'categories', 'price', 'free_shipping', 'rating', 'type', ], }, values: { categories: { sortRemainingBy: 'count', }, 'hierarchicalCategories.lvl0': { sortRemainingBy: 'alpha', }, price: { sortRemainingBy: 'count', }, // .. count or alpha }, }, } } ```

Findings

data-envoy commented 1 year ago

Some questions still stand from my previous points, but after some searching I found:

Now I know how to parse facetFilters=[["brand:Samsung","brand:Apple"],["categories:Cell Phones"]] to SQL.

Still not really sure why the multiple request and result objects.

data-envoy commented 1 year ago

I discovered reason for multiple request and result objects.

It's to show the right facet values and counts when a filter is active.

Imagine that someone clicks to filter by brand:samsung. Request 0 is used for the search results with that filter active.

It iw as just request 0, then the brand filter on the left hand side would have zero info about the other brands, values or counts.

So that is why there is request 1, without the filter of brand:samsung. This returns the other brand values and counts.

data-envoy commented 1 year ago

Perhaps we can ask the user to create a Postgres view

I've been thinking on this alot recently. As I need to build something that will scale to 10 - 500M rows.

I think a view would not be the most effcient solution. As maybe the view would need to be updated on each facet addition, and I can't think of how we would index many to many well. With a json column I think we lose the benefits of Postgres being SQL.

Are you happy for me to present a solution that would work for the multi-table structure above https://github.com/dekimir/postgres-searchbox/issues/2#issue-1665926146 ? And be extendable to custom table structures?

dekimir commented 1 year ago

Are you happy for me to present a solution that would work for the multi-table structure above #2 (comment) ? And be extendable to custom table structures?

I'm happy with any contribution, if it works for you. We can always improve later, depending on needs.