ArangoDB-Community / ArangoBnB

AQL: Filter Search #14

Closed cw00dw0rd closed 3 years ago

cw00dw0rd commented 3 years ago

The AQL query that supports #13, allowing keywords to restrict the results shown. This may be a search over the description of the rental or whatever makes the most sense for the dataset.

Simran-B commented 3 years ago

Group and count amenities (this is pretty fast, ~500ms):

RETURN MERGE(
  FOR doc IN arangobnb
    FOR a IN doc.amenities
      COLLECT amenity = a WITH COUNT INTO count
      RETURN { [amenity]: count }
)

The amenities are apparently free text, so I'm not sure how suitable they are for auto-complete. We could still display a list of every amenity with a count above 20 or so and provide auto-complete for that. To implement this server-side, we would need another View and collection (updated periodically with a Foxx job?).
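
To make the thresholding idea concrete, here is a minimal plain-Python sketch of it. It only models the logic on in-memory data and is not how the View or Foxx job would be built; the sample documents, the function names `amenity_counts` and `suggestions`, and the threshold of 1 (standing in for "over 20 or so") are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample documents; the real data lives in the `arangobnb` collection.
listings = [
    {"amenities": ["Wifi", "Heating", "Kitchen"]},
    {"amenities": ["Wifi", "Heating"]},
    {"amenities": ["Wifi", "Washer"]},
]


def amenity_counts(docs):
    """Count occurrences of each amenity, like COLLECT ... WITH COUNT INTO."""
    return Counter(a for doc in docs for a in doc.get("amenities", []))


def suggestions(counts, prefix, threshold):
    """Amenities seen more than `threshold` times whose name starts with `prefix`.

    Matching is case-insensitive; results are ordered by count (descending),
    then by name so the output is deterministic.
    """
    prefix = prefix.lower()
    hits = [
        (name, n)
        for name, n in counts.items()
        if n > threshold and name.lower().startswith(prefix)
    ]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in hits]


counts = amenity_counts(listings)
frequent = suggestions(counts, "", 1)      # everything above the threshold
starting_with_w = suggestions(counts, "w", 1)
```

With the sample data, "Washer" starts with "w" but appears only once, so it is dropped by the threshold.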

cw00dw0rd commented 3 years ago

Why the merge? This seems to provide the expected results and is ~150ms:

FOR doc IN arangobnb
  FOR amenity IN doc.amenities
    COLLECT item = amenity WITH COUNT INTO c
    SORT c DESC
    LIMIT 20
    RETURN item

This could have a LIMIT and then we could have a separate query for text search. WDYT?

Simran-B commented 3 years ago

MERGE() is merely used to create a mapping, amenity to count, instead of returning an object per amenity with two keys. It's just for displaying the result in a more compact way here.
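
As a rough illustration of the two result shapes, here is a plain-Python sketch (not AQL). The sample amenities and the variable names are made up; the `merged` dictionary plays the role of what MERGE() produces from the one-key objects.

```python
from collections import Counter

# Hypothetical amenities of three listings.
amenities_per_listing = [["Wifi", "Heating"], ["Wifi"], ["Kitchen", "Wifi"]]
counts = Counter(a for amenities in amenities_per_listing for a in amenities)

# Shape without MERGE(): one object per amenity, each with two keys.
per_amenity = [{"amenity": a, "count": n} for a, n in sorted(counts.items())]

# Shape with MERGE(): the query returns one-key objects like { [amenity]: count },
# and merging them yields a single amenity-to-count mapping.
single_key_objects = [{a: n} for a, n in sorted(counts.items())]
merged = {}
for obj in single_key_objects:
    merged.update(obj)
```

The information is the same in both cases; the merged mapping is just shorter to read.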

150ms is still not particularly fast for production purposes, considering that this is Berlin only and that query caching might not help in a real-world application where the data changes often. It's fine for demonstration purposes, but this seems like an important point if we want to show scalability.

cw00dw0rd commented 3 years ago

We can get this down to around 1ms if we combine it with the mapResults query, but that means we would need to adjust the markers being displayed to keep them consistent with the filters. This shouldn't be a problem, but it will likely result in fewer markers on the map.

I would need to test the performance of loading the increased number of markers with each map drag, instead of only adding new ones.

Increasing the LIMIT on the listings returned does not add much to the query time; it would just be a matter of response time.

LET listings = (
    FOR listing IN arangobnb
        SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(@poly), listing.location), "geo")
        LIMIT 20
        RETURN listing
)

LET filters = (
    FOR doc IN listings
        FOR amenity IN doc.amenities
            COLLECT item = amenity WITH COUNT INTO c
            SORT c DESC
            LIMIT 20
            RETURN item
)

RETURN {listings, filters}
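
The effect of deriving the filters from the limited listings, rather than from the whole collection, can be sketched in plain Python. This is only a model of the query's shape: the geo search is replaced by an already-matched list, the sample data and the name `listings_with_filters` are hypothetical, and ties are broken by name here only to keep the output deterministic.

```python
from collections import Counter


def listings_with_filters(matches, listing_limit, filter_limit):
    """Return the first `listing_limit` matches plus their most common amenities."""
    listings = matches[:listing_limit]                      # LIMIT on the listings
    counts = Counter(a for doc in listings for a in doc["amenities"])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # SORT c DESC
    filters = [name for name, _ in ranked[:filter_limit]]   # LIMIT on the filters
    return {"listings": listings, "filters": filters}


# Hypothetical listings that already passed the geo search.
matches = [
    {"name": "A", "amenities": ["Wifi", "Heating"]},
    {"name": "B", "amenities": ["Wifi"]},
    {"name": "C", "amenities": ["Pool"]},
]

result = listings_with_filters(matches, listing_limit=2, filter_limit=20)
```

Because listing "C" falls outside the listing limit, "Pool" never shows up as a filter, which is what keeps the filters consistent with the markers on the map.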

cw00dw0rd commented 3 years ago

Returning the filters with the results, increasing the number of listings returned, and displaying more markers on-screen has a negligible performance impact. Here is what I have done so far:

This results in a fast query that is able to return the filters found in the results as well as filter the results based on user-selected criteria. This also takes advantage of more ArangoSearch optimization with primarySort.

However, this raises a new issue: the map markers need to be refactored to handle the higher volume and to update when filters are selected. Here is the query with some preset values.

LET listings = (
    FOR listing IN arangobnb
        SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(@poly), listing.location), "geo")
        AND
        ANALYZER(["Private room", "Entire home/apt"] ANY IN listing.room_type, "identity")
        AND
        ANALYZER(["Wifi", "Heating"] ALL IN listing.amenities, "identity")
        AND
        IN_RANGE(listing.price, 30, 50, true, true)

        SORT listing.number_of_reviews DESC, listing.review_scores_rating DESC
        LIMIT 100
        RETURN listing
)

LET filters = (
    FOR doc IN listings
        FOR amenity IN doc.amenities
            COLLECT item = amenity WITH COUNT INTO c
            SORT c DESC
            LIMIT 100
            RETURN item
)

RETURN {listings, filters}
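
For anyone who wants to check what the preset conditions select, here is a plain-Python sketch of the same logic on in-memory data. It is a model, not the ArangoSearch implementation: the geo condition is left out, the sample listings and the function names `matches` and `search` are hypothetical, and every sample listing is assumed to have numeric review fields.

```python
def matches(listing, room_types, amenities, min_price, max_price):
    """Mirror the three non-geo SEARCH conditions of the query."""
    return (
        listing["room_type"] in room_types                 # [...] ANY IN listing.room_type
        and set(amenities) <= set(listing["amenities"])    # [...] ALL IN listing.amenities
        and min_price <= listing["price"] <= max_price     # IN_RANGE(..., true, true), both ends inclusive
    )


def search(docs, room_types, amenities, min_price, max_price, limit=100):
    """Filter, then sort by review count and rating (both descending), then limit."""
    hits = [d for d in docs if matches(d, room_types, amenities, min_price, max_price)]
    hits.sort(key=lambda d: (-d["number_of_reviews"], -d["review_scores_rating"]))
    return hits[:limit]


# Hypothetical listings; only "a" and "b" satisfy every condition.
docs = [
    {"name": "a", "room_type": "Private room", "amenities": ["Wifi", "Heating", "Kitchen"],
     "price": 30, "number_of_reviews": 10, "review_scores_rating": 90},
    {"name": "b", "room_type": "Entire home/apt", "amenities": ["Wifi", "Heating"],
     "price": 50, "number_of_reviews": 10, "review_scores_rating": 95},
    {"name": "c", "room_type": "Shared room", "amenities": ["Wifi", "Heating"],
     "price": 40, "number_of_reviews": 50, "review_scores_rating": 99},
    {"name": "d", "room_type": "Private room", "amenities": ["Wifi"],
     "price": 40, "number_of_reviews": 40, "review_scores_rating": 98},
    {"name": "e", "room_type": "Private room", "amenities": ["Wifi", "Heating"],
     "price": 51, "number_of_reviews": 30, "review_scores_rating": 97},
]

hits = search(docs, ["Private room", "Entire home/apt"], ["Wifi", "Heating"], 30, 50)
names = [d["name"] for d in hits]
```

Listings "a" and "b" sit exactly on the price bounds and are kept because the range is inclusive at both ends; "b" sorts first because the review counts tie and its rating is higher.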