ONEARMY / community-platform

A platform to build useful communities that aim to tackle global problems
https://platform.onearmy.earth
MIT License
1.08k stars 358 forks

[feature request] Refactor search to remove dependency on all documents being loaded with fuse.js #3307

Open thisislawatts opened 4 months ago

thisislawatts commented 4 months ago

Is your feature request related to a problem? Please describe.
The current search for How-tos, Research and Questions loads every document into the client and searches them with fuse.js. This causes performance problems with large datasets, and because the cost grows with the size of each collection, it does not scale and hurts the overall user experience.

Describe the solution you'd like
I propose refactoring the search so that it no longer depends on loading all documents for fuse.js. Instead, we should implement a search that loads documents dynamically based on the user's query. This would improve performance and user experience, especially for larger datasets.

Describe alternatives you've considered
One alternative is to optimize the existing fuse.js implementation by fine-tuning its configuration and indexing process. However, this would likely bring only marginal improvements and would not address the fundamental scalability issue. Another alternative is to explore other search libraries or algorithms that offer better support for dynamic loading.

Additional context
This request should be prioritized because it directly affects the performance and usability of search within our application. Removing the dependency on loading all documents would also unblock other work. For example, it would unblock using SSR (server-side rendering).

@mariojsnunes has previously done some work around introducing this for Questions. Ideally we want a single solution that can be used for How-tos, Research and Questions.

Consider that Firestore charges for each document read. So after paying the cost of fetching all documents for a collection, all user searches are "free" because the data has already been fetched from the origin. Any query-based search has to be weighed against that. https://firebase.google.com/docs/firestore/pricing
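To make the trade-off concrete, here is a rough sketch of the read counts under each approach. The profile fields and the numbers in the usage note are illustrative assumptions, not measurements from the platform, and the model ignores caching.

```typescript
// Hypothetical model of billed document reads per user session.
interface SessionProfile {
  collectionSize: number; // documents in the collection
  searchesPerSession: number; // searches a user runs in one session
  resultsPerSearch: number; // documents returned (and billed) per query
}

// Fetch-all: every document is read once up front; later searches run client-side.
function readsFetchAll(p: SessionProfile): number {
  return p.collectionSize;
}

// Query-per-search: each search bills the documents it returns.
// Firestore bills at least one read per query, even when nothing matches.
function readsQueryPerSearch(p: SessionProfile): number {
  return p.searchesPerSession * Math.max(1, p.resultsPerSearch);
}

// Number of searches at which query-per-search costs as much as fetch-all.
function breakEvenSearches(p: SessionProfile): number {
  return Math.ceil(p.collectionSize / Math.max(1, p.resultsPerSearch));
}
```

With an assumed collection of 1000 documents and 20 results per search, a session would need around 50 searches before querying on demand costs as many reads as fetching everything once.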

mariojsnunes commented 4 months ago

Thanks for raising this!

Currently the code for How-tos and Research is tied up with FilterSorterDecorator. But each module has specific needs, which has led to some hardcoded conditions.

hardcoded properties: [image] [image]

sorting options that not all modules need: [image]

hardcoded filters: [image]

I believe the solution will be something like this, where each module has its own search method, optimized for its needs. Moving to this approach, some components could become more generic. So instead of that CategoriesSelect with hardcoded conditions, we would have a much simpler component, CategoriesSelectV2.
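As a rough illustration of "each module has its own search method", a shared interface could look like the sketch below. All names here (SearchParams, SearchService, QuestionSearchService, the Question shape) are hypothetical and not taken from the codebase, and the in-memory filter only stands in for whatever data source the module would really query.

```typescript
// Hypothetical shapes; the real How-to, Research and Question models differ.
interface SearchParams {
  query: string;
  category?: string;
}

// Each module implements the same contract with its own optimized logic.
interface SearchService<T> {
  search(params: SearchParams): Promise<T[]>;
}

interface Question {
  title: string;
  category: string;
}

// In-memory stand-in for a data source such as Firestore.
class QuestionSearchService implements SearchService<Question> {
  private readonly items: Question[];

  constructor(items: Question[]) {
    this.items = items;
  }

  async search({ query, category }: SearchParams): Promise<Question[]> {
    const needle = query.trim().toLowerCase();
    // Keep items in the requested category whose title contains the query.
    return this.items.filter(
      (q) =>
        (category === undefined || q.category === category) &&
        q.title.toLowerCase().includes(needle),
    );
  }
}
```

Generic UI components would then depend only on SearchParams, leaving module-specific conditions inside each service.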

mariojsnunes commented 4 months ago

As of now, I haven't found a blocker for using the Firestore API for our search. It's not ideal, but I haven't found a better solution that wouldn't either cost more or take significantly more effort.
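For context on why it is "not ideal": Firestore has no built-in full-text search, so one common workaround is to store a keyword array on each document and match it with an array-contains-any query. The sketch below is only an assumption about how that could look, not the approach used in the existing work. The helper names, the `keywords` field and the `questions` collection are hypothetical.

```typescript
// Hypothetical helper: derive lowercase keywords to store alongside a document.
// ASCII-only splitting for brevity; real content would need broader handling.
function toKeywords(text: string): string[] {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 2); // drop empty strings and very short words
  return Array.from(new Set(words)); // de-duplicate, keeping first-seen order
}

// Local imitation of the matching rule behind Firestore's array-contains-any:
// a document matches when it holds at least one of the search terms.
function matchesAny(docKeywords: string[], terms: string[]): boolean {
  return terms.some((t) => docKeywords.includes(t));
}

// With the modular Firebase SDK, the equivalent query would be roughly:
//   query(collection(db, 'questions'),
//         where('keywords', 'array-contains-any', toKeywords(userInput)))
// where 'questions' and 'keywords' are assumed names.
```

Unlike fuse.js, this matches whole words only, so fuzzy matching and typo tolerance would be lost.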

thisislawatts commented 4 months ago

@mariojsnunes I remember you mentioned indexes would be required for the search. If we are going to introduce this dependency, we should really look at automating index management rather than relying on click ops to configure them.

It seems like this can be handled as part of the firebase deploy command we use in our deployment job: https://firebase.google.com/docs/firestore/query-data/indexing#use_the_firebase_cli
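As a sketch of what keeping indexes in version control could look like, the snippet below describes a composite index in code and renders the content of a firestore.indexes.json file, which the Firebase CLI can deploy with `firebase deploy --only firestore:indexes`. The collection and field names are placeholders, and generating the file from TypeScript is just one option. The JSON could equally be maintained by hand.

```typescript
// Subset of the index definition format used in firestore.indexes.json.
type IndexField =
  | { fieldPath: string; order: "ASCENDING" | "DESCENDING" }
  | { fieldPath: string; arrayConfig: "CONTAINS" };

interface CompositeIndex {
  collectionGroup: string;
  queryScope: "COLLECTION" | "COLLECTION_GROUP";
  fields: IndexField[];
}

// Hypothetical index: 'questions', 'keywords' and '_created' are placeholder names.
const searchIndexes: CompositeIndex[] = [
  {
    collectionGroup: "questions",
    queryScope: "COLLECTION",
    fields: [
      { fieldPath: "keywords", arrayConfig: "CONTAINS" },
      { fieldPath: "_created", order: "DESCENDING" },
    ],
  },
];

// Render the JSON text to write to firestore.indexes.json.
function renderIndexesFile(indexes: CompositeIndex[]): string {
  return JSON.stringify({ indexes, fieldOverrides: [] }, null, 2);
}
```

Committing the rendered file means index changes go through review and are applied by the deployment job instead of the console.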

thisislawatts commented 4 months ago

Refactoring the Category and FilterSorterDecorator would be great, probably something that could be done in a smaller prep PR separate to the search work here. There is definitely too much coupling going on in there atm.

mariojsnunes commented 4 months ago

> could be done in a smaller prep PR separate to the search work here.

But wouldn't doing it in a separate PR mean modifying the code of all modules? I think it's better to focus on the Questions module first.